I found myself into a CacheItem that didn't clean up correctly. While looking at MSDN and correct myself into using Utc-based calculation, I found this confusing information:
CacheItemPolicy.Priority
CacheItemPolicy.AbsoluteExpiration
AbsolutExpiration is used to set a "keep-alive" of a CacheItem, Priority.NotRemovable is used to force CacheItem to exist forever. No notification about what property overrides the other.
The code below do compile and SQL Profiler also confirm that the database is queried only once, while every other request came from cache.
CacheItemPolicy _cachePolicy = new CacheItemPolicy()
{
AbsoluteExpiration = new DateTimeOffset(DateTime.Now.AddHours(6)),
Priority = CacheItemPriority.NotRemovable
};
I assume that this code force the cache items to stay forever but are cleared after 12 hours from creation, in line with the MSDN's note about the setting.
"Cache implementations should set the NotRemovable priority for a
cache entry only if the cache implementation provides ways to evict
entries from the cache and to manage the number of cache entries"
Then the other side, why would both properties work together at all? Does the implementation bring some kind of "more non-removable"?
So according to this "NotRemovable" prevents the cache entry from being removed automatically (like when the cache is running out of space) but will be removed when it expires or you manually take it out of the cache.
NotRemovable The cache items with this priority level will not be automatically deleted from the cache as the server frees system memory. However, items with this priority level are removed along with other items according to the item's absolute or sliding expiration time.
Related
I have this static class
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process
Api starts and initializes an empty dictionary
A background job starts and runs once every day to reload the dictionary from the database
Requests come in to read from the dictionary or update a specific city in the dictionary
My problem
If a request comes in to update the city
I update the database
If the update was successful, update the city object in the dictionary
At the same time, the background job started and queried all cities before I updated the specific city
The request finishes and the dictionary city now has the old values because the background job finished last
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way when the background job starts, it can lock/reserve the dictionary only for itself and when it's done it will release it for other requests to be used.
Then a request might have been waiting for the dictionary to be released and update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the Api wants to access the cache but its not loaded?
When the Api starts I block requests to this particular "Location" project until the background job marks IsReady to true. The cache I implemented is thread safe until I add the background job.
How much time does it take to reload the cache?
I would say less then 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background job problem by keeping track of date times. Similar to a "object version" approach. I won't be taking this path as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me so that I can update the db and cache at the same time. So I might remove the background job after all or just run it once a week. Thank you for all the answers and I am ok with a possible data inconsistency with the way I am updating the objects because if one route updates 2 specific values and another route updates 2 different specific values then the possibility of having a problem is very minimal
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration
An update will only happen to objects that the user owns and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
Most of my cache objects are related only to a specific user so it relates with bullet point 1.
The application owns the data, I don't. So I should never manually update the database unless it's critical.
Memory might be a problem but 1,000,000 normalish objects is somewhere between 80MB - 150MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
Having a lot of objects in memory will put pressure on Garbage Collection and that is not good but I don't think its bad at all for me because Garbage Collection only runs when memory gets low and all I have to do is just plan ahead to make sure there is enough memory. Yes it will run because of day to day operations but it won't be a big impact.
All of these considerations just so that I can have an in memory cache right at my finger tips.
I would suggest adding a UpdatedAt/CreatedAt property to your LocationCityContract or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check if the item you're about to add/update with is newer than the existing object like so:
public class CacheItem<T>
{
public T Item { get; }
public DateTime CreatedAt { get; }
// In case of system clock synchronization, consider making CreatedAt
// a long and using Environment.TickCount64. See comment from #Theodor
public CacheItem(T item, DateTime? createdAt = null)
{
Item = item;
CreatedAt = createdAt ?? DateTime.UtcNow;
}
}
// Use it like...
static class LocationMemoryCache
{
public static readonly
ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}
// From some request...
var newItem = new CacheItem(newLocation);
// or the background job...
var newItem = new CacheItem(newLocation, updateStart);
LocationMemoryCache.LocationCities
.AddOrUpdate(
newLocation.Id,
newItem,
(_, existingItem) =>
newItem.CreatedAt > existingItem.CreatedAt
? newItem
: existingItem)
);
When a request wants to update the cache entry they do as above with the timestamp of whenever they finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
Note:
Since this approach is not synchronizing (locking) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests wants to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after updating each, it might not truly reflect which one was updated last. Since you're ok with a 24 hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you as the background job will fix it when run.
As #Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem(newLoc);
LocationMemoryCache.LocationCities
.AddOrUpdate(...); /* <- like above */
By not locking the whole dictionary you avoid having requests being blocked by each other because they're trying to update the cache at the same time. If the first point is not acceptable you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations so you minimize the risk of requests affecting each other.
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will result occasionally to some threads unnecessarily blocked, but it is also very simple and easy to reason about its correctness. Essentially all access to the dictionary and the database will be serialized:
Every time a thread wants to read an object stored in the dictionary, will first have to take the lock, and keep the lock until it's done reading the object.
Every time a thread wants to update the database and then the corresponding object, will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
Every time the background job wants to replace the current dictionary with a new dictionary, will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
I am coding a MVC 5 internet application, and am using the MemoryCache object for caching objects. I see that using the MemoryCache.Set method, an absoluteExpiration can be specified.
If I use the following way to add and retrieve an object from the MemoryCache, what is the absoluteExpiration set to:
cache['cacheItem'] = testObject;
TestObject testObject = cache['cacheItem'] as TestObject;
Also, when using the MemoryCache in an MVC internet application, should I set the amount of memory that can be used for the MemoryCache, or is the default implementation safe enough for an Azure website?
Thanks in advance.
Your code is equivalent to calling Add, like below:
cache.Add("cacheItem", testObject, null);
The added entry would have the default expiration time, which is infinite (i.e., it doesn't expire). See the MSDN on CacheItemPolicy.AbsoluteExpiration for details.
To answer the question about memory usage: (from CacheMemoryLimitMegabytes Property):
The default is zero, which indicates that MemoryCache instances manage their own memory based on the amount of memory that is installed on the computer.
I would say that it's safe to let the MemoryCache defaults decide how much memory to use, unless you're doing something really fancy.
From http://msdn.microsoft.com/en-us/library/system.runtime.caching.cacheitempolicy.slidingexpiration(v=vs.110).aspx ...
"A span of time within which a cache entry must be accessed before the cache entry is evicted from the cache. The default is NoSlidingExpiration, meaning that the item should not be expired based on a time span."
What exactly is 'accessed' ? Does that mean if I hit the cached item like:
var object = cache["cachekeyname"];
It's considered to be 'accessed' ?
Or would it only be considered accessed if I actually modify the cached item?
it does mean that the cache is accessed if the following code is called:
var object = cache["cachekeyname"];
Therefore if the piece of code or functionality containing the above code snippet is not called within X time since you put the object into the cache or it was last accessed it will be removed from the cache.
I have some places where implementing some sort of cache might be useful. For example in cases of doing resource lookups based on custom strings, finding names of properties using reflection, or to have only one PropertyChangedEventArgs per property name.
A simple example of the last one:
public static class Cache
{
private static Dictionary<string, PropertyChangedEventArgs> cache;
static Cache()
{
cache = new Dictionary<string, PropertyChangedEventArgs>();
}
public static PropertyChangedEventArgs GetPropertyChangedEventArgs(
string propertyName)
{
if (cache.ContainsKey(propertyName))
return cache[propertyName];
return cache[propertyName] = new PropertyChangedEventArgs(propertyName);
}
}
But, will this work well? For example if we had a whole load of different propertyNames, that would mean we would end up with a huge cache sitting there never being garbage collected or anything. I'm imagining if what is cached are larger values and if the application is a long-running one, this might end up as kind of a problem... or what do you think? How should a good cache be implemented? Is this one good enough for most purposes? Any examples of some nice cache implementations that are not too hard to understand or way too complex to implement?
This is a large problem, you need to determine the domain of the problem and apply the correct techniques. For instance, how would you describe the expiration of the objects? Do they become stale over a fixed interval of time? Do they become stale from an external event? How frequently does this happen? Additionally, how many objects do you have? Finally, how much does it cost to generate the object?
The simplest strategy would be to do straight memoization, as you have above. This assumes that objects never expire, and that there are not so many as to run your memory dry and that you think the cost to create these objects warrants the use of a cache to begin with.
The next layer might be to limit the number of objects, and use an implicit expiration policy, such as LRU (least recently used). To do this you'd typically use a doubly linked list in addition to your dictionary, and every time an objects is accessed it is moved to the front of the list. Then, if you need to add a new object, but it is over your limit of total objects, you'd remove from the back of the list.
Next, you might need to enforce explicit expiration, either based on time, or some external stimulus. This would require you to have some sort of expiration event that could be called.
As you can see there is alot of design in caching, so you need to understand your domain and engineer appropriately. You did not provide enough detail for me to discuss specifics, I felt.
P.S. Please consider using Generics when defining your class so that many types of objects can be stored, thus allowing your caching code to be reused.
You could wrap each of your cached items in a WeakReference. This would allow the GC to reclaim items if-and-when required, however it doesn't give you any granular control of when items will disappear from the cache, or allow you to implement explicit expiration policies etc.
(Ha! I just noticed that the example given on the MSDN page is a simple caching class.)
Looks like .NET 4.0 now supports System.Runtime.Caching for caching many types of things. You should look into that first, instead of re-inventing the wheel. More details:
http://msdn.microsoft.com/en-us/library/system.runtime.caching%28VS.100%29.aspx
This is a nice debate to have, but depending your application, here's some tips:
You should define the max size of the cache, what to do with old items if your cache is full, have a scavenging strategy, determine a time to live of the object in the cache, does your cache can/must be persisted somewhere else that memory, in case of application abnormal termination, ...
This is a common problem that has many solutions depending on your application need.
It is so common that Microsoft released a whole library to address it.
You should check out Microsoft Velocity before rolling up your own cache.
http://msdn.microsoft.com/en-us/data/cc655792.aspx
Hope this help.
You could use a WeakReference but if your object is not that large than don't because the WeakReference would be taking more memory than the object itself which is not a good technique. Also, if the object is a short-time usage where it will never make it to generation 1 from generation 0 on the GC, there is not much need for the WeakReference but IDisposable interface on the object would have with the release on SuppressFinalize.
If you want to control the lifetime you need a timer to update the datetime/ timespan again the desiredExpirationTime on the object in your cache.
The important thing is if the object is large then opt for the WeakReference else use the strong reference. Also, you can set the capacity on the Dictionary and create a queue for requesting additional objects in your temp bin serializing the object and loading it when there is room in the Dictionary, then clear it from the temp directory.
I am writing a Console Application in C# in which I want to cache certain items for a predefined time (let's say 1 hour). I want items that have been added into this cache to be automatically removed after they expire. Is there a built-in data structure that I can use? Remember this is a Console App not a web app.
Do you actually need them removed from the cache at that time? Or just that future requests to the cache for that item should return null after a given time?
To do the former, you would need some sort of background thread that was periodically purging the cache. This would only be needed if you were worried about memory consumption or something. If you just want the data to expire, that would be easy to do.
It is trivial to create such a class.
class CachedObject<TValue>
{
DateTime Date{get;set;}
TimeSpan Duration{get;set;}
TValue Cached{get;set;}
}
class Cache : Dictionary<TKey,TValue>
{
public new TValue this(TKey key)
{
get{
if (ContainsKey(key))
{
var val = base.this[key];
//compare dates
//if expired, remove from cache, return null
//else return the cached item.
}
}
set{//create new CachedObject, set date and timespan, set value, add to dictionary}
}
Its already in the BCL. Its just not where you expect to find it: You can use System.Web.Caching from other kinds of applications too, not only in ASP.NET.
This search on google links to several resources about this.
I don't know of any objects in the BCL which do this, but I have written similar things before.
You can do this fairly easily by just including a System.Threading.Timer inside of your caching class (no web/winforms dependencies), and storing an expiration (or last used) time on your objects. Just have the timer check every few minutes, and remove the objects you want to expire.
However, be watchful of events on your objects. I had a system like this, and was not being very careful to unsubscribe from events on my objects in the cache, which was preventing a subtle, but nasty memeory leak over time. This can be very tricky to debug.
Include an ExpirationDate property in the object that you will be caching (probably a wrapper around your real object) and set it to expire in an hour in its constructor. Instead of removing items from the collection, access the collection through a method that filters out the expired items. Or create a custom collection that does this automatically. If you need to actually remove items from the cache, your custom collection could instead purge expired items on every call to one of its members.