C# Collection whose items expire - c#

I am writing a Console Application in C# in which I want to cache certain items for a predefined time (let's say 1 hour). I want items that have been added into this cache to be automatically removed after they expire. Is there a built-in data structure that I can use? Remember this is a Console App not a web app.

Do you actually need them removed from the cache at that time? Or just that future requests to the cache for that item should return null after a given time?
To do the former, you would need some sort of background thread that was periodically purging the cache. This would only be needed if you were worried about memory consumption or something. If you just want the data to expire, that would be easy to do.
It is trivial to create such a class.
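using System;
using System.Collections.Generic;

class CachedObject<TValue>
{
    public DateTime Date { get; set; }      // when the item was cached (UTC)
    public TimeSpan Duration { get; set; }  // how long the item stays valid
    public TValue Cached { get; set; }
}

class Cache<TKey, TValue> : Dictionary<TKey, CachedObject<TValue>> where TValue : class
{
    private readonly TimeSpan duration;

    public Cache(TimeSpan duration) { this.duration = duration; }

    public new TValue this[TKey key]
    {
        get
        {
            CachedObject<TValue> val;
            if (!TryGetValue(key, out val))
                return null;
            // Compare dates: if expired, remove from the cache and return null.
            if (DateTime.UtcNow - val.Date >= val.Duration)
            {
                Remove(key);
                return null;
            }
            // Otherwise return the cached item.
            return val.Cached;
        }
        set
        {
            // Create a new CachedObject, set date and time span, set value, add to dictionary.
            base[key] = new CachedObject<TValue> { Date = DateTime.UtcNow, Duration = duration, Cached = value };
        }
    }
}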

It's already in the BCL. It's just not where you expect to find it: you can use System.Web.Caching from other kinds of applications too, not only in ASP.NET.
This search on google links to several resources about this.

I don't know of any objects in the BCL which do this, but I have written similar things before.
You can do this fairly easily by just including a System.Threading.Timer inside of your caching class (no web/winforms dependencies), and storing an expiration (or last used) time on your objects. Just have the timer check every few minutes, and remove the objects you want to expire.
However, be watchful of events on your objects. I had a system like this and was not being very careful to unsubscribe from events on the objects in the cache, which caused a subtle but nasty memory leak over time. This can be very tricky to debug.
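A minimal sketch of the timer-based approach described above, assuming the cache lives for the duration of the process. The class name TimedCache and its members are hypothetical and not part of the BCL; only System.Threading.Timer and ConcurrentDictionary are real framework types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: a cache whose expired entries are purged by a background timer.
public sealed class TimedCache<TKey, TValue> : IDisposable
{
    private struct Entry
    {
        public TValue Value;
        public DateTime ExpiresAtUtc;
    }

    private readonly ConcurrentDictionary<TKey, Entry> entries =
        new ConcurrentDictionary<TKey, Entry>();
    private readonly TimeSpan lifetime;
    private readonly Timer timer;

    public TimedCache(TimeSpan lifetime, TimeSpan purgeInterval)
    {
        this.lifetime = lifetime;
        // The callback runs on a thread-pool thread once per purgeInterval.
        timer = new Timer(_ => PurgeExpired(DateTime.UtcNow), null, purgeInterval, purgeInterval);
    }

    public int Count { get { return entries.Count; } }

    public void Set(TKey key, TValue value)
    {
        entries[key] = new Entry { Value = value, ExpiresAtUtc = DateTime.UtcNow + lifetime };
    }

    // Expired entries are treated as missing even if the timer has not purged them yet.
    public bool TryGet(TKey key, out TValue value)
    {
        Entry entry;
        if (entries.TryGetValue(key, out entry) && entry.ExpiresAtUtc > DateTime.UtcNow)
        {
            value = entry.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    // Removes every entry that has expired as of nowUtc and returns how many were removed.
    public int PurgeExpired(DateTime nowUtc)
    {
        int removed = 0;
        foreach (var pair in entries)
        {
            Entry ignored;
            if (pair.Value.ExpiresAtUtc <= nowUtc && entries.TryRemove(pair.Key, out ignored))
                removed++;
        }
        return removed;
    }

    public void Dispose() { timer.Dispose(); }
}
```

Something like new TimedCache<string, Foo>(TimeSpan.FromHours(1), TimeSpan.FromMinutes(5)) would match the one-hour requirement in the question. Note that an entry re-added between the expiry check and the removal could be dropped early; for a sketch that is probably acceptable, but a production cache may want a compare-and-remove.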

Include an ExpirationDate property in the object that you will be caching (probably a wrapper around your real object) and set it to expire in an hour in its constructor. Instead of removing items from the collection, access the collection through a method that filters out the expired items. Or create a custom collection that does this automatically. If you need to actually remove items from the cache, your custom collection could instead purge expired items on every call to one of its members.

Related

Is there a way to lock a concurrent dictionary from being used

I have this static class
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process
Api starts and initializes an empty dictionary
A background job starts and runs once every day to reload the dictionary from the database
Requests come in to read from the dictionary or update a specific city in the dictionary
My problem
If a request comes in to update the city
I update the database
If the update was successful, update the city object in the dictionary
At the same time, the background job started and queried all cities before I updated the specific city
The request finishes and the dictionary city now has the old values because the background job finished last
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way when the background job starts, it can lock/reserve the dictionary only for itself and when it's done it will release it for other requests to be used.
Then a request might have been waiting for the dictionary to be released and update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the Api wants to access the cache but it's not loaded?
When the Api starts I block requests to this particular "Location" project until the background job marks IsReady to true. The cache I implemented is thread safe until I add the background job.
How much time does it take to reload the cache?
I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background job problem by keeping track of date times. Similar to a "object version" approach. I won't be taking this path as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me so that I can update the db and cache at the same time. So I might remove the background job after all or just run it once a week. Thank you for all the answers and I am ok with a possible data inconsistency with the way I am updating the objects because if one route updates 2 specific values and another route updates 2 different specific values then the possibility of having a problem is very minimal
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration
An update will only happen to objects that the user owns and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
Most of my cache objects are related only to a specific user so it relates with bullet point 1.
The application owns the data, I don't. So I should never manually update the database unless it's critical.
Memory might be a problem but 1,000,000 normalish objects is somewhere between 80MB - 150MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
Having a lot of objects in memory will put pressure on Garbage Collection and that is not good but I don't think it's bad at all for me because Garbage Collection only runs when memory gets low and all I have to do is just plan ahead to make sure there is enough memory. Yes it will run because of day to day operations but it won't be a big impact.
All of these considerations just so that I can have an in memory cache right at my finger tips.
I would suggest adding a UpdatedAt/CreatedAt property to your LocationCityContract or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check if the item you're about to add/update with is newer than the existing object like so:
public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    // In case of system clock synchronization, consider making CreatedAt
    // a long and using Environment.TickCount64. See comment from @Theodor
    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

// Use it like...
static class LocationMemoryCache
{
    public static readonly
        ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}
// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// ...or, from the background job:
// var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);
LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
When a request wants to update the cache entry they do as above with the timestamp of whenever they finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
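One possible shape for such a helper, as a sketch only. The extension method name AddOrUpdateIfNewer is made up for illustration, and CacheItem<T> is repeated here just so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;

public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

public static class CacheExtensions
{
    // Hypothetical helper: stores 'item' unless the cache already holds a newer entry.
    // Returns whichever entry ends up in the cache for 'key'.
    public static CacheItem<T> AddOrUpdateIfNewer<TKey, T>(
        this ConcurrentDictionary<TKey, CacheItem<T>> cache,
        TKey key,
        T item,
        DateTime? createdAt = null)
    {
        var newItem = new CacheItem<T>(item, createdAt);
        return cache.AddOrUpdate(
            key,
            newItem,
            (_, existing) => newItem.CreatedAt > existing.CreatedAt ? newItem : existing);
    }
}
```

A request would then call LocationMemoryCache.LocationCities.AddOrUpdateIfNewer(newLocation.Id, newLocation), while the background job would pass its updateStart timestamp as the last argument so that it never overwrites entries written after it began.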
Note:
Since this approach is not synchronizing (locking) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after updating each, it might not truly reflect which one was updated last. Since you're ok with a 24 hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you as the background job will fix it when run.
As @Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache.LocationCities[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object,
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache.LocationCities[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem<LocationCityContract>(newLoc);
LocationMemoryCache.LocationCities
    .AddOrUpdate(...); /* <- like above */
By not locking the whole dictionary you avoid having requests being blocked by each other because they're trying to update the cache at the same time. If the first point is not acceptable you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations so you minimize the risk of requests affecting each other.
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally result in some threads being blocked unnecessarily, but it is also very simple and its correctness is easy to reason about. Essentially all access to the dictionary and the database will be serialized:
Every time a thread wants to read an object stored in the dictionary, it will first have to take the lock, and keep the lock until it's done reading the object.
Every time a thread wants to update the database and then the corresponding object, it will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
Every time the background job wants to replace the current dictionary with a new dictionary, it will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
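The three rules above could be sketched roughly as follows. LockedCache and its method names are hypothetical, and the database calls are passed in as delegates because the question does not show the real data-access code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: every read, update and reload goes through one lock.
public sealed class LockedCache<TKey, TValue>
{
    private readonly object gate = new object();
    private Dictionary<TKey, TValue> items = new Dictionary<TKey, TValue>();

    // Reads run under the lock; 'read' should copy out whatever the caller needs.
    public TResult Read<TResult>(TKey key, Func<TValue, TResult> read, TResult fallback)
    {
        lock (gate)
        {
            TValue value;
            return items.TryGetValue(key, out value) ? read(value) : fallback;
        }
    }

    // 'saveToDatabase' runs while the lock is held, so the database write and
    // the cache write cannot interleave with a reload.
    public void Update(TKey key, TValue newValue, Action<TValue> saveToDatabase)
    {
        lock (gate)
        {
            saveToDatabase(newValue);
            items[key] = newValue;
        }
    }

    // The background job queries the database inside the lock, then swaps the dictionary.
    public void Reload(Func<Dictionary<TKey, TValue>> loadAllFromDatabase)
    {
        lock (gate)
        {
            items = loadAllFromDatabase();
        }
    }
}
```

With a reload that takes several seconds, every reader waits for that long once a day; whether that is acceptable depends on the application.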
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.

timing how long an element has been in a list

What would be the easiest way to track how long an element has been part of a list? For instance, I would like to pop an element from a list after it has been added for 2 minutes.
Would I have to create two lists, one holding the actual element and the other the time that element was added to the list? Then checking the "time" list in order to know when it has reached two minutes?
I have a feeling there's a much simpler and more efficient method to do this but I cannot think of it at the moment...
If you want to have the minimum amount of code to write, you can have a look at the MemoryCache class, which implements an expiration policy.
Using the CacheItemPolicy you can even have a callback method executed when the item is removed after expiration.
Rather than storing the elements in the lists directly, you could use a wrapper class which included the element and its storage time, then store instances of the wrapper class instead.
You would probably want to use a queue rather than a list; you will be removing items from the front a lot, which is far more efficient with a queue than with a list.
How often you check the queue is something you'd need to decide on. You could possibly use a separate thread to check every so often, in which case you'd probably want to use a ConcurrentQueue<T>
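A rough sketch of the wrapper-plus-queue idea, for single-threaded use. TimedQueue is a hypothetical name, and the timestamps are passed in explicitly so the behaviour is easy to follow; it assumes items are enqueued in chronological order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper that pairs each element with the time it was added.
public sealed class TimedQueue<T>
{
    private readonly Queue<KeyValuePair<DateTime, T>> queue =
        new Queue<KeyValuePair<DateTime, T>>();
    private readonly TimeSpan maxAge;

    public TimedQueue(TimeSpan maxAge) { this.maxAge = maxAge; }

    public int Count { get { return queue.Count; } }

    public void Enqueue(T item, DateTime addedAtUtc)
    {
        queue.Enqueue(new KeyValuePair<DateTime, T>(addedAtUtc, item));
    }

    // Assuming items were enqueued in time order, the oldest is always at the front,
    // so we can stop at the first item that is still young enough.
    public List<T> DequeueExpired(DateTime nowUtc)
    {
        var expired = new List<T>();
        while (queue.Count > 0 && nowUtc - queue.Peek().Key >= maxAge)
        {
            expired.Add(queue.Dequeue().Value);
        }
        return expired;
    }
}
```

For the two-minute case in the question, the queue would be created with TimeSpan.FromMinutes(2) and DequeueExpired(DateTime.UtcNow) called from whatever loop or timer does the periodic check.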

Memory management / caching for costly objects in C#

Assume that I have the following object
public class MyClass
{
    public ReadOnlyDictionary<T, V> Dict
    {
        get
        {
            return createDictionary();
        }
    }
}
Assume that ReadOnlyDictionary is a read-only wrapper around Dictionary<T, V>.
The createDictionary method takes significant time to complete and the returned dictionary is relatively large.
Obviously, I want to implement some sort of caching so I could reuse the result of createDictionary, but I also do not want to abuse the garbage collector and use too much memory.
I thought of using WeakReference for the dictionary but not sure if this is best approach.
What would you recommend? How to properly handle result of a costly method that might be called multiple times?
UPDATE:
I am interested in advice for a C# 2.0 library (single DLL, non-visual). The library might be used in a desktop or a web application.
UPDATE 2:
The question is relevant for read-only objects as well. I changed value of the property from Dictionary to ReadOnlyDictionary.
UPDATE 3:
The T is relatively simple type (string, for example). The V is a custom class. You might assume that an instance of V is costly to create. The dictionary might contain from 0 to couple of thousands elements.
The code assumed to be accessed from a single thread or from multiple threads with an external synchronization mechanism.
I am fine if the dictionary is GC-ed when no one uses it. I am trying to find a balance between time (I want to somehow cache the result of createDictionary) and memory expenses (I do not want to keep memory occupied longer than necessary).
WeakReference is not a good solution for a cache since your object won't survive the next GC if nobody else is referencing your dictionary. You can make a simple cache by storing the created value in a member variable and reusing it if it is not null.
This is not thread safe, and under heavy concurrent access you would sometimes end up creating the dictionary several times. You can use the double-checked locking pattern to guard against this with minimal performance impact.
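A sketch of the member-variable cache with double-checked locking might look like this. The class name LazyHolder, the Invalidate method, and the CreateCount field are illustrative additions, and a plain Dictionary<string, int> stands in for the read-only dictionary from the question.

```csharp
using System;
using System.Collections.Generic;

public class LazyHolder
{
    private readonly object sync = new object();
    // volatile so that other threads never observe a partially published reference.
    private volatile Dictionary<string, int> cached;
    public int CreateCount; // only here to show how often the expensive call ran

    public Dictionary<string, int> Dict
    {
        get
        {
            var result = cached;
            if (result == null)              // first check, no lock taken
            {
                lock (sync)
                {
                    result = cached;
                    if (result == null)      // second check, inside the lock
                    {
                        result = CreateDictionary();
                        cached = result;
                    }
                }
            }
            return result;
        }
    }

    // Drops the cached value so that the next access rebuilds it.
    public void Invalidate()
    {
        lock (sync) { cached = null; }
    }

    // Stand-in for the expensive createDictionary() from the question.
    private Dictionary<string, int> CreateDictionary()
    {
        CreateCount++;
        return new Dictionary<string, int> { { "answer", 42 } };
    }
}
```

The lock is only taken while the value is missing, so steady-state reads cost a single volatile read. When to call Invalidate is the policy question raised below.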
To help you further you would need to specify if concurrent access is an issue for you and how much memory your dictionary does consume and how it is created. If e.g. the dictionary is the result of an expensive query it might help to simply serialize the dictionary to disc and reuse it until you need to recreate it (this depends on your specific needs).
Caching is another word for memory leak if you have no clear policy when your object should be removed from the cache. Since you are trying WeakReference I assume you do not know when exactly a good time would be to clear the cache.
Another option is to compress the dictionary into a less memory-hungry structure. How many keys does your dictionary have, and what are the values?
There are four major mechanisms available for you (Lazy comes in 4.0, so it is no option)
lazy initialization
virtual proxy
ghost
value holder
Each has its own advantages.
I suggest a value holder, which populates the dictionary on the first call of the holder's GetValue method. You can then use that value for as long as you want, it is only built once, and it is only built when needed.
For more information, see Martin Fowler's page.
Are you sure you need to cache the entire dictionary?
From what you say, it might be better to keep a Most-Recently-Used list of key-value pairs.
If the key is found in the list, just return the value.
If it is not, create the one value (which is supposedly faster than creating all of them, and using less memory too) and store it in the list, thereby removing the key-value pair that hasn't been used the longest.
Here's a very simple MRU list implementation, it might serve as inspiration:
using System.Collections.Generic;
using System.Linq;

internal sealed class MostRecentlyUsedList<T> : IEnumerable<T>
{
    private readonly List<T> items;
    private readonly int maxCount;

    public MostRecentlyUsedList(int maxCount, IEnumerable<T> initialData)
        : this(maxCount)
    {
        this.items.AddRange(initialData.Take(maxCount));
    }

    public MostRecentlyUsedList(int maxCount)
    {
        this.maxCount = maxCount;
        this.items = new List<T>(maxCount);
    }

    /// <summary>
    /// Adds an item to the top of the most recently used list.
    /// </summary>
    /// <param name="item">The item to add.</param>
    /// <returns><c>true</c> if the list was updated, <c>false</c> otherwise.</returns>
    public bool Add(T item)
    {
        int index = this.items.IndexOf(item);
        if (index != 0)
        {
            // item is not already the first in the list
            if (index > 0)
            {
                // item is in the list, but not in the first position
                this.items.RemoveAt(index);
            }
            else if (this.items.Count >= this.maxCount)
            {
                // item is not in the list, and the list is full already
                this.items.RemoveAt(this.items.Count - 1);
            }
            this.items.Insert(0, item);
            return true;
        }
        else
        {
            return false;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        return this.items.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
In your case, T is a key-value pair. Keep maxcount small enough, so that searching stays fast, and to avoid excessive memory usage. Call Add each time you use an item.
An application should use WeakReference as a caching mechanism if the useful lifetime of an object's presence in the cache will be comparable to reference lifetime of the object. Suppose, for example, that you have a method which will create a ReadOnlyDictionary based on deserializing a String. If a common usage pattern would be to read a string, create a dictionary, do some stuff with it, abandon it, and start again with another string, WeakReference is probably not ideal. On the other hand, if your objective is to deserialize many strings (quite a few of which will be equal) into ReadOnlyDictionary instances, it may be very useful if repeated attempts to deserialize the same string yield the same instance. Note that the savings would not just come from the fact that one only had to do the work of building the instance once, but also from the facts that (1) it would not be necessary to keep multiple instances in memory, and (2) if ReadOnlyDictionary variables refer to the same instance, they can be known to be equivalent without having to examine the instances themselves. By contrast, determining whether two distinct ReadOnlyDictionary instances were equivalent might require examining all the items in each. Code which would have to do many such comparisons could benefit from using a WeakReference cache so that variables which hold equivalent instances would usually hold the same instance.
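To illustrate the "same input yields the same instance" use described above, here is a small single-threaded sketch. WeakCache and GetOrCreate are hypothetical names; only System.WeakReference is a real framework type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: hands out the same instance for equal keys for as long as
// somebody else still holds a strong reference to that instance.
public sealed class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference> entries =
        new Dictionary<TKey, WeakReference>();

    public TValue GetOrCreate(TKey key, Func<TKey, TValue> create)
    {
        WeakReference weak;
        if (entries.TryGetValue(key, out weak))
        {
            // Target is null once the garbage collector has reclaimed the object.
            var alive = weak.Target as TValue;
            if (alive != null) return alive;
        }
        var created = create(key);
        entries[key] = new WeakReference(created);
        return created;
    }

    // Drops dictionary slots whose objects have already been collected.
    public int RemoveDeadEntries()
    {
        var dead = new List<TKey>();
        foreach (var pair in entries)
        {
            if (!pair.Value.IsAlive) dead.Add(pair.Key);
        }
        foreach (var key in dead) entries.Remove(key);
        return dead.Count;
    }
}
```

The WeakReference objects themselves stay in the dictionary after their targets die, so something has to call RemoveDeadEntries now and then; otherwise the key set grows without bound.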
I think you have two mechanisms you can rely on for caching, instead of developing your own. The first, as you yourself suggested, was to use a WeakReference, and to let the garbage collector decide when to free this memory up.
You have a second mechanism - memory paging. If the dictionary is created in one swoop, it'll probably be stored in a more or less continuous part of the heap. Just keep the dictionary alive, and let Windows page it out to the swap file if you don't need it. Depending on your usage (how random is your dictionary access), you may end up with better performance than the WeakReference.
This second approach is problematic if you're close to your address space limits (this happens only in 32-bit processes).

Webservice Method generic list fill and keep for 1 day

I have a web method inside a web service that calls another web service to get data, fills a generic list, and returns it. What I want to do is keep the list in memory, so that the next time the web method is invoked it does not hit the other web service but just returns the list. I have tried, but when I invoke the web method a second time the list count shows as 0; it looks like garbage collection is cleaning everything up. Any suggestions?
Store it in the ASP.NET cache. Setting an absolute expiration of midnight should assure that you only get it once per day (unless it gets tossed from the cache due to space issues).
[WebMethod]
public List<Foo> GetFoos()
{
    // HttpRuntime.Cache is the ASP.NET application cache (System.Web.Caching.Cache)
    var foos = HttpRuntime.Cache["FooList"] as List<Foo>;
    if (foos == null)
    {
        ... get foos from remote web service ...
        // Expire at the next 7:00 AM (local time)
        var expiration = DateTime.Today.AddHours(7);
        if (DateTime.Now >= expiration)
        {
            expiration = expiration.AddDays(1);
        }
        HttpRuntime.Cache.Insert("FooList", foos, null, expiration, Cache.NoSlidingExpiration);
    }
    return foos;
}
Note: you could also use output caching as well, but you're limited to a sliding expiration. That is, it will be cached for a duration based on when the request occurs. It's not clear that's what you want. For example, what if the first request occurs at 11pm with a 24 hour duration, you wouldn't check again until 11pm the next day. If you have data changing on a daily basis, you're better off using the ASP.NET cache in conjunction with output caching on a shorter duration to ensure that you get the latest, daily data in a timely fashion.
Updated example based on comments.
It sounds to me like your list might either not be static, or it might constantly be new'd within a non-static constructor. There are three possible fixes for this:
Make sure that your generic list is a static property which only get initialised within a static constructor.
Seeing your time requirements I would also suggest potentially looking into MemoryCache or Cache.
Use the WebMethod attribute and set a CacheDuration (i.e: [WebMethod(CacheDuration=86400)])
I have not tried this on a web service, but I think output caching would work.
[WebMethod(CacheDuration=86400)]
public string FunctionName(string Name)
{
...code...
return(sb.ToString());
}
Read: How to perform output caching with Web services in Visual C# .NET

C#: How to implement a smart cache

I have some places where implementing some sort of cache might be useful. For example in cases of doing resource lookups based on custom strings, finding names of properties using reflection, or to have only one PropertyChangedEventArgs per property name.
A simple example of the last one:
public static class Cache
{
    private static Dictionary<string, PropertyChangedEventArgs> cache;

    static Cache()
    {
        cache = new Dictionary<string, PropertyChangedEventArgs>();
    }

    public static PropertyChangedEventArgs GetPropertyChangedEventArgs(
        string propertyName)
    {
        if (cache.ContainsKey(propertyName))
            return cache[propertyName];
        return cache[propertyName] = new PropertyChangedEventArgs(propertyName);
    }
}
But, will this work well? For example if we had a whole load of different propertyNames, that would mean we would end up with a huge cache sitting there never being garbage collected or anything. I'm imagining if what is cached are larger values and if the application is a long-running one, this might end up as kind of a problem... or what do you think? How should a good cache be implemented? Is this one good enough for most purposes? Any examples of some nice cache implementations that are not too hard to understand or way too complex to implement?
This is a large problem, you need to determine the domain of the problem and apply the correct techniques. For instance, how would you describe the expiration of the objects? Do they become stale over a fixed interval of time? Do they become stale from an external event? How frequently does this happen? Additionally, how many objects do you have? Finally, how much does it cost to generate the object?
The simplest strategy would be to do straight memoization, as you have above. This assumes that objects never expire, and that there are not so many as to run your memory dry and that you think the cost to create these objects warrants the use of a cache to begin with.
The next layer might be to limit the number of objects, and use an implicit expiration policy, such as LRU (least recently used). To do this you'd typically use a doubly linked list in addition to your dictionary, and every time an object is accessed it is moved to the front of the list. Then, if you need to add a new object but doing so would exceed your limit of total objects, you'd remove one from the back of the list.
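The dictionary-plus-linked-list layout could be sketched like this, without any thread safety. LruCache is a hypothetical name; LinkedList<T> and LinkedListNode<T> are the BCL's doubly linked list types.

```csharp
using System;
using System.Collections.Generic;

public sealed class LruCache<TKey, TValue>
{
    private readonly int capacity;
    // The dictionary gives O(1) lookup of the list node that holds each entry.
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Front of the list = most recently used, back = least recently used.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return map.Count; } }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            // Accessing an item moves it to the front of the list.
            order.Remove(node);
            order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            order.Remove(node);               // replacing an existing entry
        }
        else if (map.Count >= capacity)
        {
            var oldest = order.Last;          // evict from the back of the list
            order.RemoveLast();
            map.Remove(oldest.Value.Key);
        }
        node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        order.AddFirst(node);
        map[key] = node;
    }
}
```

Both operations are O(1) because the dictionary hands back the node directly, so nothing ever has to search the list.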
Next, you might need to enforce explicit expiration, either based on time, or some external stimulus. This would require you to have some sort of expiration event that could be called.
As you can see, there is a lot of design in caching, so you need to understand your domain and engineer appropriately. I felt you did not provide enough detail for me to discuss specifics.
P.S. Please consider using Generics when defining your class so that many types of objects can be stored, thus allowing your caching code to be reused.
You could wrap each of your cached items in a WeakReference. This would allow the GC to reclaim items if-and-when required, however it doesn't give you any granular control of when items will disappear from the cache, or allow you to implement explicit expiration policies etc.
(Ha! I just noticed that the example given on the MSDN page is a simple caching class.)
Looks like .NET 4.0 now supports System.Runtime.Caching for caching many types of things. You should look into that first, instead of re-inventing the wheel. More details:
http://msdn.microsoft.com/en-us/library/system.runtime.caching%28VS.100%29.aspx
This is a nice debate to have, but depending on your application, here are some tips:
You should define the maximum size of the cache, decide what to do with old items when the cache is full, have a scavenging strategy, determine a time to live for objects in the cache, decide whether your cache can or must be persisted somewhere other than memory in case of abnormal application termination, ...
This is a common problem that has many solutions depending on your application need.
It is so common that Microsoft released a whole library to address it.
You should check out Microsoft Velocity before rolling your own cache.
http://msdn.microsoft.com/en-us/data/cc655792.aspx
Hope this helps.
You could use a WeakReference, but if your object is not that large then don't, because the WeakReference would take more memory than the object itself, which is not a good technique. Also, if the object is short-lived and will never make it from generation 0 to generation 1 in the GC, there is not much need for the WeakReference; implementing IDisposable on the object, with GC.SuppressFinalize called on release, would serve better.
If you want to control the lifetime, you need a timer that checks the datetime/timespan against the desiredExpirationTime on each object in your cache.
The important thing is that if the object is large you should opt for the WeakReference, otherwise use a strong reference. Also, you can set the capacity on the Dictionary and queue additional objects by serializing them to a temp directory, loading each one when there is room in the Dictionary and then clearing it from the temp directory.
