We're running SignalR in a stand-alone ASP.NET app running in a virtual directory off our main ASP.NET website.
In our SignalR hub implementation, we have a static ConcurrentDictionary<int, UserState> variable maintaining some light-weight user state across individual connections. Over time that variable will be added to based upon client-side actions (i.e. as new users start interacting with our website). This variable is essentially providing some simple state tracking across connections.
We don't particularly want to add a special SignalR backplane which would require additional infrastructure dependencies as our data load is likely to be relatively lightweight and tracking this in-memory should be sufficient.
When a user has been inactive for a long-enough period of time (let's say 1 hour) we want to remove them from the dictionary variable. Whatever process does this should be guaranteed to run on a consistent basis - so, not dependent upon user behaviour, but instead upon a timed duration.
I have what I believe to be a good solution for doing this:
public class UserStateService : IUserStateService
{
    private static readonly ConcurrentDictionary<int, UserState> recentUsers = new ConcurrentDictionary<int, UserState>();

    private static Timer timer;

    public static void StartCleanup()
    {
        timer = new Timer( CleanupRecentUsers, null, 0, 60000 );
    }

    public static void StopCleanup()
    {
        timer.Dispose();
    }

    private static void CleanupRecentUsers( object state )
    {
        var now = DateTime.UtcNow;
        var oldUsers = recentUsers.Select( p => p.Value ).Where( u => u.LastActionTime.AddHours( 1 ) > now );
        foreach ( var user in oldUsers )
        {
            UserState removedUser;
            recentUsers.TryRemove( user.UserId, out removedUser );
        }
    }

    // other code for adding/updating user state.
}
As mentioned, I think this is a good solution. However, I'm not very conversant in thread management (though I'm aware that dealing with static objects in ASP.NET is dangerous).
StartCleanup() and StopCleanup() are called once each at the start and end of the application lifecycle, respectively. The UserStateService is supplied to our Hub classes via our IoC container (StructureMap) and is currently not scoped with any special lifecycle handling (i.e. it's not singleton- or thread-scoped, just a new instance per request).
We're already using static concurrent dictionaries in our production app and they're working fine, with no known performance issues. What I'm not sure about is running a Timer loop here.
So, my question is, are there any obvious risks here relating to threads being blocked/locked (or CPU use generally going out of control for any reason) that I need to mitigate or which could make this approach unworkable?
There's no particular problem with using a Timer in the way that you suggest.
However, there are a couple of problems with your code.
First, you have:
var oldUsers = recentUsers
    .Select( p => p.Value )
    .Where( u => u.LastActionTime.AddHours( 1 ) > now );
That will delete any user whose last activity was within the last hour. So anybody you saw a minute ago will be removed. The result is that your recentUsers list will probably be empty most of the time. At best, it will contain users who were last seen at least an hour ago.
I think you want to change that to <. Or, to think about it another way:
.Where(u => (now - u.LastActionTime) > TimeSpan.FromHours(1));
There might also be a race condition in that a user selected for removal might make a request before the removal actually occurs, so you end up removing a user that just made a request. The time window for that race condition is pretty narrow, though, and probably isn't worth worrying about.
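Putting that together, the cleanup callback could look like this (a minimal sketch of the corrected filter; enumerating a ConcurrentDictionary while removing entries is safe, since its enumerator doesn't throw on concurrent modification):

private static void CleanupRecentUsers( object state )
{
    var now = DateTime.UtcNow;

    // Select users whose last action is MORE than an hour old; these will be removed.
    var oldUsers = recentUsers
        .Select( p => p.Value )
        .Where( u => ( now - u.LastActionTime ) > TimeSpan.FromHours( 1 ) )
        .ToList();

    foreach ( var user in oldUsers )
    {
        UserState removedUser;
        recentUsers.TryRemove( user.UserId, out removedUser );
    }
}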
I have this static class
static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process:

1. The API starts and initializes an empty dictionary.
2. A background job runs once every day to reload the dictionary from the database.
3. Requests come in to read from the dictionary or update a specific city in the dictionary.

My problem:

1. A request comes in to update a city.
2. I update the database.
3. If the update was successful, I update the city object in the dictionary.
4. At the same time, the background job has started and queried all cities before I updated the specific city.
5. The request finishes, and the dictionary city now has the old values because the background job finished last.
The solution I first thought about:
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way when the background job starts, it can lock/reserve the dictionary only for itself and when it's done it will release it for other requests to be used.
Then a request might have been waiting for the dictionary to be released and update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the API wants to access the cache but it's not loaded yet?
When the API starts, I block requests to this particular "Location" project until the background job sets IsReady to true. The cache I implemented is thread-safe until I add the background job.
How much time does it take to reload the cache?
I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background job problem by keeping track of date-times, similar to an "object version" approach. I won't be taking this path, as I've decided that if I do a manual update in the database I might as well create an API route that does it for me, so that I can update the DB and the cache at the same time. So I might remove the background job after all, or just run it once a week. Thank you for all the answers. I'm OK with the possible data inconsistency in the way I'm updating the objects, because if one route updates two specific values and another route updates two different values, the chance of a problem is very small.
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration:

1. An update will only happen to objects that the user owns, and the rate at which a user might update those objects is most likely once per minute. That greatly reduces the chance of a problem for this specific example.
2. Most of my cached objects relate only to a specific user, so this ties in with point 1.
3. The application owns the data, I don't, so I should never manually update the database unless it's critical.
4. Memory might be a problem, but 1,000,000 normal-ish objects come to somewhere between 80MB and 150MB. I can keep a lot of objects in memory to gain performance and reduce the load on the database.
5. Having a lot of long-lived objects in memory will put pressure on garbage collection, which isn't ideal, but I don't think it's a real problem for me; I just have to plan ahead to make sure there is enough memory. It will still run as part of day-to-day operations, but it won't have a big impact.

All of these considerations, just so that I can have an in-memory cache right at my fingertips.
I would suggest adding an UpdatedAt/CreatedAt property to your LocationCityContract, or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check whether the item you're about to add/update with is newer than the existing object, like so:
public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }
    // In case of system clock synchronization, consider making CreatedAt
    // a long and using Environment.TickCount64. See comment from #Theodor.

    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}
// Use it like...
static class LocationMemoryCache
{
    public static readonly
        ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}

// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// or the background job...
var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);

LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
When a request wants to update the cache entry, it does as above, using the timestamp of whenever it finished adding the item to the database (see the notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
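For instance, something like this (UpdateIfNewer is a hypothetical name; it just wraps the AddOrUpdate call shown above):

public static class LocationCacheHelper
{
    // Hypothetical helper: only replaces the cached item if the candidate is newer.
    public static void UpdateIfNewer(int id, CacheItem<LocationCityContract> newItem)
    {
        LocationMemoryCache.LocationCities.AddOrUpdate(
            id,
            newItem,
            (_, existing) => newItem.CreatedAt > existing.CreatedAt ? newItem : existing);
    }
}

// Usage, from a request or from the background job:
// LocationCacheHelper.UpdateIfNewer(newLocation.Id,
//     new CacheItem<LocationCityContract>(newLocation, updateStart));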
Notes:
1. Since this approach does not synchronize (lock) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after each update, it might not truly reflect which one was updated last. Since you're OK with a 24-hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you, as the background job will fix it when run.
2. As #Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means: don't use LocationMemoryCache.LocationCities[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object,
// depending on how complex it is.
var newLoc = new LocationCityContract(LocationMemoryCache.LocationCities[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem<LocationCityContract>(newLoc);
LocationMemoryCache.LocationCities
    .AddOrUpdate(...); /* <- like above */
3. By not locking the whole dictionary you avoid having requests block each other while they update the cache at the same time. If the first point is not acceptable, you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations, so you minimize the risk of requests affecting each other.
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally result in some threads being unnecessarily blocked, but it is also very simple and easy to reason about. Essentially all access to the dictionary and the database is serialized (a sketch follows this list):

- Every time a thread wants to read an object stored in the dictionary, it must first take the lock, and hold it until it's done reading the object.
- Every time a thread wants to update the database and then the corresponding object, it must first take the lock (before even updating the database), and hold it until all the properties of the object have been updated.
- Every time the background job wants to replace the current dictionary with a new one, it must first take the lock (before even querying the database), and hold it until the new dictionary has taken the place of the old one.
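A minimal sketch of that approach (UpdateCityInDb and LoadAllCitiesFromDb are placeholders standing in for your actual data access code):

using System.Collections.Generic;

public static class LocationCache
{
    private static readonly object _lock = new object();
    private static Dictionary<int, LocationCityContract> _cities =
        new Dictionary<int, LocationCityContract>();

    public static LocationCityContract GetCity(int id)
    {
        lock (_lock)
        {
            return _cities.TryGetValue(id, out var city) ? city : null;
        }
    }

    public static void UpdateCity(LocationCityContract city)
    {
        lock (_lock)
        {
            UpdateCityInDb(city);    // the DB write happens inside the lock...
            _cities[city.Id] = city; // ...so the DB and the cache can't diverge
        }
    }

    public static void ReloadAll()
    {
        lock (_lock)
        {
            // Background job: query and swap under the same lock.
            _cities = LoadAllCitiesFromDb();
        }
    }

    // Placeholders for your actual data access code.
    private static void UpdateCityInDb(LocationCityContract city) { }
    private static Dictionary<int, LocationCityContract> LoadAllCitiesFromDb()
        => new Dictionary<int, LocationCityContract>();
}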
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
I am trying to migrate my .NET Framework application to .NET Core, and in this process I want to move my in-memory caching from System.Runtime.Caching/MemoryCache to Microsoft.Extensions.Caching.Memory/IMemoryCache. But I have one problem with IMemoryCache: I could not find a way to refresh the cache before an entry is removed/evicted.
In the case of System.Runtime.Caching/MemoryCache, there is an UpdateCallback property on CacheItemPolicy to which I can assign a callback delegate, and that function is called on a separate thread just before the cached object is evicted. Even if the callback takes a long time to fetch fresh data, MemoryCache continues to serve the old data beyond its expiry deadline, which ensures my code doesn't have to wait for data while the cache refreshes.
But I don't see such functionality in Microsoft.Extensions.Caching.Memory/IMemoryCache. There is a PostEvictionCallbacks property (and a RegisterPostEvictionCallback extension method) on MemoryCacheEntryOptions, but both of these fire only after the cache entry has been evicted from the cache. So if the callback takes a long time, all requests for this data have to wait.
Is there any solution?
That's because there is no eviction, and, I would argue, that makes IMemoryCache not a cache:
"The ASP.NET Core runtime doesn't trim the cache when system memory is low."
https://learn.microsoft.com/en-us/aspnet/core/performance/caching/memory?view=aspnetcore-5.0#use-setsize-size-and-sizelimit-to-limit-cache-size
"If SizeLimit isn't set, the cache grows without bound."
"The cache size limit does not have a defined unit of measure because the cache has no mechanism to measure the size of entries."
"An entry will not be cached if the sum of the cached entry sizes exceeds the value specified by SizeLimit."
So, not only does the IMemoryCache fail to do the most basic thing you'd expect from a cache - respond to memory pressure by evicting oldest entries - you also don't have the insert logic you expect. Adding a fresh item to a full "cache" doesn't evict an older entry, it refuses to insert the new item.
I argue this is just an unfortunate Dictionary, and not a cache at all. The cake/class is a lie.
To get this to actually work like a cache, you'd need to write a wrapper class that does measure memory size, plus system code that interacts with the wrapper and periodically evicts entries (via .Remove()) in response to memory pressure and expiration. You know: most of the work of implementing a cache.
So, the reason you couldn't find a way to update before eviction is because by default there isn't any eviction, and if you've implemented your own eviction scheme, you've written so much of an actual cache, what's writing a bit more?
You can use a trick here: in RegisterPostEvictionCallback, re-add the old value to the cache before looking up the new one. That way, if the refresh takes a long time, the old value is still available in the cache.
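A rough sketch of that trick (RefreshAsync is a hypothetical method that fetches fresh data and re-caches it; the 30-second grace period is an arbitrary choice):

using System;
using Microsoft.Extensions.Caching.Memory;

public class RefreshingCache
{
    private readonly IMemoryCache _cache;
    public RefreshingCache(IMemoryCache cache) => _cache = cache;

    public void Set(string key, object value, TimeSpan ttl)
    {
        var options = new MemoryCacheEntryOptions()
            .SetAbsoluteExpiration(ttl)
            .RegisterPostEvictionCallback((k, v, reason, state) =>
            {
                if (reason != EvictionReason.Expired) return;
                // Re-add the old value with a short grace period so readers
                // aren't blocked, then refresh in the background.
                _cache.Set(k, v, new MemoryCacheEntryOptions()
                    .SetAbsoluteExpiration(TimeSpan.FromSeconds(30)));
                _ = RefreshAsync((string)k);
            });
        _cache.Set(key, value, options);
    }

    // Placeholder: fetch fresh data and call Set(key, freshValue, ttl) again.
    private System.Threading.Tasks.Task RefreshAsync(string key)
        => System.Threading.Tasks.Task.CompletedTask;
}

Keep in mind that expired entries are evicted lazily, so the callback may not fire until the cache is next touched.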
I had this need, and I wrote this class:
public abstract class AutoRefreshCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> _entries = new ConcurrentDictionary<TKey, TValue>();

    protected AutoRefreshCache(TimeSpan interval)
    {
        var timer = new System.Timers.Timer();
        timer.Interval = interval.TotalMilliseconds;
        timer.AutoReset = true;
        timer.Elapsed += (o, e) =>
        {
            ((System.Timers.Timer)o).Stop();
            RefreshAll();
            ((System.Timers.Timer)o).Start();
        };
        timer.Start();
    }

    public TValue Get(TKey key)
    {
        return _entries.GetOrAdd(key, k => Load(k));
    }

    public void RefreshAll()
    {
        var keys = _entries.Keys;
        foreach (var key in keys)
        {
            _entries.AddOrUpdate(key, k => Load(key), (k, v) => Load(key));
        }
    }

    protected abstract TValue Load(TKey key);
}
Values are never evicted, only refreshed. Only the first Get waits while the value loads; during a refresh, Get returns the previous value without waiting.
Example of use:
class Program
{
    static void Main(string[] args)
    {
        var cache = new MyCache();
        while (true)
        {
            System.Threading.Thread.Sleep(TimeSpan.FromSeconds(1));
            Console.WriteLine(cache.Get("Key1") ?? "<null>");
        }
    }
}

public class MyCache : AutoRefreshCache<string, string>
{
    public MyCache()
        : base(TimeSpan.FromSeconds(5))
    { }

    readonly Random random = new Random();

    protected override string Load(string key)
    {
        Console.WriteLine($"Load {key} begin");
        System.Threading.Thread.Sleep(TimeSpan.FromSeconds(3));
        Console.WriteLine($"Load {key} end");
        return "Value " + random.Next();
    }
}
Result:
Load Key1 begin
Load Key1 end
Value 1648258406
Load Key1 begin
Value 1648258406
Value 1648258406
Value 1648258406
Load Key1 end
Value 1970225921
Value 1970225921
Value 1970225921
Value 1970225921
Value 1970225921
Load Key1 begin
Value 1970225921
Value 1970225921
Value 1970225921
Load Key1 end
Value 363174357
Value 363174357
You may try to take a look at FusionCache ⚡🦥, a library I recently released.
Features to use
The first interesting thing is that it provides an optimization for concurrent factory calls, so that only one call per key will be executed, relieving the load on your data source: basically, all concurrent callers for the same cache key at the same time will be blocked and only one factory will be executed.
Then you can specify timeouts for the factory, so that it does not take too much time. Background factory completion is enabled by default, so even if the factory actually times out it can keep running in the background and update the cache with the new value as soon as it finishes.
Then simply enable fail-safe to reuse the expired value in case of timeouts, or really any problem (the database is down, there are temporary network errors, etc.).
A practical example
You can cache something for, let's say, 2 minutes, after which a factory would be called to refresh the data; but in case of problems (exceptions, timeouts, etc.) the expired value would be used again until the factory is able to complete in the background, after which it will update the cache right away.
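As a rough illustration (based on my reading of the library's README: GetOrSet, SetDuration, SetFailSafe and SetFactoryTimeouts appear to be the documented entry points, while Product, id and GetProductFromDb are placeholders, so double-check against the current API):

using System;
using ZiggyCreatures.Caching.Fusion;

var cache = new FusionCache(new FusionCacheOptions());

var product = cache.GetOrSet<Product>(
    $"product:{id}",
    _ => GetProductFromDb(id),                         // one factory call per key, de-duplicated
    options => options
        .SetDuration(TimeSpan.FromMinutes(2))          // normal expiration
        .SetFailSafe(true)                             // reuse the expired value on errors
        .SetFactoryTimeouts(TimeSpan.FromSeconds(1))); // stop waiting; complete in background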
One more thing
Another interesting feature is support for an optional, distributed 2nd-level cache, automatically managed and kept in sync with the local one without you having to do anything.
If you give it a chance, please let me know what you think.
/shameless-plug
It looks like you need to set your own ChangeToken for each cache entry by calling AddExpirationToken. Then, in your implementation of IChangeToken.HasChanged, you can have a simple timeout expiration, and right before it triggers you can asynchronously fetch the new data to put in the cache.
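A sketch of that idea: a hand-rolled, time-based change token (note that CancellationChangeToken from Microsoft.Extensions.Primitives, combined with a timer, is a common alternative):

using System;
using Microsoft.Extensions.Primitives;

// Reports "changed" once a deadline passes; the cache entry is then evicted.
public class TimedExpirationToken : IChangeToken
{
    private readonly DateTime _expiresAtUtc;

    public TimedExpirationToken(TimeSpan lifetime)
        => _expiresAtUtc = DateTime.UtcNow + lifetime;

    public bool HasChanged => DateTime.UtcNow >= _expiresAtUtc;
    public bool ActiveChangeCallbacks => false; // polled; no push notifications

    public IDisposable RegisterChangeCallback(Action<object> callback, object state)
        => NullDisposable.Instance;

    private sealed class NullDisposable : IDisposable
    {
        public static readonly NullDisposable Instance = new NullDisposable();
        public void Dispose() { }
    }
}

// Attach it when creating the entry:
// var options = new MemoryCacheEntryOptions()
//     .AddExpirationToken(new TimedExpirationToken(TimeSpan.FromMinutes(5)));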
I suggest using the NeverRemove priority for cache items and handling cache size and the update procedure yourself with methods like MemoryCache.Compact, if that does not change your current design significantly.
You may find the page "Cache in-memory in ASP.NET Core" useful. Please see the sections:
- MemoryCache.Compact
- Additional notes: second item
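A small sketch of that combination (note that Compact lives on the concrete MemoryCache class, not on the IMemoryCache interface; LoadData is a placeholder):

using System;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 1000 });

// Pin the entry so the cache itself never removes it...
cache.Set("data", LoadData(), new MemoryCacheEntryOptions
{
    Priority = CacheItemPriority.NeverRemove,
    Size = 1 // required when SizeLimit is set
});

// ...and reclaim space yourself on whatever schedule suits you.
// Compact(0.25) asks the cache to remove ~25% of its entries, starting
// with expired and low-priority ones; NeverRemove entries should be skipped.
cache.Compact(0.25);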
I am tasked with writing a system to process result files created by a different process (which I have no control over), and I am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling foreach, but I have some concerns about thread safety when using the parallel version. The base question I need answered here is: "Is the way I am doing this going to guarantee thread safety?" Or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances, and have removed every static anything except the initial static void Main. It is my current understanding that this will do a lot toward ensuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
    public void DoThings()
    {
        // Get some list of things
        List<Thing> things = getThings();
        Parallel.ForEach(things, item =>
        {
            // based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            // ...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
    private Thing _thing;

    public MyActionClass(Thing item)
    {
        _thing = item;
    }

    public string DoOneThing()
    {
        return _thing.GetSubThings().FirstOrDefault();
    }

    public void DoOtherThing()
    {
        _thing.property1 = "Somenewvalue";
    }
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed: if I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But, as has been mentioned, nothing shared can be seen here; that doesn't mean that everything in the actual code you use is as good as it seems here.
Nor does it mean that nothing will be changed by you or a coworker in a way that makes some state both shared and mutable (in Thing, for example), at which point you start getting hard-to-reproduce crashes at best, or plain wrong behaviour that can go undetected for a long time at worst.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet: it is not always easy to use and implement, and not every task can reasonably be expressed through immutable objects. And even an accidental "make it shared and mutable" change may happen there as well, though it's much less likely.
It should at least be considered as a possible option/alternative.
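For illustration only, a fully immutable version of the question's Thing might look like this (a C# 9 record; the property names beyond property1 are made up):

using System.Collections.Generic;

// All state is fixed at construction, so instances can be freely shared
// between Parallel.ForEach iterations and any other threads.
public record Thing(string Property1, IReadOnlyList<string> SubThings)
{
    // "Modifying" produces a new instance instead of mutating shared state.
    public Thing WithProperty1(string value) => this with { Property1 = value };
}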
About the EDIT
"If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on?"
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, a public static Int32 ChangesCount that increments with each state change), then you should be safe.
Regarding "a string value that gets written to a database inside the loop": depending on the data access technology used and how you use it, you may be in trouble, because most of them are not designed for multithreaded environments (EF's DbContext, for example). And obviously, don't forget that dealing with concurrent access in the database is not always easy, though that is a bit away from our original theme.
Regarding "would it be better to create a new instance of Thing inside the loop": if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not Parallel.For) making changes to the objects being persisted, then you already have bigger problems than Parallel.For.
Objects should always have an observable consistent state (unlike when half of the properties are set by one thread and half by another while you try to persist that who-knows-what), and if they are used by many threads then they should already be thread-safe: there should be no way to put them into an inconsistent state.
And if they are to be persisted by external code, such objects should probably provide one of the following (see the sketch after this list):

- a SyncRoot property to synchronize the property-reading code;
- a snapshot DTO of the current state, created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} };
- something more exotic.
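A minimal sketch of the snapshot option (the single _property1 field and the ThingSnapshot shape are illustrative):

public class Thing
{
    private readonly object _syncRoot = new object();
    private string _property1;

    public void SetProperty1(string value)
    {
        lock (_syncRoot) { _property1 = value; }
    }

    // Returns a consistent copy of the current state, so persisting code
    // never observes a half-updated object and the lock is held only for
    // as long as the copy takes.
    public ThingSnapshot GetCurrentData()
    {
        lock (_syncRoot)
        {
            return new ThingSnapshot(_property1);
        }
    }
}

public sealed class ThingSnapshot
{
    public string Property1 { get; }
    public ThingSnapshot(string property1) { Property1 = property1; }
}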
I am working on a project where individual regions of a map are either generated dynamically, or loaded from a file if it has already been generated and saved. Regions are only loaded/generated as needed, and saved and discarded when they aren't anymore.
There are several different tasks that will be using one or more regions of this map for various purposes. For instance, one of these tasks will be to draw all currently visible regions (about 9 at any given time). Another is to get information about, or even modify regions.
The problem is that these tasks may or may not be working with the same regions as other tasks.
Since these regions are rather large, and are costly to generate, it would be problematic (for these and other reasons) to use different copies for each task.
Rather, I think it would be a good idea to create and manage a pool of currently loaded regions. New tasks will first check the pool for their required region; they can then use it if it exists, or else create a new one and add it to the pool.
Provided that works, how would I manage this pool? How would I determine if a region is no longer needed by any tasks and can be safely discarded? Am I being silly and overcomplicating this?
I am using c# if that matters to anyone.
Edit:
Now that I'm more awake: would it be as simple as incrementing a counter in each region for each place it's used, then discarding it when the counter reaches 0?
Provided that works, how would I manage this pool? How would I determine if a region is no longer needed by any tasks and can be safely discarded?
A simple way of doing this can be to use weak references:
public class RegionStore
{
    // I'm using int as the identifier for a region.
    // Obviously this must be some type that can serve as
    // an ID according to your application's logic.
    private Dictionary<int, WeakReference<Region>> _store = new Dictionary<int, WeakReference<Region>>();

    private const int TrimThreshold = 1000; // Profile to find a good value here.
    private int _addCount = 0;

    public bool TryGetRegion(int id, out Region region)
    {
        WeakReference<Region> wr;
        if (!_store.TryGetValue(id, out wr))
        {
            region = null;
            return false;
        }
        if (wr.TryGetTarget(out region))
            return true;
        // Clean up space in dictionary.
        _store.Remove(id);
        return false;
    }

    public void AddRegion(int id, Region region)
    {
        if (++_addCount >= TrimThreshold)
            Trim();
        _store[id] = new WeakReference<Region>(region);
    }

    public void Remove(int id)
    {
        _store.Remove(id);
    }

    private void Trim()
    {
        // Remove dead keys.
        // Profile to test if this is really necessary.
        // If you were fully implementing this, rather than delegating to Dictionary,
        // you'd likely see if this helped prior to an internal resize.
        _addCount = 0;
        var keys = _store.Keys.ToList();
        Region region;
        foreach (int key in keys)
            if (!_store[key].TryGetTarget(out region))
                _store.Remove(key);
    }
}
Now you've a store of your Region objects, but that store doesn't prevent them being garbage collected if no other references to them exist.
"Certain tasks will be modifying regions. In this case I will likely raise an 'update' flag in the region object, and from there update all other tasks using it."
Do note that this will be a definite potential source of bugs in the application as a whole. Mutability complicates any sort of caching. If you can move to an immutable model, it will likely simplify things, but then the use of outdated objects brings its own complications.
OK, I don't know how you have your app designed, but I suggest you have a look at this.
You can also use statics to share your variable with other tasks, but then you may want to use locking to prevent reads or writes of that variable while other threads are using it (here).
I am writing a web service that allows users to create jobs within the system. Each user has an allowance of the number of jobs they can create. I have a method which checks that the user has some remaining credits which looks like this:
private bool CheckRemainingCreditsForUser(string userId)
{
    lock(lockObj)
    {
        var user = GetUserFromDB(userId);
        if (user.RemaingCredit == 0) return false;
        RemoveOneCreditFromUser(user);
        SaveUserToDB(user);
        return true;
    }
}
The problem I can see with this approach is that if multiple different users make a request at the same time, they will only get processed one at a time, which could cause performance issues for the client. Would it be possible to do something like this?
private bool CheckRemainingCreditsForUser(string userId)
{
    //If there is a current lock on the value of userId then wait
    //If not, get a lock on the value of userId
    var user = GetUserFromDB(userId);
    if (user.RemaingCredit == 0) return false;
    RemoveOneCreditFromUser(user);
    SaveUserToDB(user);
    //Release lock on the value of userId
    return true;
}
This would mean that requests with different userIds could be processed at the same time, but requests with the same userId would have to wait for the previous request to finish
Yes, you could do that with a Dictionary<string, object> that links a lock object to every userId.
The problem would be cleaning up that Dictionary every so often.
But I would first verify that there really is a bottleneck here. Don't fix problems you don't have.
The alternative is to have an (optimistic) concurrency check in your DB and just handle the (rare) conflict cases.
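A sketch of the per-user lock idea, reusing the question's GetUserFromDB, RemoveOneCreditFromUser and SaveUserToDB methods (ConcurrentDictionary.GetOrAdd hands every caller with the same userId the same lock object):

using System.Collections.Concurrent;

// One lock object per userId.
private static readonly ConcurrentDictionary<string, object> userLocks =
    new ConcurrentDictionary<string, object>();

private bool CheckRemainingCreditsForUser(string userId)
{
    // Requests for the same userId serialize here;
    // requests for different userIds proceed in parallel.
    lock (userLocks.GetOrAdd(userId, _ => new object()))
    {
        var user = GetUserFromDB(userId);
        if (user.RemaingCredit == 0) return false;
        RemoveOneCreditFromUser(user);
        SaveUserToDB(user);
        return true;
    }
}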
Instead of locking in every method, why not use a singleton that manages the users' credits?
It would be responsible for giving out the remaining allowances AND managing them at the same time, without losing the thread-safe code.
By the way, a method named CheckRemainingCreditsForUser should not remove allowances, since the name doesn't imply it. You may be the only developer on this project, but it won't hurt to split this into two methods, for reusability and code comprehension.
EDIT: And this object should also hold the Users dictionary.