I have this static class
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process
Api starts and initializes an empty dictionary
A background job starts and runs once every day to reload the dictionary from the database
Requests come in to read from the dictionary or update a specific city in the dictionary
My problem
If a request comes in to update the city
I update the database
If the update was successful, update the city object in the dictionary
At the same time, the background job started and queried all cities before I updated the specific city
The request finishes and the dictionary city now has the old values because the background job finished last
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way when the background job starts, it can lock/reserve the dictionary only for itself and when it's done it will release it for other requests to be used.
Then a request might have been waiting for the dictionary to be released and update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the Api wants to access the cache but its not loaded?
When the Api starts I block requests to this particular "Location" project until the background job marks IsReady to true. The cache I implemented is thread safe until I add the background job.
How much time does it take to reload the cache?
I would say less then 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background job problem by keeping track of date times. Similar to a "object version" approach. I won't be taking this path as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me so that I can update the db and cache at the same time. So I might remove the background job after all or just run it once a week. Thank you for all the answers and I am ok with a possible data inconsistency with the way I am updating the objects because if one route updates 2 specific values and another route updates 2 different specific values then the possibility of having a problem is very minimal
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration
An update will only happen to objects that the user owns and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
Most of my cache objects are related only to a specific user so it relates with bullet point 1.
The application owns the data, I don't. So I should never manually update the database unless it's critical.
Memory might be a problem but 1,000,000 normalish objects is somewhere between 80MB - 150MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
Having a lot of objects in memory will put pressure on Garbage Collection and that is not good but I don't think its bad at all for me because Garbage Collection only runs when memory gets low and all I have to do is just plan ahead to make sure there is enough memory. Yes it will run because of day to day operations but it won't be a big impact.
All of these considerations just so that I can have an in memory cache right at my finger tips.
I would suggest adding a UpdatedAt/CreatedAt property to your LocationCityContract or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check if the item you're about to add/update with is newer than the existing object like so:
public class CacheItem<T>
{
public T Item { get; }
public DateTime CreatedAt { get; }
// In case of system clock synchronization, consider making CreatedAt
// a long and using Environment.TickCount64. See comment from #Theodor
public CacheItem(T item, DateTime? createdAt = null)
{
Item = item;
CreatedAt = createdAt ?? DateTime.UtcNow;
}
}
// Use it like...
static class LocationMemoryCache
{
public static readonly
ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}
// From some request...
var newItem = new CacheItem(newLocation);
// or the background job...
var newItem = new CacheItem(newLocation, updateStart);
LocationMemoryCache.LocationCities
.AddOrUpdate(
newLocation.Id,
newItem,
(_, existingItem) =>
newItem.CreatedAt > existingItem.CreatedAt
? newItem
: existingItem)
);
When a request wants to update the cache entry they do as above with the timestamp of whenever they finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
Note:
Since this approach is not synchronizing (locking) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests wants to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after updating each, it might not truly reflect which one was updated last. Since you're ok with a 24 hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you as the background job will fix it when run.
As #Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem(newLoc);
LocationMemoryCache.LocationCities
.AddOrUpdate(...); /* <- like above */
By not locking the whole dictionary you avoid having requests being blocked by each other because they're trying to update the cache at the same time. If the first point is not acceptable you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations so you minimize the risk of requests affecting each other.
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will result occasionally to some threads unnecessarily blocked, but it is also very simple and easy to reason about its correctness. Essentially all access to the dictionary and the database will be serialized:
Every time a thread wants to read an object stored in the dictionary, will first have to take the lock, and keep the lock until it's done reading the object.
Every time a thread wants to update the database and then the corresponding object, will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
Every time the background job wants to replace the current dictionary with a new dictionary, will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
Related
I have ASP.NET Core application which communicates with web api.
There is an business operation involving a few steps (do sth on one page, go to next and next). This multi-step operation is being done in context of one element.
So let's say you have a list of some business objects and your task is to accept object 3 from this list. Accepting is our multistep operation and if I am currently accepting object 3 no one else should be able to enter accepting operation for object 3. When I finish operation, it should be unlocked.
Hope the problem is understandable.
We don't want very time-consuming solution, the simplest idea was to create a database table which indicates when user starts operation, it saves id of the object and id of the user and automatcally remove itself after for example 5 minutes, if someone else want to access operation we check if it is blocked for this object. But it is kind of hacky and not very clean (what if user will go for a coffee and continue operation after 10 minutes?)
I'm looking for a better way to implement this kind of behaviour and appreciate any ideas
If I were to implement that behavior, I'll also use database, but kinda different way. I'll make a table of objects (object 3 is one of its row), adding a column for UserId, boolean OnProcess (to mark if the object is on process or not) and timestamp for StartProcess.
For a user to be able to enter the operation, run query like:
UPDATE Objects SET UserId = <CurrentUser>, StartProcess = <NOW>, OnProcess = true
OUTPUT Object.Id
WHERE Object.Id == 3 AND
(
OnProcess == false
OR ( OnProcess == true AND UserId == <CurrentUser> )
OR ( StartProcess <is more than 15 minutes ago>)
)
disclaimer: the query above is not an executable query, but it should be clear enough to understand what it does.
With the query above, the Object.Id will be returned when:
the object is not being processed by another user
the object is being processed by the CurrentUser itself, also resetting the StartProcess (some kind of sliding behavior). This way, if CurrentUser AFK for a given time (but not exceeding the threshold time) and comes back, he/she can comfortably continue the operation
the object is not being processed for the last 15 minutes. This is actually the threshold that I mention in previous point. As for how long (15 minutes in my example), it is really up to you.
If the Object.Id is returned for a user, then he/she are able to enter the operation.
You're looking for a semaphore. The lock keyword is the most basic of semaphores, but you can also use Semaphore/SemaphoreSlim, which provide the ability to do things like rate-limiting, whereas lock will literally gate one op at a time. However, your goal is to gate one op at a time, for a particular resource, which makes SemaphoreSlim a better choice, specifically a ConcurrentDictionary<string, SemaphoreSlim>.
You'll need a class with singleton lifetime (one instance for the entire life of the application). There, you'll add an ivar:
private readonly ConcurrentDictionary<string, SemaphoreSlim> _semaphores = new ConcurrentDictionary<string, SemaphoreSlim>();
Then, you'll add the following code around the operation you want to gate:
var semaphore = _semaphores.GetOrAdd("object3", _ => new SemaphoreSlim(1, 1));
await semaphore.WaitAsync();
// do something
semaphore.Release();
The "object3" there is obviously just a placeholder. You'll want to use whatever makes sense (ID, etc.) - something that uniquely identifies the particular resource you're gating. This then will only hold operations for that particular resource if there's an existing operation on that particular resource. A different resource would get its own semaphore and thus its own gate.
I am tasked with writing a system to process result files created by a different process(which I have no control over) and and trying to modify my code to make use of Parallel.Foreach. The code works fine when just calling a foreach but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?" or is this going to cause everything to go sideways on me.
I have tried to make sure all calls are to instances and have removed every static anything except the initial static void Main. It is my current understanding that this will do alot towards assuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
MyProcess process = new MyProcess();
process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
public void DoThings()
{
//Get some list of things
List<Thing> things = getThings();
Parallel.Foreach(things, item => {
//based on some criteria, take actions from MyActionClass
MyActionClass myAct = new MyActionClass(item);
string tempstring = myAct.DoOneThing();
if(somecondition)
{
MyAct.DoOtherThing();
}
...other similar calls to myAct below here
};
}
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
private Thing _thing;
public MyActionClass(Thing item)
{
_thing = item;
}
public string DoOneThing()
{
return _thing.GetSubThings().FirstOrDefault();
}
public void DoOtherThing()
{
_thing.property1 = "Somenewvalue";
}
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed. If I change the value of a property of the item I'm working with while inside the Parallel.Foreach (in this case, a string value that gets written to a database inside the loop), will that have any affect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item i'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But as it has been mentioned there is nothing shared that can be seen. It doesn't mean that in the actual code you use everything is as good as it seems here.
Or that nothing will be changed by you or your coworker that will make some state both shared and mutable (in the Thing, for example), and now you start getting difficult to reproduce crashes at best or just plain wrong behaviour at worst that can be left undetected for a long time.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet, and it is not always easy to use and implement, or that every task can be reasonably expressed through immutable objects. And even that accidental "make shared and mutable" change may happen to it as well, though much less likely.
It should at least be considered as a possible option/alternative.
About the EDIT
If I change the value of a property of the item I'm working with while
inside the Parallel.Foreach (in this case, a string value that gets
written to a database inside the loop), will that have any affect on
the rest of the loop iterations or just the one I'm on?
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, sort of a public static Int32 ChangesCount that increments with each state change), then you should be safe.
a string value that gets written to a database inside the loop - depending on the used data access technology and how you use it, you may be in trouble, because most of them are not designed for multithreaded environment, like EF DbContext, for example. And obviously do not forget that dealing with concurrent access in database is not always easy, though that is a bit away from our original theme.
Would it be better to create a new instance of Thing inside the loop to store the item i'm working with in this case - if there is no risk of external concurrent changes, then it is just an unnecessary work. And if there is a chance of another threads(not Parallel.For) making changes to those objects that are being persisted, then you already have bigger problems than Parallel.For.
Objects should always have observable consistent state (unlike when half of properties set by one thread, and half by another, while you try to persist that who-knows-what), and if they are used by many threads, then they should be already thread-safe - there should be no way to put them into inconsistent state.
And if they want to be persisted by external code, such objects should probably provide:
Either SyncRoot property to synchronize property reading code.
Or some current state snapshot DTO that is created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Or something more exotic.
I am writing a web service that allows users to create jobs within the system. Each user has an allowance of the number of jobs they can create. I have a method which checks that the user has some remaining credits which looks like this:
private bool CheckRemainingCreditsForUser(string userId)
{
lock(lockObj)
{
var user = GetUserFromDB(userId);
if (user.RemaingCredit == 0) return false;
RemoveOneCreditFromUser(user);
SaveUserToDB(user);
}
}
The problem I can see with this approach is that if multiple different users make a request at the same time they will only get processed one at a time which could cause performance issues to the client. Would it be possible to do something like this?
private bool CheckRemainingCreditsForUser(string userId)
{
//If there is a current lock on the value of userId then wait
//If not get a lock on the value of userId
var user = GetUserFromDB(userId);
if (user.RemaingCredit == 0) return false;
RemoveOneCreditFromUser(user);
SaveUserToDB(user);
//Release lock on the value of userId
}
This would mean that requests with different userIds could be processed at the same time, but requests with the same userId would have to wait for the previous request to finish
Yes, you could do that with a Dictionary<string, object>. To link a lockObject to every userId.
The problem would be cleaning up that Dictionary every so often.
But I would verify first that there really is a bottleneck here. Don't fix problems you don't have.
The alternative is to have a (optimistic) concurrency check in your db and just handle the (rare) conflict cases.
Instead of locking in every methods, why aren't you using a Singleton that will manage the User's rights ?
It will be responsible from giving the remaining allowances AND manage them at the same time without loosing the thread-safe code.
By the way, a method named CheckRemainingCreditsForUser should not remove allowances since the name isn't implying it, you may be the only developer on this project but it won't hurt to make 2 methods to manage this for re-useability and code comprehension.
EDIT : And this object should also hold the Users dictionary
We currently have a production application that runs as a windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .net 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system, after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized we are looking for data "as of" that moment. Although I'm relatively certain that this will work exactly like using ADO and doing a loop on the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
//update : here's some sample code in c# - question is the same, will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id==myID select o);
foreach (myobject obj in myobjects)
{
//perform action on obj here
}
It depends on your precise implementation.
Once a query has been executed against the database then the results of the query will not change (assuming you aren't using lazy loading). To ensure this you can dispose of the context after retrieving query results--this effectively "cuts the cord" between the retrieved data and that database.
Lazy loading can result in a mix of "initial" and "new" data; however once the data has been retrieved it will become a fixed snapshot and not susceptible to updates.
You mention this is a long running process; which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all data to be processed (due to memory limitations, or other bottlenecks) then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in results.
I think your best bet is to change the logic of your application such that when the "loop" logic is determining whether it should do another interation or exit you take the opportunity to load the newly added items to the list. see pseudo code below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
var entity = repo.GetNextItem();
}
Let me know if this makes sense.
The easiest way to assure that this happens - if the data itself isn't too big - is to convert the data you retrieve from the database to a List<>, e.g., something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
I'd take a snapshot of ID's to be processed -- quickly and as a transaction -- then work that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track status on each item as it's processed. For a long-running process, this can be very helpful for progress reporting restart / retry capabilities, etc.
I am writing a Console Application in C# in which I want to cache certain items for a predefined time (let's say 1 hour). I want items that have been added into this cache to be automatically removed after they expire. Is there a built-in data structure that I can use? Remember this is a Console App not a web app.
Do you actually need them removed from the cache at that time? Or just that future requests to the cache for that item should return null after a given time?
To do the former, you would need some sort of background thread that was periodically purging the cache. This would only be needed if you were worried about memory consumption or something. If you just want the data to expire, that would be easy to do.
It is trivial to create such a class.
class CachedObject<TValue>
{
DateTime Date{get;set;}
TimeSpan Duration{get;set;}
TValue Cached{get;set;}
}
class Cache : Dictionary<TKey,TValue>
{
public new TValue this(TKey key)
{
get{
if (ContainsKey(key))
{
var val = base.this[key];
//compare dates
//if expired, remove from cache, return null
//else return the cached item.
}
}
set{//create new CachedObject, set date and timespan, set value, add to dictionary}
}
Its already in the BCL. Its just not where you expect to find it: You can use System.Web.Caching from other kinds of applications too, not only in ASP.NET.
This search on google links to several resources about this.
I don't know of any objects in the BCL which do this, but I have written similar things before.
You can do this fairly easily by just including a System.Threading.Timer inside of your caching class (no web/winforms dependencies), and storing an expiration (or last used) time on your objects. Just have the timer check every few minutes, and remove the objects you want to expire.
However, be watchful of events on your objects. I had a system like this, and was not being very careful to unsubscribe from events on my objects in the cache, which was preventing a subtle, but nasty memeory leak over time. This can be very tricky to debug.
Include an ExpirationDate property in the object that you will be caching (probably a wrapper around your real object) and set it to expire in an hour in its constructor. Instead of removing items from the collection, access the collection through a method that filters out the expired items. Or create a custom collection that does this automatically. If you need to actually remove items from the cache, your custom collection could instead purge expired items on every call to one of its members.