I read an article about working with Reliable Collections, and it says that you MUST NOT modify an object once you have given it to a reliable collection: the correct way to update a value in a reliable collection is to get a copy (clone) of the value, change the clone, and then update the cloned value in the RC.
Bad use:
using (ITransaction tx = StateManager.CreateTransaction())
{
    // Use the user’s name to look up their data
    ConditionalValue<User> user =
        await m_dic.TryGetValueAsync(tx, name);

    // The user exists in the dictionary, update one of their properties.
    if (user.HasValue)
    {
        // The line below updates the property’s value in memory only; the
        // new value is NOT serialized, logged, & sent to secondary replicas.
        user.Value.LastLogin = DateTime.UtcNow; // Corruption!
        await tx.CommitAsync();
    }
}
My question is: why can't I modify the object once I have given it to the RC? Why do I have to clone the object before changing something in it? Why can't I do something like the following (update the object in the same transaction)?
using (ITransaction tx = StateManager.CreateTransaction())
{
    // Use the user’s name to look up their data
    ConditionalValue<User> user =
        await m_dic.TryGetValueAsync(tx, name);

    // The user exists in the dictionary, update one of their properties.
    if (user.HasValue)
    {
        // The line below updates the property’s value in memory only; the
        // new value is NOT serialized, logged, & sent to secondary replicas.
        user.Value.LastLogin = DateTime.UtcNow;
        // Update
        await m_dic.SetAsync(tx, name, user.Value);
        await tx.CommitAsync();
    }
}
Thanks!
Reliable Dictionary is a replicated object store. If you update the objects inside Reliable Dictionary without going through Reliable Dictionary (e.g. TryUpdateAsync), then you can corrupt the state.
For example, if you change the object inside Reliable Dictionary using your reference, then the change will not be replicated to the secondary replicas.
This is because Reliable Dictionary does not know that you changed one of the TValues. Hence, the change will be lost if the replica ever fails over.
The above is the simplest example. Modifying objects directly can cause other serious problems, like breaking ACID in multiple ways.
Technically you can do what you want. But don't forget about lock modes and isolation levels.
Here we can read: “Any Repeatable Read operation by default takes Shared locks. However, for any read operation that supports Repeatable Read, the user can ask for an Update lock instead of the Shared lock”.
That means that TryGetValueAsync takes only a Shared lock, and an attempt to update the value later could cause a deadlock.
The next statement is: “An Update lock is an asymmetric lock used to prevent a common form of deadlock that occurs when multiple transactions lock resources for potential updates at a later time.”
So, the correct code would be:
await m_dic.TryGetValueAsync(tx, name, LockMode.Update)
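Putting the two answers together, a minimal sketch of the full read-modify-write pattern might look like this (assuming User exposes a copy constructor and a settable LastLogin property; the clone step is what keeps the stored object untouched):

using (ITransaction tx = StateManager.CreateTransaction())
{
    // Take an Update lock up front to avoid the Shared-to-Exclusive deadlock.
    ConditionalValue<User> user =
        await m_dic.TryGetValueAsync(tx, name, LockMode.Update);

    if (user.HasValue)
    {
        // Never mutate user.Value in place; work on a copy.
        User updatedUser = new User(user.Value) { LastLogin = DateTime.UtcNow };

        // Hand the new object back to the dictionary so the change is
        // serialized, logged, and replicated to the secondary replicas.
        await m_dic.SetAsync(tx, name, updatedUser);
        await tx.CommitAsync();
    }
}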
I have this static class
static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process

1. The API starts and initializes an empty dictionary.
2. A background job starts and runs once every day to reload the dictionary from the database.
3. Requests come in to read from the dictionary or update a specific city in the dictionary.
My problem

1. A request comes in to update a city.
2. I update the database.
3. If the update was successful, I update the city object in the dictionary.
4. At the same time, the background job has started and queried all cities before I updated the specific city.
5. The request finishes, but the city in the dictionary now has the old values, because the background job finished last.
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way, when the background job starts, it can lock/reserve the dictionary for itself alone, and when it's done it will release it for other requests to use.
A request that had been waiting for the dictionary to be released could then update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the API wants to access the cache but it's not loaded?
When the API starts, I block requests to this particular "Location" project until the background job marks IsReady as true. The cache I implemented is thread-safe until I add the background job.
How much time does it take to reload the cache?
I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background-job problem by keeping track of date-times, similar to an "object version" approach. I won't be taking this path, as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me, so that I can update the DB and the cache at the same time. So I might remove the background job after all, or just run it once a week. Thank you for all the answers. I am OK with the possible data inconsistency in the way I am updating the objects, because if one route updates 2 specific values and another route updates 2 different specific values, the possibility of a problem is very minimal.
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration

- An update will only happen to objects that the user owns, and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
- Most of my cache objects are related only to a specific user, so they tie in with the first point.
- The application owns the data, I don't. So I should never manually update the database unless it's critical.
- Memory might be a problem, but 1,000,000 normal-ish objects take somewhere between 80MB and 150MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
- Having a lot of objects in memory will put pressure on garbage collection, and that is not good, but I don't think it's bad at all for me, because garbage collection only runs when memory gets low, and all I have to do is plan ahead to make sure there is enough memory. Yes, it will run because of day-to-day operations, but it won't have a big impact.

All of these considerations just so that I can have an in-memory cache right at my fingertips.
I would suggest adding an UpdatedAt/CreatedAt property to your LocationCityContract, or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check whether the item you're about to add/update with is newer than the existing object, like so:
public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    // In case of system clock synchronization, consider making CreatedAt
    // a long and using Environment.TickCount64. See comment from #Theodor
    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}
// Use it like...
static class LocationMemoryCache
{
    public static readonly
        ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}

// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// or the background job...
var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);

LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
When a request wants to update the cache entry they do as above with the timestamp of whenever they finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
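Such a helper might look like this (UpdateIfNewer is an illustrative name, not from the original post):

public static void UpdateIfNewer(int id, CacheItem<LocationCityContract> newItem)
{
    // Keep whichever item carries the newer timestamp.
    LocationMemoryCache.LocationCities.AddOrUpdate(
        id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
}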
Note:
Since this approach does not synchronize (lock) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after each update, it might not truly reflect which one was updated last. Since you're OK with a 24-hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you, as the background job will fix it when it runs.
As #Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use a C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means: don't use LocationMemoryCache.LocationCities[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object,
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache.LocationCities[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem<LocationCityContract>(newLoc);
LocationMemoryCache.LocationCities
    .AddOrUpdate(...); /* <- like above */
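If LocationCityContract were a C# 9 record instead, the clone step could use a with-expression (a sketch, assuming CityName is a property of the record):

var newLoc = LocationMemoryCache.LocationCities[locationId].Item
    with { CityName = updatedName };
var newItem = new CacheItem<LocationCityContract>(newLoc);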
By not locking the whole dictionary, you avoid requests being blocked by each other when they try to update the cache at the same time. If the first point is not acceptable, you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically; see the sketch below. This avoids blocking requests that are trying to update other locations, so you minimize the risk of requests affecting each other.
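A sketch of that per-location locking (illustrative names; SaveToDatabase is a placeholder for the real DB write, and UpdateIfNewer is the hypothetical helper sketched earlier):

private static readonly ConcurrentDictionary<int, object> _locationLocks = new();

public static void UpdateLocation(LocationCityContract newLoc)
{
    // One lock object per location ID, so unrelated locations don't block each other.
    lock (_locationLocks.GetOrAdd(newLoc.Id, _ => new object()))
    {
        SaveToDatabase(newLoc); // placeholder for the real DB write
        UpdateIfNewer(newLoc.Id, new CacheItem<LocationCityContract>(newLoc));
    }
}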
No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally result in some threads being blocked unnecessarily, but it is also very simple and easy to reason about. Essentially, all access to the dictionary and the database is serialized (a sketch follows the list):

- Every time a thread wants to read an object stored in the dictionary, it must first take the lock, and keep the lock until it's done reading the object.
- Every time a thread wants to update the database and then the corresponding object, it must first take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
- Every time the background job wants to replace the current dictionary with a new dictionary, it must first take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
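A minimal sketch of that locked cache, with SaveCityToDatabase and LoadAllCitiesFromDatabase as placeholders for the real data access (all names besides LocationCityContract are illustrative):

static class LockedLocationCache
{
    private static readonly object _lock = new();
    private static Dictionary<int, LocationCityContract> _cities = new();

    public static LocationCityContract GetCity(int id)
    {
        lock (_lock)
        {
            return _cities.TryGetValue(id, out var city) ? city : null;
        }
    }

    public static void UpdateCity(LocationCityContract city)
    {
        lock (_lock)
        {
            SaveCityToDatabase(city); // placeholder for the real DB update
            _cities[city.Id] = city;
        }
    }

    // Called by the background job once a day.
    public static void ReloadAll()
    {
        lock (_lock)
        {
            // Swap in a freshly loaded dictionary while holding the lock.
            _cities = LoadAllCitiesFromDatabase(); // placeholder for the real DB query
        }
    }
}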
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
I have an ASP.NET Core application which communicates with a web API.
There is a business operation involving a few steps (do something on one page, go to the next and the next). This multi-step operation is done in the context of one element.
So let's say you have a list of business objects and your task is to accept object 3 from this list. Accepting is our multi-step operation, and while I am accepting object 3, no one else should be able to enter the accepting operation for object 3. When I finish the operation, it should be unlocked.
Hope the problem is understandable.
We don't want a very time-consuming solution. The simplest idea was to create a database table that records when a user starts the operation: it saves the id of the object and the id of the user, and the entry automatically removes itself after, for example, 5 minutes; if someone else wants to access the operation, we check whether it is blocked for this object. But that is kind of hacky and not very clean (what if the user goes for a coffee and continues the operation after 10 minutes?).
I'm looking for a better way to implement this kind of behaviour and would appreciate any ideas.
If I were to implement that behavior, I'd also use the database, but in a slightly different way. I'd make a table of objects (object 3 is one of its rows), adding a column for UserId, a boolean OnProcess (to mark whether the object is being processed), and a timestamp StartProcess.
For a user to be able to enter the operation, run a query like:
UPDATE Objects
SET UserId = @CurrentUser, StartProcess = GETUTCDATE(), OnProcess = 1
OUTPUT inserted.Id
WHERE Id = 3
  AND (
       OnProcess = 0
    OR (OnProcess = 1 AND UserId = @CurrentUser)
    OR (StartProcess < DATEADD(MINUTE, -15, GETUTCDATE()))
  )

disclaimer: the exact syntax (shown here in T-SQL style) may need adjusting for your database, but it should be clear enough to understand what it does.
With the query above, the Object.Id will be returned when:

- the object is not being processed by another user;
- the object is being processed by the CurrentUser themselves; this also resets StartProcess (a kind of sliding behavior), so if the CurrentUser goes AFK for a while (but not exceeding the threshold) and comes back, they can comfortably continue the operation;
- the object has not been processed within the last 15 minutes. This is the threshold I mentioned in the previous point; how long it should be (15 minutes in my example) is really up to you.
If the Object.Id is returned for a user, then they are able to enter the operation.
You're looking for a semaphore. The lock keyword is the most basic of semaphores, but you can also use Semaphore/SemaphoreSlim, which provide the ability to do things like rate-limiting, whereas lock will literally gate one op at a time. However, your goal is to gate one op at a time, for a particular resource, which makes SemaphoreSlim a better choice, specifically a ConcurrentDictionary<string, SemaphoreSlim>.
You'll need a class with singleton lifetime (one instance for the entire life of the application). There, you'll add an ivar:
private readonly ConcurrentDictionary<string, SemaphoreSlim> _semaphores = new ConcurrentDictionary<string, SemaphoreSlim>();
Then, you'll add the following code around the operation you want to gate:
var semaphore = _semaphores.GetOrAdd("object3", _ => new SemaphoreSlim(1, 1));
await semaphore.WaitAsync();
try { /* do something */ }
finally { semaphore.Release(); } // always release, even if the operation throws
The "object3" there is obviously just a placeholder. You'll want to use whatever makes sense (ID, etc.) - something that uniquely identifies the particular resource you're gating. This then will only hold operations for that particular resource if there's an existing operation on that particular resource. A different resource would get its own semaphore and thus its own gate.
I have been using parameter passing with the page Navigate method in a UWP app without a problem (with object serialization and deserialization). As my objects (passed as parameters) grew in size, I started hitting a problem when the app is suspended and the SuspensionManager attempts to serialize the navigation parameter and save it to local storage: I get an exception indicating a size limit of 8K, I think (and I assume I have no control over this size).
So I am considering passing the parameter via a memory cache rather than a navigation parameter (say, save my complex data object to a dictionary in memory with nameof(PageNavigatedToType) as the key, and retrieve the cached data on NavigatedTo in the destination page). My concern is a possible memory usage increase; I'm not sure, for instance, whether setting a particular dictionary value to null (when it is no longer needed) makes much of a difference to global memory (I mean app scope).
Any thoughts and suggestions appreciated.
You can't use LocalSettings to store a value larger than 8K, and each composite setting can't be larger than 64K bytes in size.
ApplicationData.LocalSettings | localSettings property
If you need too much data to be restored when SuspensionManager handles the app restoration, a better idea may be to save just a value as a key and then restore the full object from another place or procedure, as you have suggested:
“I am considering passing the parameter via memory cache rather than navigation parameter (say, save my complex data object to a dictionary in memory with nameof(PageNavigatedToType) as key and retrieve the cached data on NavigatedTo in the destination page.”
Your concern:
“My concern is a possible memory usage increase and not sure for instance if setting a particular dictionary value to null (when no more needed) makes that much of a difference to global memory (I mean app scope).”
This is a legitimate concern, but you can keep your objects' scope short; once the GC detects that memory is needed, it will free objects according to their scope in your code.
Freeing a dictionary value (setting it to null) can be enough for the object to be recycled, if and when no other references to it exist.
A short object scope can be achieved in many ways, because it is a concept.
Declaring local method variables instead of class variables/properties is one of them.
Another is using IDisposable objects: the using keyword ensures the correct use of IDisposable objects and allows them to be garbage collected:
using (Font font1 = new Font("Arial", 10.0f))
{
    byte charset = font1.GdiCharSet;
}
C# also allows you to control a variable's scope within a method through braces. This is a syntactic helper to control the scope in which your variable can be used, and because of that you can opt to free those resources just after the closing brace. You can use this when your variable isn't an IDisposable:
public void MyMethod()
{
    ...
    ...
    {
        var o = new MyObject();
        var otherReferenceToSameObject = o;
        var s = "my string";
        ...
        ...
        ...
        ...
        otherReferenceToSameObject = o = null;
        s = null;
    }
    ...
    ...
}
Anyway, remember that the GC is governed by its own algorithms and we don't have much control over it, but keeping some control over our resources can help the GC do a better job.
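As a concrete illustration of the dictionary idea combined with the scope advice above, here is a minimal sketch (NavigationCache is a hypothetical name): removing the entry after reading it drops the cache's reference, so the GC can reclaim the object once the page is done with it.

public static class NavigationCache
{
    private static readonly Dictionary<string, object> _items =
        new Dictionary<string, object>();

    // Stash the object right before calling Frame.Navigate(...).
    public static void Put(string key, object value) => _items[key] = value;

    // Retrieve it in OnNavigatedTo, removing the entry so the cache
    // no longer keeps the object alive.
    public static object Take(string key)
    {
        if (_items.TryGetValue(key, out var value))
        {
            _items.Remove(key);
            return value;
        }
        return null;
    }
}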
I have some code which processes a number of response objects from my database in parallel (using AsParallel()). Each response has many components. The responses may share the same components. I do some modifications to the component data and save it to the db, so I need to prevent multiple threads working on the same component object at the same time.
I use locks to achieve this. I have a ConcurrentDictionary<int, object> to hold all the necessary lock objects. Like this:
private static ConcurrentDictionary<int, object> compLocks = new ConcurrentDictionary<int, object>();
var compIds = db.components.Select(c => c.component_id).ToList();
foreach (var compId in compIds)
{
    compLocks[compId] = new object();
}
Then later on I do this:
responses.AsParallel().ForAll(r =>
{
    ... do some time consuming stuff with web services ...

    // this is a *just in case* addition,
    // in case a new component was added to
    // the db since the dictionary was constructed
    // NOTE: it did not have any effect, and I'm no longer
    // using it as #Henk pointed out it is not thread-safe.
    //if (compLocks[c.component_id] == null)
    //{
    //    compLocks[c.component_id] = new object();
    //}

    componentList.AsParallel().ForAll(c =>
    {
        lock (compLocks[c.component_id])
        {
            ... do some processing, save the db records ...
        }
    });
});
This seems to run perfectly fine, but towards the end of program execution (it runs for several hours, as there is a lot of data) I get the following exception:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Concurrent.ConcurrentDictionary`2.get_Item(TKey key)
I am sure that the ConcurrentDictionary is being populated with every possible component ID.
I have 3 questions:
How is this exception even possible, and how do I fix it?
Do I need a ConcurrentDictionary for this?
Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Post-Answer Edit
To make it clear what the cause of all this was: .AsParallel() doesn't enumerate the collection of responses. It's lazily evaluated, meaning new responses (and therefore new components) can be added to the collection at run-time (from other processes). Enforcing a snapshot with .ToList() before the .AsParallel() fixed the problem.
The reason my code for adding component IDs to compLocks at run-time didn't remedy this problem is that it is not thread-safe.
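In code, the fix amounts to something like this (a sketch of the snapshot step described above):

// Materialize a snapshot first; AsParallel() then enumerates a fixed list,
// so responses added by other processes during the run are not picked up.
var responseSnapshot = responses.ToList();
responseSnapshot.AsParallel().ForAll(r =>
{
    // ... same body as before ...
});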
1) How is this exception even possible?
Apparently it is, but not from the posted code alone. It would happen if data is added to the db (would it be an option to capture responses with a ToList() beforehand?)
2) Do I need a ConcurrentDictionary for this?
Not with a fixed list, but when the solution involves add-when-missing then yes, you need a Concurrent collection.
3) Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Not totally sure. The locking looks OK, but you will still process duplicates multiple times, just not at the same time.
Reaction to the edit:
if (compLocks[c.component_id] == null)
{
    compLocks[c.component_id] = new object();
}
this is not thread-safe. It is now possible that multiple lock objects are created for 1 component_id value. You need to use one of the GetOrAdd() methods.
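For example (a sketch using the dictionary from the question):

// GetOrAdd guarantees a single lock object per component id, even when
// several threads race to create it.
object compLock = compLocks.GetOrAdd(c.component_id, _ => new object());
lock (compLock)
{
    // ... do some processing, save the db records ...
}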
But I would not expect this to give the exception you're getting, so it's probably not the direct problem.
I would start by replacing:
lock (compLocks[c.component_id])
{
    ...
}
by:
object compLock;
if (!compLocks.TryGetValue(c.component_id, out compLock)) Debug.Assert(false);
lock (compLock)
{
    ...
}
Then set it running and go and get a coffee. When the assert fails you'll be able to debug and get a better idea of what's happening.
As for your questions:
1. How is this exception even possible?
Without seeing the rest of your code, impossible to say.
2. Do I need a ConcurrentDictionary for this?
If you initialize the dictionary once from a single thread, then subsequently only ever read from the dictionary, then it doesn't need to be a ConcurrentDictionary.
3. Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Again, difficult to say without seeing more of your code, but I don't see anything obviously wrong with the small sample you've posted. Threading is hard, though, and it's quite possible there are race conditions elsewhere in your code.
I have the following code:
var sequence = from row in CicApplication.DistributorBackpressure44Cache
               where row.Coater == this.Coater && row.IsDistributorInUse
               select new GenericValue
               {
                   ReadTime = row.CoaterTime.Value,
                   Value = row.BackpressureLeft
               };
this.EvaluateBackpressure(sequence, "BackpressureLeftTarget");
And DistributorBackpressure44Cache is defined as follows:
internal static List<DistributorBackpressure44> DistributorBackpressure44Cache
{
    get
    {
        return _distributorBackpressure44;
    }
}
This is part of a heavily threaded application where DistributorBackpressure44Cache could be being refreshed in one thread and queried, as shown above, in another thread. The variable 'sequence' above is an IEnumerable, which is passed to the method shown and then potentially passed on to other methods before actually being executed. My concern is this: what will happen with the above query if DistributorBackpressure44Cache is being refreshed (cleared and repopulated) when the query is actually executed?
It wouldn't do any good to put a lock around this code because this query actually gets executed at some point later (unless I were to convert it to a list immediately).
If your design can tolerate it, you could ensure snapshot-level isolation with this code and avoid locking altogether. However, you would need to do the following:

- Make DistributorBackpressure44Cache return a ReadOnlyCollection<T> instead; this way it is explicit that you shouldn't mutate this data.
- Ensure that any mutations to _distributorBackpressure44 occur on a copy and result in an atomic assignment back to _distributorBackpressure44 when complete:
var cache = _distributorBackpressure44.ToList();
this.RefreshCache(cache); // this assumes you *need* to know
                          // about the structure of the old list;
                          // a design where this is not required
                          // is preferred
_distributorBackpressure44 = cache; // some readers will have "old"
                                    // views of the cache, but all readers
                                    // from some time T (where T < Twrite)
                                    // will use the same "snapshot"
You can convert it to a list immediately (which might be best),
or
you can put a lock in the getter for DistributorBackpressure44Cache that synchronizes with the cache-refresh lock. You might want to include both a locked and an unlocked accessor: use the unlocked accessor when the result is going to be used immediately, and the locked one when it is going to be used in a deferred-execution situation.
Note that even that won't work if the cache refresh mutates the list _distributorBackpressure44 in place; it only works if the refresh replaces the referenced list.
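A sketch of that locked/unlocked accessor pair, assuming a _cacheLock object that the refresh code also takes (the names here are illustrative):

// Unlocked: for callers that enumerate the result immediately.
internal static List<DistributorBackpressure44> DistributorBackpressure44Cache
{
    get { return _distributorBackpressure44; }
}

// Locked: snapshots the list for deferred-execution callers, so a later
// refresh (which swaps in a new list) can't be observed mid-query.
internal static List<DistributorBackpressure44> DistributorBackpressure44CacheSafe
{
    get
    {
        lock (_cacheLock)
        {
            return _distributorBackpressure44.ToList();
        }
    }
}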
Without knowing more about your architecture options, you could do something like this.
IEnumerable<GenericValue> sequence;
lock (CicApplication.DistributorBackpressure44Cache)
{
    // Materialize the query inside the lock; otherwise deferred execution
    // would run it after the lock has been released.
    sequence = (from row in CicApplication.DistributorBackpressure44Cache
                where row.Coater == this.Coater && row.IsDistributorInUse
                select new GenericValue
                {
                    ReadTime = row.CoaterTime.Value,
                    Value = row.BackpressureLeft
                }).ToList();
}
this.EvaluateBackpressure(sequence, "BackpressureLeftTarget");
Then, in the code where you do the clear/update, you would have something like this:

lock (CicApplication.DistributorBackpressure44Cache)
{
    var objCache = CicApplication.DistributorBackpressure44Cache;
    objCache.Clear();
    // code to add back items here
    // [...]
}
It would be cleaner to have a central class (Singleton pattern, maybe?) that controls everything surrounding the cache, but I don't know how feasible that is (i.e. putting the query code into another class and passing the parameters in). In lieu of something fancier, the above solution should work, as long as you consistently remember to lock() each and every time you read or write this object.