Asynchronously updating a graph?

I am playing around with an idea in C#, and would like some advice on the best way to go about asynchronously updating a large number of nodes in a graph. I haven't read anything about how to do things like that; everything I've seen in textbooks and examples uses graphs whose nodes don't really change.
Suppose I have a graph of some large number of nodes (thousands). Each node has some internal state that depends on some public properties of each of its neighbors, as well as potentially some external input.
So schematically a node is simply:
class Node
{
    State internalState;
    public State exposedState;
    Input input;
    List<Node> neighbors;

    void Update()
    {
        while (true)
        {
            DoCalculations(input, internalState, neighbors);
            exposedState = ExposedState(internalState);
        }
    }

    State ExposedState(State state) { ... }
    void DoCalculations(Input input, State state, List<Node> neighbors) { ... }
}
The difficulty is that I would like nodes to be updated as soon as either their input state changes (by subscribing to an event or polling) or a neighbor's state changes. If I try to do this synchronously in the naive way, I have the obvious problem:
Node A updates when input changes
Its neighbor B sees A has changed, updates.
Node A sees its neighbor B has changed, updates
B updates
A updates
....
Stack overflows
If I instead update by enumerating through all nodes and calling their update methods, nodes may be inconsistently updated (e.g.: A's input changes, B updates and doesn't see A's change, A updates and changes exposed state).
I could update by trying to maintain a stack of nodes who want to be updated first, but then their neighbors need to be updated next, and theirs next, etc, which means each update cycle I would need to carefully walk the graph and determine the right update order, which could be very slow...
The naive asynchronous way is to have each node in its own thread (or more simply, an initial asynchronous method call happens to each node's update method, which updates indefinitely in a while(true){...}). The problem with this is that having thousands of threads does not seem like a good idea!
It seems like this should have a simple solution; this isn't too different from cellular automata, but any synchronous solution I come up with either has to update a large number of times compared to the number of nodes to get a message from one end to the other, or has to solve some kind of complicated graph-walking problem with multiple starting points.
The async method seems trivially simple, if only I could have thousands of threads...
So what is the best way to go about doing something like this?

I would think that Rx (The Reactive Extensions) would be a good starting point.
Each piece of state that other nodes might need to depend on is exposed as an IObservable<TState>, and other nodes can then subscribe to those observables:
otherNode.PieceOfState.Subscribe(v => { UpdateMyState(v); });
Rx provides lots of filtering and processing functions for observables: these can be used to filter duplicate events (but you'll need to define "duplicate" of course).
Here's an introductory article: http://weblogs.asp.net/podwysocki/archive/2009/10/14/introducing-the-reactive-framework-part-i.aspx

First you need to make sure your updates converge. This has nothing to do with how you perform them (synchronously, asynchronously, serially or in parallel).
Suppose you have two nodes A and B that are connected. A changes, triggering a recalculation of B. B then changes, triggering a recalculation of A. If the recalculation of A changes A's value, it will trigger a recalculation of B and so on. You need this sequence of triggers to stop at some point: you need your changes to converge. If they don't, no technique you use can fix it.
Once you are sure the calculations converge and you don't get into endless recalculations you should start with the simple single-threaded synchronous calculation and see if it performs well. If it's fast enough, stop there. If not, you can try to parallelize it.
I wouldn't create a thread per calculation, it doesn't scale at all. Instead use a queue of the calculations that need to be performed, and each time you change the value of node A, put all its neighbors in the queue. You can have a few threads processing the queue, making it a much more scalable architecture.
If this still isn't fast enough, you'll need to optimize what you put in the queue and how you handle it. I think it's way too early to worry about that now.
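The queue idea above can be sketched roughly as follows. This is a single-threaded sketch under stated assumptions, not a definitive implementation: QueueNode, Propagator and the "take the maximum" update rule are hypothetical stand-ins chosen only because that rule is easy to see converging. A node is re-queued only when one of its neighbors actually changed, so the loop stops once the values settle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node: its value is recomputed from its input and its neighbors.
class QueueNode
{
    public int Input;
    public int Value;
    public readonly List<QueueNode> Neighbors = new List<QueueNode>();

    // Illustrative rule: take the maximum of the node's own input and its
    // neighbors' values. Starting from zeros with non-negative inputs, values
    // only grow and are bounded by the largest input, so updates converge.
    // Returns true if Value changed.
    public bool Recompute()
    {
        int next = Input;
        foreach (var n in Neighbors)
            next = Math.Max(next, n.Value);
        if (next == Value) return false;
        Value = next;
        return true;
    }
}

static class Propagator
{
    // Processes dirty nodes until no node changes any more.
    // Returns the number of recomputations performed.
    public static int Run(IEnumerable<QueueNode> dirty)
    {
        var queue = new Queue<QueueNode>(dirty);
        var queued = new HashSet<QueueNode>(queue);   // avoids duplicate entries
        int steps = 0;
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            queued.Remove(node);
            steps++;
            if (!node.Recompute()) continue;          // unchanged: nothing to propagate
            foreach (var n in node.Neighbors)
                if (queued.Add(n)) queue.Enqueue(n);
        }
        return steps;
    }
}
```

When an input changes, the caller would set Input on that node and pass it to Propagator.Run. Letting several worker threads drain the queue would additionally need a thread-safe queue and some per-node synchronization, which this sketch leaves out.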

Related

Is there a way to lock a concurrent dictionary from being used

I have this static class
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}
My process
Api starts and initializes an empty dictionary
A background job starts and runs once every day to reload the dictionary from the database
Requests come in to read from the dictionary or update a specific city in the dictionary
My problem
If a request comes in to update the city
I update the database
If the update was successful, update the city object in the dictionary
At the same time, the background job started and queried all cities before I updated the specific city
The request finishes and the dictionary city now has the old values because the background job finished last
My solution I thought about first
Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?
This way when the background job starts, it can lock/reserve the dictionary only for itself and when it's done it will release it for other requests to be used.
Then a request might have been waiting for the dictionary to be released and update it with the latest values.
Any ideas on other possible solutions?
Edit
What is the purpose of the background job?
If I manually update/delete something in the database I want those changes to show up after the background job runs again. This could take a day for the changes to show up and I am okay with that.
What happens when the Api wants to access the cache but it's not loaded?
When the Api starts I block requests to this particular "Location" project until the background job marks IsReady to true. The cache I implemented is thread safe until I add the background job.
How much time does it take to reload the cache?
I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.
Why I chose the answer
I chose Xerillio's answer because it solves the background job problem by keeping track of date times, similar to an "object version" approach. I won't be taking this path, as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me so that I can update the db and cache at the same time. So I might remove the background job after all or just run it once a week. Thank you for all the answers. I am ok with a possible data inconsistency with the way I am updating the objects, because if one route updates 2 specific values and another route updates 2 different specific values, then the possibility of having a problem is very minimal.
Edit 2
Let's imagine I have this cache now and 10,000 active users
static class LocationMemoryCache
{
public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}
Things I took into consideration
An update will only happen to objects that the user owns and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
Most of my cache objects are related only to a specific user so it relates with bullet point 1.
The application owns the data, I don't. So I should never manually update the database unless it's critical.
Memory might be a problem but 1,000,000 normalish objects is somewhere between 80MB - 150MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
Having a lot of objects in memory will put pressure on Garbage Collection and that is not good, but I don't think it's bad at all for me because Garbage Collection only runs when memory gets low and all I have to do is just plan ahead to make sure there is enough memory. Yes, it will run because of day-to-day operations, but it won't be a big impact.
All of these considerations just so that I can have an in memory cache right at my finger tips.

I would suggest adding an UpdatedAt/CreatedAt property to your LocationCityContract or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check if the item you're about to add/update with is newer than the existing object like so:
public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    // In case of system clock synchronization, consider making CreatedAt
    // a long and using Environment.TickCount64. See comment from @Theodor
    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

// Use it like...
static class LocationMemoryCache
{
    public static readonly
        ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}

// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// ...or, from the background job, use this line instead:
// var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);

// Keep whichever of the new and existing items is newer
LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);
When a request wants to update the cache entry they do as above with the timestamp of whenever they finished adding the item to the database (see notes below).
The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from DB as the first thing in the background job, but instead you read them one at a time and update the cache accordingly. In that case updateStart should instead be set right before reading each value (we could call it itemReadStart instead).
Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
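A helper along those lines might look like the sketch below. SetIfNewer and CacheExtensions are hypothetical names, not part of the original answer; the CacheItem<T> wrapper is repeated only so the block compiles on its own. The comparison is the same one used in the AddOrUpdate call above.

```csharp
using System;
using System.Collections.Concurrent;

// Same wrapper as in the answer: an item plus the time it was read or written.
public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

public static class CacheExtensions
{
    // Hypothetical helper: stores candidate only if the key is missing or the
    // candidate is strictly newer than the existing entry.
    // Returns the item that ends up in the cache.
    public static CacheItem<T> SetIfNewer<TKey, T>(
        this ConcurrentDictionary<TKey, CacheItem<T>> cache,
        TKey key,
        CacheItem<T> candidate)
        where TKey : notnull
    {
        return cache.AddOrUpdate(
            key,
            candidate,
            (_, existing) => candidate.CreatedAt > existing.CreatedAt
                ? candidate
                : existing);
    }
}
```

A request would then call something like LocationMemoryCache.LocationCities.SetIfNewer(newLocation.Id, newItem), and the background job would pass an item created with updateStart as its timestamp.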
Note:
Since this approach is not synchronizing (locking) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after updating each, it might not truly reflect which one was updated last. Since you're ok with a 24 hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you as the background job will fix it when run.
As @Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache.LocationCities[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache.LocationCities[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem<LocationCityContract>(newLoc);
LocationMemoryCache.LocationCities
    .AddOrUpdate(...); /* <- like above */
By not locking the whole dictionary you avoid having requests being blocked by each other because they're trying to update the cache at the same time. If the first point is not acceptable you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations so you minimize the risk of requests affecting each other.

No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).
My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally block some threads unnecessarily, but it is also very simple, and its correctness is easy to reason about. Essentially all access to the dictionary and the database will be serialized:
Every time a thread wants to read an object stored in the dictionary, it will first have to take the lock, and keep the lock until it's done reading the object.
Every time a thread wants to update the database and then the corresponding object, it will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
Every time the background job wants to replace the current dictionary with a new dictionary, it will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.
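The three rules above could be sketched like this. LockedCache is a hypothetical class, and the saveToDatabase and loadAll delegates are placeholders for the real database calls. The sketch also assumes cached values are replaced as a whole rather than mutated property by property, so a reader only needs the lock while fetching the reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: one plain Dictionary guarded by one lock object.
public class LockedCache<TKey, TValue> where TKey : notnull
{
    private readonly object _gate = new object();
    private Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();

    // Readers take the lock for the duration of the lookup.
    public bool TryRead(TKey key, out TValue value)
    {
        lock (_gate) { return _items.TryGetValue(key, out value); }
    }

    // saveToDatabase stands in for the real database write. It runs while the
    // lock is held, so a reload cannot interleave with the update.
    public void Update(TKey key, TValue value, Action<TKey, TValue> saveToDatabase)
    {
        lock (_gate)
        {
            saveToDatabase(key, value);
            _items[key] = value;
        }
    }

    // loadAll stands in for the query that reads every row. The lock is taken
    // before querying and released only after the dictionary is swapped.
    public void Reload(Func<Dictionary<TKey, TValue>> loadAll)
    {
        lock (_gate) { _items = loadAll(); }
    }
}
```

Because the database call happens inside the lock, every request waits while the background job reloads; that is the cost of the simplicity described above.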

Pattern for handling execution order of multiple callbacks

I'm developing a role-playing game in C# (Unity) with a Lua scripting front-end for game logic and modding. I have a design question I've been thinking about and can't seem to find the answer. I'm implementing an Effect class, which provides a common interface to define and handle effects that affect champions or creatures, whether due to a spell, an enchanted item, or a condition (paralyzed, afraid...). The objective is to be as flexible as possible and decouple effects code from the actual champion components/classes.
I want the effect to have access to callbacks, so that it can alter what happens to the entity. If the character health changes for example, active effects can kick in and change that change before it's applied. Here are two examples in Lua, the API should be self-explanatory:
Ring of Health Loss Halving:
onHealthAdjustment = function(entity, val)
    if val < 0 then val = math.floor(val / 2); end
    return val;
end
Mana Shield spell:
onHealthAdjustment = function(entity, val)
    if val < 0 then
        -- local keeps the variable from leaking into the global scope
        local championProperties = entity.championProperties;
        if championProperties then
            championProperties:adjustMana(val);
        end
        return 0;
    else
        return val;
    end
end
That's fine, but how to handle execution order of callbacks?
Let's say the champion loses 10 health. If the ring gets processed first, it lowers that to 5, then the spell reduces health loss to 0 and removes 5 mana instead.
If the spell gets processed first, it reduces health loss to 0, removes 10 mana, and then the ring callback gets a 0 and does nothing.
I can add an effect priority variable, but there would always end up some with the same value. Processing in last-applied-first order, or processing only the last-applied effect, leads to stupid exploits, for example picking up and putting back items in the inventory to control the order in which the effects are processed... I don't see a way to call callbacks in parallel instead of sequentially...
I'm either not seeing an obvious way to fix the current pattern, or I need to change to another design pattern. I've read about Strategy and Observer patterns but can't seem to find a clear answer. How are these cases usually handled?
Thanks!

there would always end up some with the same value
So? If you get a collision, fix it. The order in which the effects are applied is not arbitrary; it's part of your design.
Presumably in your code you have a list of event handlers per event type which you iterate through when the event happens. Just make sure that list is in the right order (e.g. by controlling the order they are registered) and you're done.
Side note. In case you didn't know, this:
onHealthAdjustment = function(entity, val) end
Can be written like this:
function onHealthAdjustment(entity, val) end
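On the C# side, the ordered handler list described in this answer could be sketched as below. HealthAdjustmentPipeline and the order values are hypothetical. The sketch assumes each effect type gets a distinct, designer-assigned order number, and it uses SortedList so that a duplicate order throws instead of being silently resolved by registration order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pipeline: every handler receives the pending health change
// and returns the (possibly modified) change.
public class HealthAdjustmentPipeline
{
    private readonly SortedList<int, Func<int, int>> _handlers
        = new SortedList<int, Func<int, int>>();

    // Lower order values run first.
    public void Register(int order, Func<int, int> handler)
    {
        // SortedList.Add throws ArgumentException on a duplicate key,
        // which surfaces an ordering collision instead of hiding it.
        _handlers.Add(order, handler);
    }

    public int Apply(int change)
    {
        // Values enumerates in ascending key order, regardless of when
        // each handler was registered.
        foreach (var handler in _handlers.Values)
            change = handler(change);
        return change;
    }
}
```

With the mana shield registered at order 10 and the ring at order 20, a loss of 10 health is always absorbed by the shield first, no matter which item the player equipped last.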

Correct approach and scenario for TPL-dataflow?

I have a scenario that I've tried to solve with TPL. The result is decent but I'm wondering if my design is flawed and if there is room for any improvement or radical changes.
Scenario:
A user can "subscribe" to X number of items and can set a given interval for updates of that item. The user will then get notifications if the item has changed its data. Since time is a vital factor, I want to show an item as updated straight away instead of waiting for all items to be updated and then notifying the user about all updated items in a batch. Or is this a bad idea?
My approach:
1. A user subscribes to an event called ItemUpdated.
2. A method called Process is called each time the given interval elapses. The method is called in a fire-and-forget way by running it on a BackgroundWorker. The Process method works in the following way:
2.1 Retrieve JSON strings and post them to a BufferBlock which is linked to a TransformBlock.
2.2 The TransformBlock parses each JSON string into a domain object called Item. The TransformBlock is linked to an ActionBlock.
2.3 The ActionBlock invokes the event ItemUpdated for each Item it receives.
My question is basically: Is this an optimal solution or should I re-think my strategy? My biggest concern is that I notify the user about updated items with an event. Should I use an async callback method that will be given a list of all updated items instead, or is there an alternative solution?

C# detect if calls were in the same UI action

I have some nice, working edit-undo functionality in my winforms application. It works using a CommandStack class, which is two Stack<IStateCommand>s (one for undo, one for redo). Each command has an Execute and an Undo method, and the CommandStack object itself has an event that is fired when the stacks are changed.
The CommandStack also works out whether the LogCommand method is being called from its own Undo function, in which case the command is added to the redo stack rather than the undo stack. This is done by simply adding the current ManagedThreadId to a List<int> object, then removing it after the Undo command is completed (as opposed to using the stack trace, which I believe would be much slower and a bit dirty).
There are a lot of different commands within my application, so this formula is sort of set in stone, as it'll take me a few days to redo all those IStateCommand implementations.
The only problem is that, currently, some UI events also trigger other UI events, and both log an IStateCommand to the undo history. Is there any way in C# to detect that the LogCommand function has already been called from the same UI event (Click, DragDrop, SelectedIndexChanged, TextChanged, etc.), so that I can combine the commands into one command (using my CommandList class, which also inherits IStateCommand)?
I've thought of saving the current time when the undo event was called, then if the next command is logged less than x milliseconds later, combining them in the history, but this seems a bit sloppy. I've also considered searching the stack trace, but I don't really know what to look for to find the root UI event, nor do I know how I would tell the difference between one button click and a different click on the same button.
It may also be helpful to know that all of these commands are being called from the UI thread from event handlers (mostly from events from custom user controls). The only part of my application that uses another thread runs after most UI events, after the undo history is logged.
Thanks!
Short Version
The same method is being called twice from the same UI event (eg, MouseUp, DragDrop). The second time this method is called, how do I check that it has already been called once by the same UI event?
Edit: The solution (sort of)
It's a bit of a dirty one as I don't have the time to completely re-write this system. However I've implemented it in such a way that gives the option not to be so dirty in the future.
The solution is based on one of Erno's comments on his answer (so I will mark his answer as accepted), where he suggests adding a parameter. I added another overload to my LogCommand(IStackCommand) method in the CommandStack class, LogCommand(IStackCommand, string). The string is the actionId, which is stored for each command, and if this string is the same as the last, the commands are combined. This gives the option to go through each event and give a unique ID.
However, the dirty part - to get it working before we have to show the client, the actionId defaults to System.Windows.Forms.Cursor.Position.ToString(), ouch!! Since the cursor position is not changed while the UI thread is executing, this combines each command. It actually even combines TextChanged commands (as long as they don't move their mouse!)

It might be an option to add a local stack of called-commands to a command.
When a command executes other commands add the command to the local stack so you can undo the commands on this local stack when the command must be undone or redone.
EDIT
I am not quite sure what you don't understand.
I would simply add a CommandList property to the StateCommand. Every time the StateCommand invokes/triggers another StateCommand, it should add the new StateCommand to the CommandList. So the global CommandList keeps track of the Commands that can be undone from the UI, and each StateCommand keeps track of the StateCommands it invoked (so these are not added to the global undo CommandList).
EDIT 2
If you can't or do not want to change to that setup you would have to pass a parameter to the execution of the commands that links them together.

Did you try to inspect the method stack and analyze it method-by-method?
// Requires: using System.Diagnostics; using System.Reflection;
StackTrace st = new StackTrace();
for (int i = 0; i < st.FrameCount; i++)
{
    StackFrame sf = st.GetFrame(i);
    MethodBase mb = sf.GetMethod();
    // do whatever you want, e.g. inspect mb.Name and mb.DeclaringType
}

I don't know what you need exactly to achieve, but I implemented something similar, maybe you can get some ideas...
In summary, you can store some information in a ThreadStatic variable. Then, any time you want to log a command, inspect the thread static variable to find out the context in which you are logging the command. If it's empty, you are starting a new command logging sequence. If not, you are inside a sequence.
Maybe you can store the entry event (e.g. Click, DragDrop,...), or the command itself... It depends on your needs.
When the initial event callback is completed, clean the static variable to signal that the sequence has been completed.
I successfully implemented a similar strategy to track commands executed upon an object model. I encapsulated the logic within an IDisposable class that also implemented reference counting to handle nested usings. The first using started the sequence; subsequent using statements increased and decreased the reference count to know when the sequence was completed. Disposing the outermost context fired an event containing all the nested commands. In my specific case it has worked perfectly; I don't know if it would fulfill your needs...
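The reference-counted scope described in this answer might look roughly like the sketch below. It is not the answerer's actual code: CommandSequence, Log and Completed are hypothetical names, and commands are plain strings to keep the example short. The state lives in ThreadStatic fields, so each thread tracks its own sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scope: the outermost "using" starts a sequence, nested scopes
// only bump a counter, and disposing the outermost one reports the batch.
public sealed class CommandSequence : IDisposable
{
    [ThreadStatic] private static int _depth;
    [ThreadStatic] private static List<string> _commands;

    // Raised once per sequence with all commands logged inside it.
    public static event Action<IReadOnlyList<string>> Completed;

    public CommandSequence()
    {
        // Depth 0 means no sequence is active on this thread: start one.
        if (_depth++ == 0) _commands = new List<string>();
    }

    // Logging outside any scope opens a one-command sequence of its own;
    // inside a scope the temporary nesting is a no-op.
    public static void Log(string command)
    {
        using (new CommandSequence())
            _commands.Add(command);
    }

    public void Dispose()
    {
        if (--_depth != 0) return;      // still inside an outer scope
        var batch = _commands;
        _commands = null;               // clear the thread-static context
        Completed?.Invoke(batch);
    }
}
```

A root UI event handler would wrap its body in using (new CommandSequence()) { ... }, and a Completed subscriber could combine the reported batch into a single undo entry.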

Can't get my head around implementing an Undo/Redo functionality, should I use a Stack?

I'm getting kind of confused right now, having one of those days I guess.
I need to implement Undo and Redo functionality for a form. For simplicity's sake, let's say that I only save the control which was modified and the value it had when it lost focus.
How do I save this information in a way that lets me go back or forth in the 'timeline'?
I thought about using a Stack, but while I was testing my little demo, I had a mild aneurysm and here I am.
Code needed, not really but would help. I'm more interested in the algorithm I'd need to implement. Any suggestions?

Yes, you would use a stack. There are a couple ways to do it; read these references:
http://en.wikipedia.org/wiki/Command_pattern
http://en.wikipedia.org/wiki/Memento_pattern
Each has its pros/cons.

A stack is perfect if you push a "change" onto it, and when undo pop a "change" from it. You then push that popped change into another stack representing redo. At some point in the future, hopefully on save, you clear both stacks.
It's not actually as simple as that, as you need to record the type of change, understand the old and new values etc. So when you pop from the undo stack, the thing you pop must describe what the prior value was and what control it was set to.
Inverse for the redo stack, it needs to understand what the new value was and where it went. But yes, the idea of two stacks is a good start for a homebrew undo-redo.
A good example of a business object based undo is CSLA.NET, which has UndoableBase:
http://www.lhotka.net/cslanet/
http://www.koders.com/csharp/fidCF6AB2CF035B830FF6E40AA22C8AE7B135BE1FC0.aspx?s=serializationinfo
However, this records a snapshot of an object's state, so it would be more advanced than your form-based concept. CSLA.NET also offers full data binding support, so a data-bound object inheriting from UndoableBase would naturally support undo (not redo) in the UI.

I would use an IUndoableAction interface. The implementations could store whatever data they needed to be done and undone. Then yes, I would use a Stack to hold them.
interface IUndoableAction
{
    void Do();
    void Undo();
}

Stack<IUndoableAction> Actions = new Stack<IUndoableAction>();
Each kind of action would implement the Do and Undo methods.
Then, somewhere there would be these two methods:
void PerformAction(IUndoableAction action)
{
    Actions.Push(action);
    action.Do();
}

void Undo()
{
    var action = Actions.Pop();
    action.Undo();
}
As for what to store in the action classes, some actions could just store the old value. However, once I had an action to swap two rows in a spreadsheet. I didn't store the values of every cell in both rows -- I just stored the row indices so they could be swapped back. It could be easy to fill up tons of memory if you stored all of that state for every action.
Then you want a Redo stack as well, and when you undo an action it is pushed onto the redo stack. The redo stack will need to be cleared when a new action is performed, so things don't get out of order.
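Putting the undo and redo stacks together, a minimal sketch could look like this. UndoRedoHistory and DelegateAction are hypothetical names added for illustration; only IUndoableAction comes from the answer.

```csharp
using System;
using System.Collections.Generic;

public interface IUndoableAction
{
    void Do();
    void Undo();
}

// Hypothetical action built from two delegates, handy for small demos.
public sealed class DelegateAction : IUndoableAction
{
    private readonly Action _do, _undo;
    public DelegateAction(Action doIt, Action undoIt) { _do = doIt; _undo = undoIt; }
    public void Do() => _do();
    public void Undo() => _undo();
}

public class UndoRedoHistory
{
    private readonly Stack<IUndoableAction> _undo = new Stack<IUndoableAction>();
    private readonly Stack<IUndoableAction> _redo = new Stack<IUndoableAction>();

    public void Perform(IUndoableAction action)
    {
        action.Do();
        _undo.Push(action);
        _redo.Clear();          // a new action invalidates the redo history
    }

    // Returns false when there is nothing to undo.
    public bool Undo()
    {
        if (_undo.Count == 0) return false;
        var action = _undo.Pop();
        action.Undo();
        _redo.Push(action);
        return true;
    }

    // Returns false when there is nothing to redo.
    public bool Redo()
    {
        if (_redo.Count == 0) return false;
        var action = _redo.Pop();
        action.Do();
        _undo.Push(action);
        return true;
    }
}
```

Returning false from Undo and Redo on an empty stack avoids the InvalidOperationException that Stack.Pop would otherwise throw.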

Probably the most straightforward is to have the undo/redo stack combination.
An alternative is to have an array or list of actions, and just increment/decrement a pointer to an index in the array. When an action is undone, the index is moved back by one, and when the action is redone, the index is moved forward by one.
The advantage here is that you don't require a pop-and-then-push sequence for each action.
Things to consider:
If you undo several times, and then perform an action, all of the redo actions must be eliminated.
Make sure you check the boundaries and ensure that there is an action available to undo/redo before trying to perform the undo/redo.
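The list-plus-index alternative, including both considerations above, could be sketched as follows. IndexedHistory is a hypothetical name, and actions are represented as a pair of delegates purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical history that keeps every action in one list. _next is the
// index of the next action to redo; everything before it has been applied.
public class IndexedHistory
{
    private readonly List<(Action Do, Action Undo)> _actions
        = new List<(Action Do, Action Undo)>();
    private int _next;

    public bool CanUndo => _next > 0;
    public bool CanRedo => _next < _actions.Count;

    public void Perform(Action doIt, Action undoIt)
    {
        // Drop any undone actions: they can no longer be redone.
        _actions.RemoveRange(_next, _actions.Count - _next);
        _actions.Add((doIt, undoIt));
        doIt();
        _next++;
    }

    public bool Undo()
    {
        if (!CanUndo) return false;     // boundary check
        _actions[--_next].Undo();
        return true;
    }

    public bool Redo()
    {
        if (!CanRedo) return false;     // boundary check
        _actions[_next++].Do();
        return true;
    }
}
```

CanUndo and CanRedo can also be used to enable or disable the corresponding menu items.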
