How can I ensure Orleans grain consistency? - C#

In Erlang you can pass initial state to an actor when spawning it. That way you don't need to deal with init messages, which can reset an actor to its initial state, or with messages that require the init message to have arrived first. In Orleans, given the assumption that grains always exist, you cannot use constructors. Is there any way to pass initial state to grains, thus avoiding any init method that breaks consistency by needing to be called before any other method?
When I say "reset an actor to its initial state", I mean, in the Orleans context, calling the init method of a specific grain activation twice. It is like overwriting state. Maybe you need this kind of message that resets state, but if you don't need it, it is a pitfall, a potential source of bugs.
I'm looking for some type of constructor, something like spawn(module, function, [initial state]) from Erlang. My first attempt was to look for an overload of GetGrain with the following signature: GrainFactory.GetGrain<IGrain>(id, initialState);

As #svick suggests, OnActivateAsync is the best approach for loading an initial state for a grain.
public class ExampleGrain : Orleans.Grain, IExampleGrain
{
    public override Task OnActivateAsync()
    {
        // set initial state for grain
        return base.OnActivateAsync();
    }
    ...
This method will be called every time the grain is initialised (not just the very first time). You could use the persistence infrastructure built into Orleans to record whether the grain has been created previously (perhaps using a boolean property on your state class), i.e.
public class ExampleGrainState : GrainState
{
    public bool Initialised { get; set; }
}

[StorageProvider(ProviderName = "Storage")]
public class QuadKeyGrain : Orleans.Grain<ExampleGrainState>, IExampleGrain
{
    public override async Task OnActivateAsync()
    {
        if (!this.State.Initialised)
        {
            // do initialisation
            this.State.Initialised = true;
            await this.WriteStateAsync();
        }
        await base.OnActivateAsync();
    }
}
See this tutorial for more information on persistence:
http://dotnet.github.io/orleans/Tutorials/Declarative-Persistence.html

Grains in Orleans always exist, so with your approach you are going to [conditionally] re-initialize the grain every time it gets activated. Is this really what you want?
Well, if you really need to initialize a specific grain to a specific state, then you can use its key (a string key, or the string part of a compound key) to pass in some JSON. Just remember that the key has some size limitations.
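As a rough sketch of that idea (a workaround rather than an official Orleans feature; the ExampleState type and grain names are made up here, and the grain is assumed to use a string key, i.e. IGrainWithStringKey):
// Caller side: bake the initial state into the string key (mind the key size limit).
var key = Newtonsoft.Json.JsonConvert.SerializeObject(new ExampleState { Size = 42 });
var grain = grainFactory.GetGrain<IExampleGrain>(key);

// Grain side: recover the state from the key on activation.
public override Task OnActivateAsync()
{
    var initial = Newtonsoft.Json.JsonConvert.DeserializeObject<ExampleState>(this.GetPrimaryKeyString());
    // ...apply 'initial' to the grain's state here...
    return base.OnActivateAsync();
}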

Related

Is using property getters for initialization (to avoid having to call methods in specific order) bad practice?

Suppose I have a class that provides some data to my application. The data initially comes from a database, and I provide it through some methods that handle the whole database thing and present the result as a usable class instead of a raw query result. This class has to do some setup (not complex) to make sure any method called can use the database (e.g. connect to the database, make sure it contains some critical info, etc.). So, were I to put that setup in a method (say, Init(), which would handle checking for the database, connecting to it, and verifying that it contains the info), I would have to make sure this method is called before any other method.
So, I usually find that instead of doing this:
public class DataProvider
{
    private SqlController controller;

    public void Init()
    {
        controller = new SqlController();
        controller.Init();
        controller.ConnectToDataBase();
        CheckForCriticalInfoInDatabase();
    }

    public Data GetData()
    {
        // get data from database (not actually going to use raw queries like that, just an example)
        var queryResult = controller.RunQuery("SELECT something FROM SOME_TABLE");
        // and present it as usable class
        Data usefulData = QueryResultToUsefulData(queryResult);
        return usefulData;
    }
    ...
}
and then always making sure I call Init() before GetData(), I do something like this:
private SqlController _controller;

private SqlController controller
{
    get
    {
        if (_controller == null)
        {
            _controller = new SqlController();
            _controller.Init();
            _controller.ConnectToDataBase();
            CheckForCriticalInfoInDatabase();
        }
        return _controller;
    }
}
So now I can be sure that I won't use an uninitialised SqlController, and I don't have to repeat that null check in every method that uses it. However, I've never noticed getters being used this way in other people's code.
Is there some pitfall I don't see? To me it looks like it's the same as lazy initialization, with the exception being that I use it not because the initialization is heavy or long, but because I don't want to check the order in which I call methods. This question points out that it's not thread-safe (not a concern in my case, plus I imagine it could be made thread-safe with some locks) and that setting the property to null will result in unintuitive behaviour (not a concern, because I don't have a setter at all and the backing field shouldn't be touched either way).
Also, if this kind of code IS bad practice, what is the proper way to ensure that my methods don't rely on the order in which they are called?
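(For reference: .NET's Lazy<T> packages this exact pattern with built-in thread safety. A minimal sketch reusing the names above; CheckForCriticalInfoInDatabase is assumed here to be adapted to take the controller as a parameter, so the factory doesn't re-enter the property:)
public class DataProvider
{
    private readonly Lazy<SqlController> _controller;

    public DataProvider()
    {
        // The default LazyThreadSafetyMode.ExecutionAndPublication means the
        // factory below runs at most once, even under concurrent first access.
        _controller = new Lazy<SqlController>(() =>
        {
            var c = new SqlController();
            c.Init();
            c.ConnectToDataBase();
            CheckForCriticalInfoInDatabase(c); // assumed adapted to take the controller
            return c;
        });
    }

    private SqlController controller => _controller.Value;
}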
As #madreflection said in the OP comments, use a method for anything that is possibly going to be slow. Getters and setters should just be quick ways of getting and setting a value.
Connections to databases can be slow or can fail, so you may have catch blocks set up to try different connection methods, etc.
You could also have the checking occur in the constructor of the object; that way the object cannot be used without the initialisation having run, which saves time tracing where an error is actually occurring.
For example, if you had one function create the object, do a bunch of 'stuff', then try to use the object without running Init(), you would get the error after all of the 'stuff', not where you created the object. This could lead you to think something is wrong in the way you are using the object, not that it was never initialised.
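A minimal sketch of the constructor approach, reusing the question's names:
public class DataProvider
{
    private readonly SqlController controller;

    public DataProvider()
    {
        // Any connection or setup failure surfaces here, at construction,
        // rather than at some later, harder-to-trace call site.
        controller = new SqlController();
        controller.Init();
        controller.ConnectToDataBase();
        CheckForCriticalInfoInDatabase();
    }
}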

State-machine - Stateless vs. traditional if-else code, hard to grasp the benefit

I recently came across some dirty if-else code, so I looked for refactoring options and found the recommendation of a state machine as an elegant replacement for dirty if-else code.
But something is hard for me to grasp: it looks like, as the client, I have the responsibility of moving the machine from one state to the other. Now, if there are two transition options (depending on the result of work done in the current state), do I need to use if-else as well? If so, what is the main benefit of the pattern? From my point of view the machine could make the transition automatically from the starting state.
Before asking I've read the below, and it only strengthens my opinion:
Auto advancing state machine with Stateless
How to encapsulate .NET Stateless state machine
Statemachine that transitions to target state and fires transitions and states between?
In my example, I have a MarketPriceEvent which needs to be stored in Redis. Before being stored it has to pass through a validation path. The validation path states are:
Basic Validation
Comparison
Another comparison
Storing
Error auditing
The problem is that I have many decisions to make. For example: only if BasicValidation passes successfully do I want to move to Comparison. And if Comparison succeeds I want to move to Storing, otherwise to ErrorAuditing.
So, in code:
_machine.Configure(State.Validate)
    .PermitIf(Trigger.Validated, State.Compare1, () => isValid);

_machine.Configure(State.Compare1)
    .OnEntry(CompareWithResource1)
    .PermitIf(Trigger.Compared, State.Store, () => isValid)
    .PermitIf(Trigger.Compared, State.Compare2, () => !isValid);
And in my client/wrapper code I'll write:
// Stay at Validate state
var marketPriceProcessingMachine = new MarketPriceProcessingMachine();
if (marketPriceProcessingMachine.Permitted(Trigger.Validated))
    marketPriceProcessingMachine.Fire(Trigger.Validated);
// else
//     ...
In short, if I need to use if-else anyway, what benefit do I get from the state machine concept? If it's deterministic, why doesn't it move itself to the next state? And if I'm wrong, where am I wrong?
One benefit of using a state machine is that you reduce the number of states an object can be in. I worked with someone who had 22 bool flags in a single class. There was a lot of if !(something && !somethingElse || !userClicked) …
This sort of code is hard to read, hard to debug, hard to unit test, and it's more or less impossible to reason about what the state of the class really is. 22 bool flags means the class can be in 2^22, i.e. over 4 million, states. Try writing unit tests for that...
State machines can reduce the complexity of code, but they will almost always make it somewhat more complex at the beginning of a new project. In the long term, though, I've found that the complexity ends up lower overall. This is because a state machine is easy to extend with more states, since the already defined states can be left alone.
What I've found over the years is that OOP and state machines are often two aspects of the same thing. And I've also found that OOP is hard, and difficult to get 'right'.
I think the state machine should not be visible outside the object, including its triggers. You most likely want a public read-only state property.
I design the classes in such a way that the caller cannot change the state directly or call the Fire method itself. Instead I expose methods that are verbs, actions, like Validate().
Your workflow needs conditionals, but you have some freedom in where to put them. I would suggest separating the business logic from the state machine configuration; I think this makes the state machine easier to read.
How about something like this:
namespace ConsoleApp1
{
    using Stateless;
    using System;

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Press Q to stop validating events");
            ConsoleKeyInfo c;
            do
            {
                var mpe = new MarketPriceEvent();
                mpe.Validate();
                c = Console.ReadKey();
            } while (c.Key != ConsoleKey.Q);
        }
    }

    public class MarketPriceEvent
    {
        public void Validate()
        {
            _machine.Fire(Trigger.Validate);
        }

        public enum State { Validate, Compare2, ErrorAuditing, Compare1, Storing }
        private enum Trigger { Validate, CompareOneOk, CompareTwoOk, Error, }

        private readonly StateMachine<State, Trigger> _machine;

        public MarketPriceEvent()
        {
            _machine = new StateMachine<State, Trigger>(State.Validate);

            _machine.Configure(State.Validate)
                .Permit(Trigger.Validate, State.Compare1);

            _machine.Configure(State.Compare1)
                .OnEntry(DoEventValidation)
                .Permit(Trigger.CompareOneOk, State.Compare2)
                .Permit(Trigger.Error, State.ErrorAuditing);

            _machine.Configure(State.Compare2)
                .OnEntry(DoEventValidationAgainstResource2)
                .Permit(Trigger.CompareTwoOk, State.Storing)
                .Permit(Trigger.Error, State.ErrorAuditing);

            _machine.Configure(State.Storing)
                .OnEntry(HandleStoring);

            _machine.Configure(State.ErrorAuditing)
                .OnEntry(HandleError);
        }

        private void DoEventValidation()
        {
            // Business logic goes here
            if (isValid())
                _machine.Fire(Trigger.CompareOneOk);
            else
                _machine.Fire(Trigger.Error);
        }

        private void DoEventValidationAgainstResource2()
        {
            // Business logic goes here
            if (isValid())
                _machine.Fire(Trigger.CompareTwoOk);
            else
                _machine.Fire(Trigger.Error);
        }

        private bool isValid()
        {
            // Returns false every five seconds...
            return (DateTime.UtcNow.Second % 5) != 0;
        }

        private void HandleStoring()
        {
            Console.WriteLine("Awesome, validation OK!");
        }

        private void HandleError()
        {
            Console.WriteLine("Oh noes, validation failed!");
        }
    }
}

Are private properties of a class called within a Parallel.ForEach body thread-safe?

I am tasked with writing a system to process result files created by a different process (which I have no control over), and I am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling foreach, but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?", or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances, and have removed every static anything except the initial static void Main. It is my current understanding that this will do a lot toward ensuring thread safety.
I have basically the following, edited for brevity:
static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
    public void DoThings()
    {
        // Get some list of things
        List<Thing> things = getThings();

        Parallel.ForEach(things, item =>
        {
            // based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            // ...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
    private Thing _thing;

    public MyActionClass(Thing item)
    {
        _thing = item;
    }

    public string DoOneThing()
    {
        return _thing.GetSubThings().FirstOrDefault();
    }

    public void DoOtherThing()
    {
        _thing.property1 = "Somenewvalue";
    }
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed: if I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But, as has been said, nothing shared can be seen here; that doesn't mean the actual code you use is as good as it seems in this excerpt.
Nor does it guarantee that nothing will be changed by you or a coworker that makes some state both shared and mutable (in Thing, for example), at which point you start getting difficult-to-reproduce crashes at best, or just plain wrong behaviour at worst, which can stay undetected for a long time.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet: it is not always easy to use and implement, and not every task can reasonably be expressed through immutable objects. And even an accidental "make it shared and mutable" change may happen to immutable designs too, though it is much less likely.
It should at least be considered as a possible option/alternative.
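As a sketch of what "going immutable" could look like for the Thing in question (the property name comes from the question's snippet; the rest of the shape is illustrative):
public sealed class Thing
{
    public string property1 { get; }

    public Thing(string value)
    {
        property1 = value;
    }

    // "Mutation" produces a new instance; existing references never change,
    // so no other thread can ever observe a half-written Thing.
    public Thing WithProperty1(string value) => new Thing(value);
}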
About the EDIT
If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on?
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, sort of a public static Int32 ChangesCount that increments with each state change), then you should be safe.
a string value that gets written to a database inside the loop - depending on the data access technology you use and how you use it, you may be in trouble, because most such technologies are not designed for multithreaded use (EF's DbContext, for example). And obviously don't forget that dealing with concurrent access in the database is not always easy either, though that is a bit away from our original theme.
Would it be better to create a new instance of Thing inside the loop to store the item I'm working with - if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not the Parallel.For ones) making changes to the objects being persisted, then you already have bigger problems than Parallel.For.
Objects should always have an observable consistent state (unlike when half the properties are set by one thread and half by another while you try to persist that who-knows-what), and if they are used by many threads then they should already be thread-safe: there should be no way to put them into an inconsistent state.
And if they want to be persisted by external code, such objects should probably provide:
Either a SyncRoot property to synchronize the property-reading code.
Or some current-state snapshot DTO that is created internally by a thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Or something more exotic.
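A rough sketch of the snapshot option (ThingSnapshot and the locking layout are invented for illustration):
public sealed class ThingSnapshot
{
    public string Property1 { get; set; }
}

public class Thing
{
    private readonly object _syncRoot = new object();
    private string _property1;

    public void SetProperty1(string value)
    {
        lock (_syncRoot) { _property1 = value; }
    }

    // Readers get a consistent copy; they can never see a half-updated Thing.
    public ThingSnapshot GetCurrentData()
    {
        lock (_syncRoot)
        {
            return new ThingSnapshot { Property1 = _property1 };
        }
    }
}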

Is it a good practice to perform initialization within a Property?

I have a class PluginProvider that is using a PluginLoader component to load plugins (managed/native) from the file system. Within the PluginProvider class there is currently a property called 'PluginTypes' which calls the 'InitializePlugins' instance method in its getter.
class PluginProvider
{
    private bool isInitialized;
    private IEnumerable<IPluginType> _pluginTypes;

    IEnumerable<IPluginType> PluginTypes
    {
        get
        {
            // isInitialized is set inside the InitializePlugins method
            if (!isInitialized)
            {
                InitializePlugins(); // contains thread-safe code
            }
            // _pluginTypes is set within the InitializePlugins method
            return _pluginTypes;
        }
    }
}
I am looking at refactoring this piece of code. I want to know whether this kind of initialization is fine to do within a property. I know that heavy operations must not be done in a property. But when I checked this link: http://msdn.microsoft.com/en-us/library/vstudio/ms229054.aspx , I found this: "In particular, operations that access the network or the file system (other than once for initialization) should most likely be methods, not properties." Now I am a bit confused. Please help.
If you want to delay the initialization as much as you can and you don't know when your property (or properties) will be called, what you're doing is fine.
If you want to delay and you have control over when your property will be called the first time, then you might want to make your method InitializePlugins() public and call it explicitly before accessing the property. This option also opens up the possibility of initializing asynchronously. For example, you could have an InitializePluginsAsync() that returns a Task.
If delaying the initialization is not a big concern, then just perform the initialization within the constructor.
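A sketch of that asynchronous variant (LoadPluginTypes here is a hypothetical stand-in for the actual loading work):
class PluginProvider
{
    private IEnumerable<IPluginType> _pluginTypes;

    public IEnumerable<IPluginType> PluginTypes => _pluginTypes;

    // Await this once at startup, before PluginTypes is first read.
    public async Task InitializePluginsAsync()
    {
        // Push the file-system scan off the calling thread.
        _pluginTypes = await Task.Run(() => LoadPluginTypes());
    }

    private IEnumerable<IPluginType> LoadPluginTypes()
    {
        // Hypothetical placeholder for the real directory scan.
        return new List<IPluginType>();
    }
}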
This is of course a matter of taste, but what I would do depends on the length of the operation you're trying to perform. If it takes time to load the plugins, I would create a public method that any user would need to call before working with the class. A different approach is to put the initialization inside the constructor, but IMO constructors should return as quickly as possible and should only contain field/property initialization.
class PluginProvider
{
    private bool _isInitialized;

    IEnumerable<IPluginType> PluginTypes { get; set; }

    public void Initialize()
    {
        if (_isInitialized)
        {
            return;
        }
        InitializePlugins();
        _isInitialized = true;
    }
}
Note the downside of this is that you have to make sure the Initialize method is called before consuming any operation.
Another thing that just came to mind backing this approach is exception handling. I'm sure you wouldn't want your constructor to be throwing any kind of IOException in case it couldn't load the types from the file system.
Any initialization type of code should be done in the constructor, that way you know it will be called once and only once.
public class PluginProvider
{
    private IEnumerable<IPluginType> _pluginTypes;

    IEnumerable<IPluginType> PluginTypes
    {
        get
        {
            return _pluginTypes;
        }
    }

    public PluginProvider()
    {
        InitializePlugins();
    }
}
What you are doing there is called lazy initialization. You are postponing a potentially costly operation until the very moment its output is needed.
Now, this is not an absolute rule. If your InitializePlugins method takes a long time to complete and might impact the user experience, then consider moving it into a public method, or even making it asynchronous, and calling it outside of the property: at app startup, or whenever you find a good moment for a long-running operation.
Otherwise, if it's a short-lived one-time thing, it can stay there. As I said, not an absolute rule; these are general guidelines, to be applied as fits the particular case.

How do I clear a System.Runtime.Caching.MemoryCache

I use a System.Runtime.Caching.MemoryCache to hold items which never expire. However, at times I need the ability to clear the entire cache. How do I do that?
I asked a similar question here concerning whether I could enumerate the cache, but that is a bad idea as it needs to be synchronised during enumeration.
I've tried using .Trim(100) but that doesn't work at all.
I've tried getting a list of all the keys via Linq, but then I'm back where I started because evicting items one-by-one can easily lead to race conditions.
I thought to store all the keys, and then issue a .Remove(key) for each one, but there is an implied race condition there too, so I'd need to lock access to the list of keys, and things get messy again.
I then thought that I should be able to call .Dispose() on the entire cache, but I'm not sure if this is the best approach, due to the way it's implemented.
Using ChangeMonitors is not an option for my design, and is unnecessarily complex for such a trivial requirement.
So, how do I completely clear the cache?
I was struggling with this at first. MemoryCache.Default.Trim(100) does not work (as discussed). Trim is a best-effort attempt, so if there are 100 items in the cache and you call Trim(100), it removes the least-recently-used ones.
Trim returns the count of items removed, and most people expect that to remove all items.
This code removes all items from MemoryCache for me in my xUnit tests with MemoryCache.Default. MemoryCache.Default is the default Region.
foreach (var element in MemoryCache.Default)
{
    MemoryCache.Default.Remove(element.Key);
}
You should not call dispose on the Default member of the MemoryCache if you want to be able to use it anymore:
The state of the cache is set to indicate that the cache is disposed. Any attempt to call public caching methods that change the state of the cache, such as methods that add, remove, or retrieve cache entries, might cause unexpected behavior. For example, if you call the Set method after the cache is disposed, a no-op error occurs. If you attempt to retrieve items from the cache, the Get method will always return Nothing.
http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.dispose.aspx
About the Trim, it's supposed to work:
The Trim method first removes entries that have exceeded either an absolute or sliding expiration. Any callbacks that are registered for items that are removed will be passed a removed reason of Expired. If removing expired entries is insufficient to reach the specified trim percentage, additional entries will be removed from the cache based on a least-recently used (LRU) algorithm until the requested trim percentage is reached.
But two other users on the same page reported that it doesn't work, so I guess you are stuck with Remove(): http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.trim.aspx
Update
However, I see no mention of it being a singleton or otherwise unsafe to have multiple instances, so you should be able to overwrite your reference.
But if you need to free the memory from the Default instance, you will have to clear it manually or destroy it permanently via Dispose (rendering it unusable).
Based on your question, you could make your own singleton-imposing class returning a MemoryCache that you may internally dispose at will... such being the nature of a cache :-)
Here's what I had made for something I was working on...
public void Flush()
{
    List<string> cacheKeys = MemoryCache.Default.Select(kvp => kvp.Key).ToList();
    foreach (string cacheKey in cacheKeys)
    {
        MemoryCache.Default.Remove(cacheKey);
    }
}
I know this is an old question, but the best option I've come across is to dispose the existing MemoryCache and create a new MemoryCache object.
https://stackoverflow.com/a/4183319/880642
The answer doesn't really provide the code to do this in a thread-safe way, but it can be achieved using Interlocked.Exchange:
var oldCache = Interlocked.Exchange(ref _existingCache, new MemoryCache("newCacheName"));
oldCache.Dispose();
This will swap the existing cache with a new one and allow you to safely call Dispose on the original cache. This avoids needing to enumerate the items in the cache and race conditions caused by disposing a cache while it is in use.
Edit
Here's how I use it in practice accounting for DI
public class CustomCacheProvider : ICustomCacheProvider
{
    private IMemoryCache _internalCache;
    private readonly ICacheFactory _cacheFactory;

    public CustomCacheProvider(ICacheFactory cacheFactory)
    {
        _cacheFactory = cacheFactory;
        _internalCache = _cacheFactory.CreateInstance();
    }

    public void Set(string key, object item, MemoryCacheEntryOptions policy)
    {
        _internalCache.Set(key, item, policy);
    }

    public object Get(string key)
    {
        return _internalCache.Get(key);
    }

    // other methods omitted for brevity

    public void Dispose()
    {
        _internalCache?.Dispose();
    }

    public void EmptyCache()
    {
        var oldCache = Interlocked.Exchange(ref _internalCache, _cacheFactory.CreateInstance());
        oldCache.Dispose();
    }
}
The key is controlling access to the internal cache using another singleton which has the ability to create new cache instances using a factory (or manually if you prefer).
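For completeness, a minimal sketch of what the ICacheFactory assumed above might look like (using Microsoft.Extensions.Caching.Memory, to match the IMemoryCache in the snippet):
public interface ICacheFactory
{
    IMemoryCache CreateInstance();
}

public class MemoryCacheFactory : ICacheFactory
{
    public IMemoryCache CreateInstance()
    {
        // Each call hands back a fresh, independent cache instance.
        return new MemoryCache(new MemoryCacheOptions());
    }
}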
The details in #stefan's answer cover the principle; here's how I'd do it.
One should synchronise access to the cache whilst recreating it, to avoid the race condition of client code accessing the cache after it is disposed, but before it is recreated.
To avoid this synchronisation, do this in your adapter class (which wraps the MemoryCache):
public void clearCache()
{
    var oldCache = TheCache;
    TheCache = new MemoryCache("NewCacheName", ...);
    oldCache.Dispose();
    GC.Collect();
}
This way, TheCache is always in a non-disposed state, and no synchronisation is needed.
I ran into this problem too. .Dispose() did something quite different than what I expected.
Instead, I added a static field to my controller class. I did not use the default cache, to get around this behavior, but created a private one (if you want to call it that). So my implementation looked a bit like this:
public class MyController : Controller
{
    static MemoryCache s_cache = new MemoryCache("myCache");

    public ActionResult Index()
    {
        if (conditionThatInvalidatesCache)
        {
            s_cache = new MemoryCache("myCache");
        }

        String s = s_cache["key"] as String;
        if (s == null)
        {
            // do work
            // add to s_cache["key"]
        }
        // do whatever next
    }
}
Check out this post, and specifically, the answer that Thomas F. Abraham posted.
It has a solution that enables you to clear the entire cache or a named subset.
The key thing here is:
// Cache objects are obligated to remove entry upon change notification.
base.OnChanged(null);
I've implemented this myself, and everything seems to work just fine.
