I have a class PluginProvider that is using a PluginLoader component to load plugins (managed/native) from the file system. Within the PluginProvider class, there is currently defined a property called 'PluginTypes' which calls the 'InitializePlugins' instance method on get().
class PluginProvider
{
IEnumerable<IPluginType> PluginTypes
{
get
{
//isInitialized is set inside InitializePlugins method
if(!isInitialized)
{
InitializePlugins(); //contains thread safe code
}
//_pluginTypes is set within InitializePlugins method
return _pluginTypes;
}
}
}
I am looking at refactoring this piece of code and want to know whether this kind of initialization is acceptable inside a property. I know that heavy operations should not be done in a property getter. But when I checked this link, http://msdn.microsoft.com/en-us/library/vstudio/ms229054.aspx, I found this: "In particular, operations that access the network or the file system (other than once for initialization) should most likely be methods, not properties." Now I am a bit confused. Please help.
If you want to delay the initialization as much as you can and you don't know when your property (or properties) will be called, what you're doing is fine.
If you want to delay and you have control over when your property will be called the first time, then you might want to make your method InitializePlugins() public and call it explicitly before accessing the property. This option also opens up the possibility of initializing asynchronously. For example, you could have an InitializePluginsAsync() that returns a Task.
If delaying the initialization is not a big concern, then just perform the initialization within the constructor.
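The asynchronous option mentioned above could look roughly like the sketch below. It is only an illustration: the IPluginType members, the PluginType class and LoadPluginsAsync are hypothetical stand-ins, and the real file-system scan is replaced by a placeholder list.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the plugin type interface from the question.
public interface IPluginType { string Name { get; } }

public sealed class PluginType : IPluginType
{
    public PluginType(string name) { Name = name; }
    public string Name { get; }
}

public class PluginProvider
{
    private readonly object _gate = new object();
    private Task<IReadOnlyList<IPluginType>> _initTask;

    // Starts loading on the first call; every later call gets the same task,
    // so the load runs at most once even with concurrent callers.
    public Task<IReadOnlyList<IPluginType>> InitializePluginsAsync()
    {
        lock (_gate)
        {
            if (_initTask == null)
            {
                _initTask = LoadPluginsAsync();
            }
            return _initTask;
        }
    }

    // Placeholder (assumption) for the real scan of the plugin directory.
    private static Task<IReadOnlyList<IPluginType>> LoadPluginsAsync()
    {
        return Task.Run<IReadOnlyList<IPluginType>>(
            () => new List<IPluginType> { new PluginType("sample") });
    }
}
```

A caller would await InitializePluginsAsync() at a convenient moment (application startup, for instance) and use the returned list, instead of triggering the load from a property getter.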
This is of course a matter of taste, but what I would do depends on the length of the operation you're trying to perform. If it takes time to load the plugins, I would create a public method which any user would need to call before working with the class. A different approach would be to call the method inside the constructor, but IMO constructors should return as quickly as possible and should contain only field / property initialization.
class PluginProvider
{
private bool _isInitialized;
IEnumerable<IPluginType> PluginTypes { get; set;}
public void Initialize()
{
if (_isInitialized)
{
return;
}
InitializePlugins();
_isInitialized = true;
}
}
Note the downside of this: you have to make sure the Initialize method has been called before consuming any operation.
Another thing that just came to mind in favor of this approach is exception handling. I'm sure you wouldn't want your constructor to throw an IOException if it couldn't load the types from the file system.
Any initialization type of code should be done in the constructor, that way you know it will be called once and only once.
public class PluginProvider
{
IEnumerable<IPluginType> PluginTypes
{
get
{
return _pluginTypes;
}
}
public PluginProvider()
{
InitializePlugins();
}
}
What you are doing there is called lazy initialization. You are postponing doing a potentially costly operation until the very moment its output is needed.
Now, this is not an absolute rule. If your InitializePlugins method takes a long time to complete and it might impact user experience, then you can consider moving it into a public method or even making it asynchronous and call it outside of the property: at app startup or whenever you find a good moment to call a long-lasting operation.
Otherwise, if it's a short lived one-time thing it can stay there. As I said, not an absolute rule. Generally these are some guidelines for whatever applies to a particular case.
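If the lazy getter stays, one common way to express it in .NET is Lazy<T>, which takes care of the "run once, thread-safely" part. The sketch below is an assumption-laden illustration, not the asker's code: LazyPluginProvider, LoadPlugins and the string plugin names are made up, and LoadCount exists only to make the single execution observable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class LazyPluginProvider
{
    // Demonstration only: counts how many times the load actually ran.
    public int LoadCount;

    private readonly Lazy<IReadOnlyList<string>> _pluginTypes;

    public LazyPluginProvider()
    {
        // ExecutionAndPublication: only one thread runs LoadPlugins,
        // and all threads observe the same result.
        _pluginTypes = new Lazy<IReadOnlyList<string>>(
            LoadPlugins, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // The getter stays cheap after the first access.
    public IReadOnlyList<string> PluginTypes => _pluginTypes.Value;

    // Placeholder (assumption) for the real file-system scan.
    private IReadOnlyList<string> LoadPlugins()
    {
        Interlocked.Increment(ref LoadCount);
        return new List<string> { "ManagedPlugin", "NativePlugin" };
    }
}
```

Nothing is loaded when the provider is constructed; the first read of PluginTypes triggers the load and later reads reuse the cached list.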
Related
Suppose I have a class that provides some data to my application. Data initially comes from database, and I provide it through some methods that handle the whole database thing and present the result as a usable class instead of raw query result. This class has to do some setup (not complex) to make sure any method called can use the database (e.g. connect to database, make sure it contains some critical info, etc). So, were I to put it in a method (say, method Init(), that would handle checking for database, connecting to it, verifying that it does contain the info), I would have to make sure that this method is called before any other method.
So, I usually find that instead of doing this:
public class DataProvider
{
private SqlController controller;
public void Init()
{
controller = new SqlController();
controller.Init();
controller.ConnectToDataBase();
CheckForCriticalInfoInDatabase();
}
public Data GetData()
{
// get data from database (not actually going to use raw queries like that, just an example)
var queryResult = controller.RunQuery("SELECT something FROM SOME_TABLE");
// and present it as usable class
Data usefulData = QueryResultToUsefulData(queryResult);
return usefulData;
}
...
}
and then always making sure I call Init() before GetData(), I do something like
private SqlController _controller;
private SqlController controller
{
get
{
if (_controller == null)
{
_controller = new SqlController();
_controller.Init();
_controller.ConnectToDataBase();
CheckForCriticalInfoInDatabase();
}
return _controller; // return the backing field; returning the property itself would recurse forever
}
}
So now I can be sure that I won't use an uninitialised SqlController, and I don't have to do that same null check in every method that uses it. However, I have never noticed getters being used this way in other people's code.
Is there some pitfall I don't see? To me it looks like it's the same as lazy initialization, with the exception being that I use it not because the initialization is heavy or long, but because I don't want to check the order in which I call methods. This question points out that it's not thread-safe (not a concern in my case, plus I imagine it could be made thread-safe with some locks) and that setting the property to null will result in unintuitive behaviour (not a concern, because I don't have a setter at all and the backing field shouldn't be touched either way).
Also, if this kind of code IS bad practice, what is the proper way to ensure that my methods don't rely on the order in which they are called?
As @madreflection said in the OP comments, use a method for anything that is possibly going to be slow. Getters and setters should just be quick ways of getting and setting a value.
Connections to databases can be slow or can fail, so you may need catch blocks that fall back to different connection methods, etc.
You could also have the checking occur in the constructor of the object, that way the object cannot be used without init() being run in a different function, saving on time tracing where an error is actually occurring.
For example if you had one function create the object, do a bunch of 'stuff' then try to use the object without running init(), then you get the error after all of the 'stuff' not where you created the object. This could lead you to think there is something wrong in whatever way you are using the object, not that it has not been initialised.
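One way to make the "initialise at creation" idea concrete is a private constructor plus a static factory method, a small variation on doing the work directly in the constructor. In this sketch FakeSqlController is a hypothetical stand-in for the SqlController from the question, and Create() is a made-up name.

```csharp
using System;

// Hypothetical stand-in for the SqlController in the question.
public class FakeSqlController
{
    public bool Connected { get; private set; }

    public void ConnectToDataBase() { Connected = true; }

    public string RunQuery(string sql)
    {
        if (!Connected) throw new InvalidOperationException("Not connected.");
        return "result of: " + sql;
    }
}

public class DataProvider
{
    private readonly FakeSqlController _controller;

    // Private constructor: the only way to get a DataProvider is Create(),
    // so an instance can never exist without a connected controller.
    private DataProvider(FakeSqlController controller)
    {
        _controller = controller;
    }

    public static DataProvider Create()
    {
        var controller = new FakeSqlController();
        controller.ConnectToDataBase(); // a failure surfaces here, at creation
        return new DataProvider(controller);
    }

    public string GetData()
    {
        return _controller.RunQuery("SELECT something FROM SOME_TABLE");
    }
}
```

With this shape there is no Init() to forget, and a connection problem is reported where the object is created rather than at some later call.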
I am tasked with writing a system to process result files created by a different process (which I have no control over), and I am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling a foreach, but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?" or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances and have removed every static member except the initial static void Main. It is my current understanding that this will do a lot towards assuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
MyProcess process = new MyProcess();
process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
public void DoThings()
{
//Get some list of things
List<Thing> things = getThings();
Parallel.ForEach(things, item => {
//based on some criteria, take actions from MyActionClass
MyActionClass myAct = new MyActionClass(item);
string tempstring = myAct.DoOneThing();
if(somecondition)
{
myAct.DoOtherThing();
}
//...other similar calls to myAct below here
});
}
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
private Thing _thing;
public MyActionClass(Thing item)
{
_thing = item;
}
public string DoOneThing()
{
return _thing.GetSubThings().FirstOrDefault();
}
public void DoOtherThing()
{
_thing.property1 = "Somenewvalue";
}
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed. If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But as has been mentioned, that only holds for what can be seen here. It doesn't mean that everything in your actual code is as clean as it appears in this sample.
Nor does it mean that neither you nor a coworker will later change something that makes some state both shared and mutable (in the Thing, for example). Then you start getting hard-to-reproduce crashes at best, or plain wrong behaviour at worst that can go undetected for a long time.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet: it is not always easy to use and implement, and not every task can be reasonably expressed through immutable objects. Even with it, an accidental "make shared and mutable" change can still happen, though it is much less likely.
It should at least be considered as a possible option/alternative.
About the EDIT
If I change the value of a property of the item I'm working with while
inside the Parallel.Foreach (in this case, a string value that gets
written to a database inside the loop), will that have any affect on
the rest of the loop iterations or just the one I'm on?
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, sort of a public static Int32 ChangesCount that increments with each state change), then you should be safe.
"a string value that gets written to a database inside the loop": depending on the data access technology and how you use it, you may be in trouble, because many of them are not designed for multithreaded use (an EF DbContext, for example). And obviously do not forget that dealing with concurrent access in the database itself is not always easy, though that is a bit away from our original theme.
"Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case": if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not the Parallel.ForEach ones) making changes to the objects that are being persisted, then you already have bigger problems than Parallel.ForEach.
Objects should always have observable consistent state (unlike when half of properties set by one thread, and half by another, while you try to persist that who-knows-what), and if they are used by many threads, then they should be already thread-safe - there should be no way to put them into inconsistent state.
And if they want to be persisted by external code, such objects should probably provide:
Either SyncRoot property to synchronize property reading code.
Or some current state snapshot DTO that is created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Or something more exotic.
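The snapshot option from the list above can be sketched as follows. This is a minimal illustration under assumptions: the members of Thing and ThingSnapshot (Property1, Version, Update) are invented for the example, with only GetCurrentData taken from the text.

```csharp
// Immutable copy of the state at one point in time (hypothetical shape).
public sealed class ThingSnapshot
{
    public ThingSnapshot(string property1, int version)
    {
        Property1 = property1;
        Version = version;
    }

    public string Property1 { get; }
    public int Version { get; }
}

public class Thing
{
    private readonly object _sync = new object();
    private string _property1 = "";
    private int _version;

    // All writes go through one lock, so both fields change together.
    public void Update(string value)
    {
        lock (_sync)
        {
            _property1 = value;
            _version++;
        }
    }

    // Readers get a consistent copy taken under the same lock.
    public ThingSnapshot GetCurrentData()
    {
        lock (_sync)
        {
            return new ThingSnapshot(_property1, _version);
        }
    }
}
```

Persistence code would then write the snapshot to the database, so a concurrent Update cannot leave it looking at half-changed state.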
I have a constructor for a class called ActNode which takes a class parameter called Act like this:
public Act Act; //the act affiliated with this node...
public ActNode(Act moAct, ...some others, not important) {
if (moAct == null)
throw new SomeException(); //this is never entered
Act = moAct;
... some other codes
}
The constructor above is the only constructor the ActNode has and anywhere in the code, the Act passed to the constructor is not null. Then, in one of the ActNode's methods, a bool flag of the Act is checked like this:
public void ActNodeMethod() {
if (Act.AnActFlagToBeChecked) { //FIXME this Act can be surprisingly null!
//do something
}
}
Thus, in a single-threaded environment, Act.AnActFlagToBeChecked cannot throw a NullReferenceException, since the Act of an ActNode cannot be null. However, I use the ActNode in a multi-threaded environment. Not always, but sometimes, the line above:
if (Act.AnActFlagToBeChecked) //FIXME this Act can be surprisingly null!
can throw a NullReferenceException.
Why is this so and how to fix it?
From reading a post in SO, it is said that this may happen if the Act is not initialized (therefore having default value of null) but the method is called. Yet in my implementation, there is no such case, because the Act everywhere in the code is never null when the constructor is called.
The only thing here is that I implement it in multi-thread environment where multiple ActNodes can call ActNodeMethod at the same time. But it shouldn't be a problem (or it could?) since each ActNode will have its own resource called Act associated to it (not a shared resource).
I am pretty confused here and would appreciate it if someone could enlighten me about the possible issues with the above implementation.
To give bigger context, ActNode is a TreeNode which I use to store a value representing the time needed to go to that node. I implement the ActNode in my searching tree algorithm to find the fastest solution to finish up a set of given "Acts". I use multi-threads because it can speed up the searching process.
In Erlang you can pass initial state to an actor when spawning it. This way you don't need to deal with init messages that take the actor back to its initial state, or with messages that require an init message to have arrived first. In Orleans, given the assumption that grains always exist, you cannot use constructors. Is there any way to pass initial state to grains, thus avoiding an init method that breaks consistency by needing to be called before any other method?
When I say "take the actor to its initial state", I mean, in the Orleans context, calling the init method of a specific grain activation twice. It is like overwriting state. Maybe you need this kind of message to reset state, but if you don't need it, it is a pitfall and a potential source of bugs.
I'm looking for some type of constructor, something like spawn(module, function, [initial state]) from Erlang. My first attempt was to look for an overload of GetGrain with the following signature: GrainFactory.GetGrain<IGrain>(id, initialState);
As @svick suggests, OnActivateAsync is the best approach for loading an initial state for a grain.
public class ExampleGrain : Orleans.Grain, IExampleGrain
{
public override Task OnActivateAsync()
{
// set initial state for grain
return base.OnActivateAsync();
}
...
This method will be called every time the grain is initialised (not just the very first time). You could use the Persistence infrastructure built into Orleans to record whether the grain had been created previously (perhaps using a boolean property on your state class) i.e.
public class ExampleGrainState : GrainState
{
public bool Initialised { get; set; }
}
[StorageProvider(ProviderName = "Storage")]
public class QuadKeyGrain : Orleans.Grain<ExampleGrainState>, IExampleGrain
{
public override async Task OnActivateAsync()
{
if (!this.State.Initialised)
{
// do initialisation
this.State.Initialised = true;
await this.WriteStateAsync();
}
await base.OnActivateAsync();
}
}
See this tutorial for more information on persistence:
http://dotnet.github.io/orleans/Tutorials/Declarative-Persistence.html
Grains in Orleans always exist, so with your approach you are going to [conditionally] re-initialize the grain every time it gets activated. Is this really what you want?
Well, if you really need to initialize a specific grain to a specific state, then you can use its key (a string key or the string part of the key) to pass in some JSON. Just remember that the key has some limitations on its size.
In existing code of my project, at number of places the property is declared like this:
public long ResourceID
{
get
{
return this.resourceID;
}
set
{
if (this.resourceID != value)
{
this.resourceID = value;
}
}
}
Note: private long resourceID is already declared.
Properties of reference types (including string), not only of value types, are declared like this too.
Another example:
public Collection<Ability> Abilities
{
get
{
return this.abilities;
}
set
{
if (value == null)
{
throw new ArgumentNullException("Abilities");
}
this.abilities = value;
}
}
As far as I know, the setter in the first example does not make any sense and the if condition is meaningless there. So I decided to change the code (as part of refactoring) to make them auto-properties. (In the second example I need the setter, since the exception is thrown there.)
I want to know from the experts here: will making existing properties auto-properties (or at least removing the if condition from the setter) cause any harm? Sometimes there are subtle things which a developer may not be aware of, and some changes can have side effects too. That's why I am asking this question. (My libraries are used in many places.)
Note: Let me know if this is purely a homework question.
Converting:
private long resourceID;
public long ResourceID
{
get
{
return this.resourceID;
}
set
{
this.resourceID = value;
}
}
into:
public long ResourceID { get; set; }
won't cause any harm, guaranteed.
Removing the if statement might cause harm. For example in WPF when working with the MVVM pattern and implementing the INotifyPropertyChanged interface it is often good practice to check whether the value has changed before actually setting it. Removing this check will provoke notifications to be sent to the UI no matter whether the value changed or not. So it would be a breaking change.
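To illustrate the point, here is a small sketch of a setter where the check does matter because the setter also raises a change notification. ResourceViewModel is a hypothetical class, not the asker's code.

```csharp
using System.ComponentModel;

public class ResourceViewModel : INotifyPropertyChanged
{
    private long _resourceID;

    public event PropertyChangedEventHandler PropertyChanged;

    public long ResourceID
    {
        get { return _resourceID; }
        set
        {
            // The guard suppresses notifications for no-op assignments.
            if (_resourceID != value)
            {
                _resourceID = value;
                PropertyChanged?.Invoke(
                    this, new PropertyChangedEventArgs(nameof(ResourceID)));
            }
        }
    }
}
```

Assigning the same value twice raises the event only once here. Without the if, every assignment would notify subscribers, which can trigger redundant UI updates.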
I can only think of one kind of problem you could run into (which is fixable):
If you are using ORM or other external tool, they might rely on a naming convention for finding properties/fields. So, the 3rd party dll might be looking for a field resourceId that no longer exists.
So, code using reflection to access fields might break, but if you have control over the codebase, that is unlikely to be an issue.
There are some edge-cases where this might cause harm:
Changing to an automatically implemented property {get;set;}
if you are using field-based serialization at any point (for example, BinaryFormatter), then this will break when changing to an automatically implemented property, as the field-name will change. This will also impact any other scenario that uses reflection to access the (hopefully private) fields, but BinaryFormatter is the most common cause of confusion here.
Removing the if test
will be fine for most data-types such as long etc, however, if you use it with a type that implements a custom equality operation, you might find you are suddenly swapping a reference when previously (with the if) the objects reported equal for different references
The first is the more likely problem. If you are using BinaryFormatter, then keep the private field (but maybe remove the if test). And then start refactoring your code away from BinaryFormatter ;p
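The custom-equality edge case can be shown with a short sketch. Tag and Holder are hypothetical types invented for the illustration: Tag overloads == to compare by name, so the guarded setter keeps the original reference while the plain auto-property swaps it.

```csharp
public sealed class Tag
{
    public Tag(string name) { Name = name; }

    public string Name { get; }

    // Value-style equality: two distinct instances with the same name are equal.
    public static bool operator ==(Tag a, Tag b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;
        return a.Name == b.Name;
    }

    public static bool operator !=(Tag a, Tag b) { return !(a == b); }

    public override bool Equals(object obj) { return obj is Tag other && this == other; }

    public override int GetHashCode() { return Name.GetHashCode(); }
}

public class Holder
{
    private Tag _guarded;

    // With the if test: an "equal" value does not replace the stored reference.
    public Tag Guarded
    {
        get { return _guarded; }
        set { if (_guarded != value) _guarded = value; }
    }

    // Auto-property: every assignment replaces the stored reference.
    public Tag Unguarded { get; set; }
}
```

If other code holds on to the first instance and relies on reference identity, the two setters behave differently even though the values compare as equal.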
What you have done is correct. The if statement is meaningless. I always think that less code is better, because the number of lines of code is directly proportional to the number of faults.
public long ResourceID { get; set; }
Your first example only sets the resourceID field if its value has changed.
The only difference you would see by removing the "if" test is a possible impact if multiple threads are reading the value. In which case they probably should be using a lock, so it's almost certainly safe to remove the test.
Your second example prevents a caller from setting the property value to null. Presumably the field is initialized to a non-null value, and this has value as it means that callers can read the property without needing to check for null.
Usually in such scenarios and how you've explained, it shouldn't be a concern.
You could just go ahead and change the code of all properties;
public long ResourceID { get; set; }
Or
public long ResourceID
{
get { return this.resourceID; }
set { this.resourceID = value; }
}
But it might cause an issue if changing the value of the property cascades to some other custom function call that is only executed when the new value differs from the old one. This usually happens when you've implemented custom events, or with property-changed events.
It might also have an effect when using Data-Context classes.
Both scenarios are totally application specific.
I'd suggest you refactor with caution. Or, as you've written yourself, HOMEWORK.