I have a really complex object (many different properties, plus objects that contain objects of the same class and back-references to their parent objects) stored as a global in a static class. It is initialized once by a static constructor, and I want it to stay that way and never change afterwards. The object is used many times in different places in my code; sometimes a clone is made from it, taking care never to change anything in the original (its properties, subproperties, and so on). However, I think I made a mistake somewhere and removed some of its subproperties. I could find the mistake by stepping through the code in the debugger, but that would cost me a lot of time. Is there a way to lock the whole thing (not just the reference, but the whole object with all its properties, no matter how deep they are) so that it cannot be altered after it is first initialized?
I tried looking at the readonly modifier, but I guess it won't suit me because it constrains only the reference to the object and not everything that comes under it.
Also, private won't suit me for the same reason.
Is there a better way to do this?
Is there a way to lock the whole thing
There is no way to prevent mutation of an object (or object graph) that is mutable. Put differently: If the object can be modified (such as if it has a public field that isn't readonly or if it has a property that has a setter), there is no way to prevent it from being modified.
I tried looking at readonly modifier, but I guess it won't suit me because it constrains only the reference of the object and not everything that comes under it.
Correct. When a field declaration includes a readonly modifier, assignments to the fields introduced by the declaration can only occur as part of the declaration or in a constructor in the same class. (Source: MSDN)
However, you can design the type so it is immutable or at least so it can't be modified through public fields, properties or methods.
You might also consider returning clones of the object graph instead of the original object graph. If a clone gets modified, the original object graph remains unmodified.
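To make the "design the type so it is immutable" idea a bit more concrete, here is a minimal sketch. It is written in Java, but the same shape carries over to C#. The type name Setting and its members are made up for the example; the point is only that every field is final and the child list is copied on the way in and read-only on the way out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node type: every field is final and the child list is a private copy.
final class Setting {
    private final String name;
    private final List<Setting> children;

    Setting(String name, List<Setting> children) {
        this.name = name;
        // Copy the caller's list so later changes to it cannot leak in,
        // then wrap the copy so nobody can change it through the getter.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    String name() { return name; }

    // Returns a read-only list; add/remove/clear on it throw UnsupportedOperationException.
    List<Setting> children() { return children; }
}
```

With a type like this, an accidental removal of a "subproperty" fails loudly at the exact call site instead of silently corrupting the shared object.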
Sounds like you want a singleton?
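Build a function in the class that contains this object:
static ComplexObject getComplexObject()
{
    if (m_object == null)               // not created yet?
        m_object = new ComplexObject(); // instantiate it once (ComplexObject stands in for your type)
    return m_object;
}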
And make m_object a private static field of the class.
Then always use this function to get the object. This way you can be sure it'll only be created once.
I know what deep copy and shallow copy are, and how to deep copy, but my main doubt is: when should I deep copy an object reference, and how often?
Scenario 1 :
Consider this code (the full code is at http://pastebin.com/WEgeBFNb):
class Box{
    Position pos;
    Box(Position p){
        pos = p;
    }
    Position getPosition(){
        return pos;
    }
}
And a main() like:
public class Sample{
    public static void main(String args[]){
        Position pos = new Position(3,5);
        Box box = new Box(pos);
        pos.setX(5);
        System.out.println( box.getPosition().getX());
        // Will print 5, but I want Box to retain its value
    }
}
I have achieved the above requirement by:
Box(Position p){
    pos = new Position(p); // Deep cloning
}
Then I must have a copy constructor in Position too, like:
Position(Position p){
    x = p.x;
    y = p.y;
}
But my question is: When to use deep cloning?
Scenario 2:
For example, consider some C# code.
List<Accounts> accounts = Mysession.getAllAccounts(); Here, I expect that changes to the returned object are not reflected in the session object. (This case applies not only to C#, but to any OOP language in general.)
So if I start deep cloning, it's not an easy task, because the objects go up to 5 levels deep, with has-a relationships.
Once again, I know that to be 100% safe I must deep clone. Agreed.
What is more common: returning the reference, or returning a copy of the objects?
I have heard that deep cloning is a cumbersome process and should be avoided. So how often should one deep clone?
Could you give some example scenarios (code not needed).
During initialization, as in the Box example above, should one clone (pos = new Position(p)) or assign normally (pos = p)?
The main purpose in object-oriented programming must be that an object guarantees that it is in a legal state at any time.
Thus when you return an object's reference you should think about:
Is the returned object immutable?
Does the current object that returns the reference (main object) have values that depend on the returned object? (derived or cached values)
You can react to the answers of these questions in the following ways:
The returned reference is an immutable object (String, BigDecimal, etc.)
no action required
The returned reference is a mutable object (array, Date, etc.), but the main object has NO derived values (e.g. only decorates it)
no action required
The returned reference is a mutable object (array, Date, etc.) and the main object has derived values
Make a copy of the object before you return it. This is applicable if a copy is easy to make and is not memory- or time-consuming (depends on your non-functional requirements).
Return an unmodifiable reference to the original object (like Collections.unmodifiable... does).
Return a proxy that detects access to the returned object and that informs the main object about these changes so that the main object can recalculate the derived values and it will not be in an inconsistent state.
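The first two options for the "mutable with derived values" case can be sketched roughly as follows. Basket, its price list and its cached total are hypothetical names chosen for the illustration; only ArrayList and Collections.unmodifiableList are real library pieces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical "main object" that caches a value derived from its list.
final class Basket {
    private final List<Integer> prices = new ArrayList<>();
    private int total; // derived value that must stay in sync with prices

    void add(int price) {
        prices.add(price);
        total += price;
    }

    int total() { return total; }

    // Option 1: hand out a copy; callers may change it freely without affecting us.
    List<Integer> pricesCopy() { return new ArrayList<>(prices); }

    // Option 2: hand out a read-only view of the original list (no copying cost).
    List<Integer> pricesView() { return Collections.unmodifiableList(prices); }
}
```

Either way, outside code cannot change the list behind the basket's back, so the cached total cannot drift out of sync. Note that both options protect only the list itself; if the elements were mutable objects, the same questions would have to be asked about each of them.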
Ask yourself the same questions when you receive an object reference through a constructor or method invocation.
Rather than thinking in terms of "deep" or "shallow" cloning, instead think in terms of what each encapsulated object reference represents. Suppose field Foo of some class instance George holds a reference of type IList<String>. Such a field could represent at least five different things:
A reference to an instance of an immutable type, held for the purpose of encapsulating the strings contained therein.
A reference to an object instance whose type could be mutable, but which will never be exposed to anything that might mutate it, held for the purpose of encapsulating the strings contained therein.
The only reference that will ever exist anywhere in the universe, outside the call stack of George's methods, to a mutable list which George is using to encapsulate its state.
A reference to a list whose contents may change, which form part of the mutable state of some other object. The field isn't used to encapsulate the content of the list, but rather its identity.
A reference to a list whose contents may change, whose contents are considered to be part of George's state, and to which outside persistent references exist.
If Foo is of the first two types, a proper copy of George may have its Foo refer either to the same list as George.Foo, a newly-constructed list which will always hold the same contents, or any other list which will always hold the same contents. If it's of the third type, a proper copy of George must have its Foo refer to a new list which is preloaded with copies of the items in George.Foo. If it's of the fourth type, a proper copy must have its Foo refer to the same object as George.Foo and not refer to a copy. If it's of the fifth type, George cannot be cloned in isolation.
If the list items were of a mutable type (instead of String), one would have to determine which of the five purposes applied to the items contained in the list, and treat each list item as one would treat a field. Note that for a type to be logically immutable, any references contained therein must be shareable. If proper behavior of an object would require that something to which it holds a reference not be the target of any other reference, that would imply that only one reference should exist to the object holding the reference.
My coworker made the claim that there is never a need to use Object when declaring variables, return parameters, etc in .NET 2.0 and newer.
He went further and said in all such cases, a Generic should be used as the alternative.
Is there any validity to this claim? Off the top of my head I use Object for locking concurrent threads...
Generics do trump object in a lot of cases, but only where the type is known.
There are still times when you don't know the type - object, or some other relevant base type is the answer in those instances.
For example:
object o = Activator.CreateInstance(Type.GetType("Some.Type, SomeUnreferencedAssembly")); // placeholder type and assembly names
You won't be able to cast that result or maybe even know what the type is at compile time, so object is a valid use.
Your co-worker is generalising too much - perhaps point him at this question. Generics are great, give him that much, but they do not "replace" object.
object is perfect for a lock. Generics allow you to keep it typed appropriately. You can even constrain it to an interface or base class. You can't do that with object.
Consider this:
void DoSomething(object foo)
{
    foo.DoFoo();
}
That won't work without any casting. But with generics...
void DoSomething<T>(T foo) where T : IHasDoFoo
{
    foo.DoFoo();
}
With C# 4.0 and dynamic, you could defer this to runtime, but I really haven't seen a need.
void DoSomething(dynamic foo)
{
    foo.DoFoo();
}
When using interop with COM, you don't always have a choice... Generics don't really cater for the issues of interop.
Object is also the most lightweight option for a lock, as @Daniel A. White mentioned in his answer.
Yes there is validity. A good breakdown has already been made here.
However, I cannot say that there is no case at all where you would use object; personally I do not use it, and even before generics I avoided boxing/unboxing.
There are lots of counterexamples, including the one you mentioned, using an object for synchronisation.
Another example is the DataSource property used in databinding, which can be set to one of a variety of different object types.
Broad counterexample: The System.Collections namespace is alive and well in .NET 4, no sign of deprecation or warning against its use on MSDN. The methods you find there take and return Objects.
Inherent in the question are actually two questions:
When should storage locations of type `Object` be used
When should instances of type `Object` be used
Storage locations of type Object must obviously be used in any circumstance where it will be necessary to hold references to instances of that type (since references to such instances cannot be held in any other type). Beyond that, they should be used in cases where they will hold references to objects which have no single useful common base type. This is obviously true in many scenarios using Reflection (where the type of an object may depend upon a string computed at run-time), but can also apply to certain varieties of collection which are populated with things whose type is known at compile time. As a simple example, one could represent a hierarchical collection of string indexed by sequences of int by having each node be of type Object, and having it hold either a String or an Object[]. Reading out items from such a collection would be somewhat clunky, since one would have to examine each item and determine whether it was an instance of Object[] or String, but such a method of storage would be extremely memory-efficient, since the only object instances would be those which either held the strings or the arrays. One could define a Node type with a field of type String and one of type Node[], or even define an abstract Node type with derived types StringNode (including a field of type String) and ArrayNode (with a field of type Node[]) but such approaches would increase the number of heap objects used to hold a given set of data.
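To show what the "somewhat clunky" reading looks like in practice, here is a rough sketch of such a hierarchical collection. It is written in Java, where Object, String and Object[] play the same roles as in .NET. ObjectTree and lookup are names invented for the example.

```java
// Hypothetical helper: each node is either a String (leaf) or an Object[] (branch).
final class ObjectTree {
    private ObjectTree() {}

    // Follows the index path from the root and returns the String found there.
    static String lookup(Object root, int... path) {
        Object node = root;
        for (int index : path) {
            if (!(node instanceof Object[])) {
                throw new IllegalArgumentException("path goes below a leaf");
            }
            node = ((Object[]) node)[index];
        }
        if (!(node instanceof String)) {
            throw new IllegalArgumentException("path ends on a branch");
        }
        return (String) node;
    }
}
```

For a tree built as new Object[] { "a", new Object[] { "b", "c" } }, the call ObjectTree.lookup(root, 1, 0) walks into the nested array and returns "b". The only heap objects involved are the strings and the arrays themselves, which is the memory advantage described above; the price is the type checks and casts at every step.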
Note that in general it's better to design collections so that the type of an object to be retrieved won't depend upon what's been shoved into the collection (perhaps using "parallel collections" for different types) but not everything works out that way semantically.
With regard to instances of type Object, I'm not sure there's any role they can fill which wouldn't be just as well satisfied by a sealed type called something like TokenObject which inherits from Object. There are a number of situations where it is useful to have an object instance whose sole purpose is to be a unique token. Conceptually, it might have been nicer to say:
TokenObject myLock = new TokenObject();
than to say
Object myLock = new Object();
since the former declaration would make clear that the declared variable was never going to be used to hold anything other than a token object. Nonetheless, common practice is to use instances of type Object in cases where the only thing that matters about the object is that its reference will be unique throughout the lifetime of the program.
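A rough sketch of the token idea, written in Java with synchronized standing in for C#'s lock statement. TokenObject is the hypothetical type from the text above (it does not exist in either framework), and Counter is just a made-up class that needs a lock.

```java
// Hypothetical token type: it carries no data, only identity.
final class TokenObject {}

final class Counter {
    // Used only as a lock token; the declared type documents that intent.
    private final TokenObject lock = new TokenObject();
    private int count;

    void increment() {
        synchronized (lock) { count++; }
    }

    int get() {
        synchronized (lock) { return count; }
    }
}
```

Functionally this is identical to declaring the field as a plain Object; the dedicated type only makes the declaration say what the variable is for.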
I have a Struct with a field in it that loses its value. I can declare the field static and that solves the problem. I can also just change struct to class (changing nothing else) and that also solves the problem. I was just wondering why this is?
Structs are passed by value. In other words, when you pass a struct, you're passing a copy of its value. So if you take a copy of the value and change it, then the original will appear unchanged. You changed the copy, not the original.
Without seeing your code I cannot be sure, but I figure this is what's happening.
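This doesn't happen for classes because a class variable holds a reference: passing it copies only the reference, so both the caller and the callee work on the same object.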
It's worth mentioning that this is why structs should be immutable -- that is, that once they're created, they do not change their value. Operations that provide modified versions return new structs.
EDIT: In the comments below, @supercat suggests that mutable properties can be more convenient. However property setters on structs can cause weird failures too. Here's an example that can catch you by surprise unless you deeply understand how structs work. For me, it's reason enough to avoid mutable structs altogether.
Consider the following types:
struct Rectangle {
    public double Left { get; set; }
}
class Shape {
    public Rectangle Bounds { get; private set; }
}
Ok, now imagine this code:
myShape.Bounds.Left = 100;
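Perhaps surprisingly, this does not change the shape at all! Why? Bounds returns a copy of the struct, so the assignment would only touch that temporary copy (the C# compiler in fact rejects this line with error CS1612 for that reason). Written out in longer form, the code amounts to: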
var bounds = myShape.Bounds;
bounds.Left = 100;
It's easier to see here how the value of Bounds is copied to a local variable, and then its value is changed. However at no point is the original value in Shape updated.
This is a pretty compelling reason to make all public structs immutable. If you know what you're doing, mutable structs can be handy, but personally I only really use them in that form as private nested types.
As @supercat points out, the alternative is a little unsightly:
myShape.Bounds = new Rectangle(100, myShape.Bounds.Top,
myShape.Bounds.Width, myShape.Bounds.Height);
Sometimes it's more convenient to add helper methods:
myShape.Bounds = myShape.Bounds.WithLeft(100);
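A helper like WithLeft simply builds a new value with one field replaced. Here is a minimal sketch of the idea, written in Java as a final class with final fields. Rect and its four fields are assumptions based on the constructor call above, not the author's actual type.

```java
// Hypothetical immutable rectangle; the field set mirrors the constructor call above.
final class Rect {
    final double left, top, width, height;

    Rect(double left, double top, double width, double height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    // Returns a new Rect that differs from this one only in its left edge.
    Rect withLeft(double newLeft) {
        return new Rect(newLeft, top, width, height);
    }
}
```

The original value is never touched, so anything else that holds it keeps seeing the old left edge until it is explicitly given the new value.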
When a struct is passed by value, the system will make a copy of the struct for the callee, so it can see its contents, and perhaps modify its own copy, but cannot affect the fields in the caller's copy. It's also possible to pass structs by ref, in which case the callee will be able to work with the caller's copy of the struct, modifying it if desired, and even pass it by ref to other functions which could do likewise. Note that the only way the called function can make the caller's copy of the struct available to other functions, though, is to pass it by ref, and the called function can't return until all functions to which it has passed the struct by ref have also returned. Thus, the caller can be assured that any changes which might occur to the structure as a consequence of the function call will have occurred by the time it returns.
This behavior is different from class objects; if a function passes a mutable class object to another function, it has no way of knowing if or when that other function will cause that object to be mutated immediately or at any future time, even after the function has finished running. The only way one can ever be sure that any mutable object won't be mutated by outside code is to be the sole holder of that object from the moment of its creation until its abandonment.
While one who is not used to value semantics may initially be "surprised" at the fact that passing a struct by value simply gives the called function a copy of it, and assigning one struct storage location to another simply copies the contents of the struct, the guarantees that value types offer can be very useful. Since Point is a structure, one can know that a statement like MyPoints[5].X += 1; (assuming MyPoints is an array) will affect MyPoints[5].X but will not affect any other Point. One can further be assured that the only way MyPoints[5].X will change is if either MyPoints gets replaced with another array, or something writes to MyPoints[5]. By contrast, if Point were a class and MyPoints[5] had ever been exposed to the outside world, the only way of knowing whether the aforementioned statement would affect field/property X of any other storage locations of type Point would be to examine every single storage location of type Point or Object that existed anywhere within the code to see if it pointed to the same instance as MyPoints[5]. Since there's no way for code to examine all of the storage locations of a particular type, such assurance would be impossible if MyPoints[5] had ever been exposed to the outside world.
There is one annoying wrinkle with structs, though: generally, the system will only allow structures to be passed by ref if the called code is allowed to write to the structure in question. Struct method calls and property getters, however, receive this as a ref parameter but do not have the above restriction. Instead, when invoking a struct method or property getter on a read-only structure, the system will make a copy of the structure, pass that copy by ref to the method or property getter, and then discard it. Since the system has no way of knowing whether a method or property getter will try to mutate this, it won't complain in such cases--it will just generate silly code. If one avoids mutating this in anything other than property setters (the system won't allow the use of property setters on read-only structures), however, one can avoid problems.
Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation but in the process I wrote this (in my opinion) interesting code.
public class A
{
    public string Name { get; private set; }
    public B B { get; private set; }
    public A()
    {
        B = new B(this);
        Name = "test";
    }
}
public class B
{
    public A A { get; private set; }
    public B(A a)
    {
        //a.Name is null
        A = a;
    }
}
What I find interesting is that I cannot think of another way to observe object of type A in a state that is not yet fully constructed and that includes threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?
Why is this even valid?
Why do you expect it to be invalid?
Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.
Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.
Are there any other ways to observe the state of an object that is not fully constructed?
Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:
Put "this" in a static field and reference it from another thread.
Make a method call or constructor call and pass "this" as an argument.
Make a virtual call -- particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.
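The virtual-call case is the easiest one to trip over by accident, so here is a small sketch of it. It is written in Java rather than C#, and Base, Derived, describe and label are made-up names. The behaviour shown (the override running before the derived constructor body) is the same in both languages, although the two differ in when field initializers run, which is why the sketch assigns the field in the constructor body.

```java
// Hypothetical classes showing a virtual call that runs before the derived constructor body.
class Base {
    final String seen;

    Base() {
        // Overridable call from a constructor: dispatches to Derived.describe()
        // while Derived's constructor body has not run yet.
        seen = describe();
    }

    String describe() { return "base"; }
}

class Derived extends Base {
    private String label;

    Derived() {
        super();
        label = "ready"; // assigned only after Base() has finished
    }

    @Override
    String describe() { return "label=" + label; }
}
```

After new Derived(), the seen field holds "label=null", because the override ran while label was still unassigned; calling describe() on the finished object returns "label=ready".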
I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!
Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.
Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.
If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type creates a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.
Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.
In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.
For more information on how value type constructors work, see:
Debunking another myth about value types
And for more information on how C# language semantics try to save you from yourself, see:
Why Do Initializers Run In The Opposite Order As Constructors? Part One
Why Do Initializers Run In The Opposite Order As Constructors? Part Two
I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.
Now let's consider the root cause of your question: how do you make immutable objects that reference each other?
Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:
An immutable list of immutable nodes, each of which contains a value.
An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.
Now the way you make nodes A and B "reference" each other is:
A = new Node("A");
B = new Node("B");
G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);
And you're done, you've got a graph where A and B "reference" each other.
The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.
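For what it's worth, such a graph type really is small. Below is a minimal sketch in Java; Node, Edge, Graph, EMPTY, addNode, addEdge and successors are all illustration names mirroring the calls above, not an existing library. The sketch copies lists on every add, which is simple but not efficient.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal types; Node, Edge and Graph are illustration names only.
final class Node {
    final String value;
    Node(String value) { this.value = value; }
}

final class Edge {
    final Node from, to;
    Edge(Node from, Node to) { this.from = from; this.to = to; }
}

final class Graph {
    static final Graph EMPTY = new Graph(Collections.emptyList(), Collections.emptyList());

    final List<Node> nodes;
    final List<Edge> edges;

    private Graph(List<Node> nodes, List<Edge> edges) {
        this.nodes = Collections.unmodifiableList(nodes);
        this.edges = Collections.unmodifiableList(edges);
    }

    // Each "add" builds a new graph and leaves the receiver untouched.
    Graph addNode(Node n) {
        List<Node> copy = new ArrayList<>(nodes);
        copy.add(n);
        return new Graph(copy, edges);
    }

    Graph addEdge(Node from, Node to) {
        List<Edge> copy = new ArrayList<>(edges);
        copy.add(new Edge(from, to));
        return new Graph(nodes, copy);
    }

    // Nodes reachable from n in one step; this is where G is needed to get from A to B.
    List<Node> successors(Node n) {
        List<Node> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from == n) result.add(e.to);
        }
        return result;
    }
}
```

The successors method makes the indirection visible: node A itself knows nothing about B, and only a call such as g.successors(a) on the graph that holds both can answer the question.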
Yes, this is the only way for two immutable objects to refer to each other - at least one of them must see the other in a not-fully-constructed way.
It's generally a bad idea to let this escape from your constructor but in cases where you're confident of what both constructors do, and it's the only alternative to mutability, I don't think it's too bad.
"Fully constructed" is defined by your code, not by the language.
This is a variation on calling a virtual method from the constructor; the general guideline is: don't do that.
To correctly implement the notion of "fully constructed", don't pass this out of your constructor.
Indeed, leaking the this reference out during the constructor will allow you to do this; it may cause problems if methods get invoked on the incomplete object, obviously. As for "other ways to observe the state of an object that is not fully constructed":
invoke a virtual method in a constructor; the subclass constructor will not have been called yet, so an override may try to access incomplete state (fields declared or initialized in the subclass, etc)
reflection, perhaps using FormatterServices.GetUninitializedObject (which creates an object without calling the constructor at all)
If you consider the initialization order
Derived static fields
Derived static constructor
Derived instance fields
Base static fields
Base static constructor
Base instance fields
Base instance constructor
Derived instance constructor
Clearly, through up-casting you can access the object BEFORE the derived instance constructor has run. (This is why you shouldn't call virtual methods from constructors: they can easily access derived fields that the derived constructor has not initialized yet, because the derived constructor has not yet brought the object into a "consistent" state.)
You can avoid the problem by instantiating B last in your constructor:
public A()
{
    Name = "test";
    B = new B(this);
}
If what you suggest was not possible, then A would not be immutable.
Edit: fixed, thanks to leppie.
The principle is: don't let this escape from the constructor body.
Another way to observe such a problem is by calling virtual methods inside the constructor.
As noted, the compiler has no means of knowing at what point an object has been constructed well enough to be useful; it therefore assumes that a programmer who passes this from a constructor will know whether an object has been constructed well enough to satisfy his needs.
I would add, however, that for objects which are intended to be truly immutable, one must avoid passing this to any code which will examine the state of a field before it has been assigned its final value. This implies that this not be passed to arbitrary outside code, but does not imply that there is anything wrong with having an object under construction pass itself to another object for the purpose of storing a back-reference which will not actually be used until after the first constructor has completed.
If one were designing a language to facilitate the construction and use of immutable objects, it may be helpful for it to declare methods as being usable only during construction, only after construction, or either; fields could be declared as being non-dereferenceable during construction and read-only afterward; parameters could likewise be tagged to indicate that they should be non-dereferenceable. Under such a system, it would be possible for a compiler to allow the construction of data structures which referred to each other, but where no property could ever change after it was observed. As to whether the benefits of such static checking would outweigh the cost, I'm not sure, but it might be interesting.
Incidentally, a related feature which would be helpful would be the ability to declare parameters and function returns as ephemeral, returnable, or (the default) persistable. If a parameter or function return were declared ephemeral, it could not be copied to any field nor passed as a persistable parameter to any method. Additionally, passing an ephemeral or returnable value as a returnable parameter to a method would cause the return value of the function to inherit the restrictions of that value (if a function has two returnable parameters, its return value would inherit the more restrictive constraint from its parameters). A major weakness with Java and .net is that all object references are promiscuous; once outside code gets its hands on one, there's no telling who may end up with it. If parameters could be declared ephemeral, it would more often be possible for code which held the only reference to something to know it held the only reference, and thus avoid needless defensive copy operations. Additionally, things like closures could be recycled if the compiler could know that no references to them existed after they returned.
I've seen static methods written (but I've never run the code) which use instance data from another class (instance based).
Usually, instance data works with instance methods, and likewise static fields with static methods. What is the implication of working on instance data in a static method? I'm assuming it is frowned upon, but I can't find any details on what will happen under the hood. Also, what about instance methods working with static data?
Thanks
There is no problem having a static method use object instances or an instance method using static data.
The framework is full of methods that demonstrate this. The very commonly used String.Concat method for example is a static method that takes one or more object instances. (A Concat method call is what the compiler produces whenever you use the + operator to concatenate strings.)
Int32.MaxValue is a constant (and therefore a static member); there is obviously no problem using that in an instance method.
I don't see a problem working on instance data from another object within a static method.
I assume that you mean, for example, passing an object's instance variable to a static method via a parameter, and that method then working on that variable.
Static just means you don't get this, but you could get otherobject->something
I don't think it would be any more frowned upon than just using a static method would be in the first place.
When working with static data in an instance method, the only implication I can think of is synchronization in a multithreaded application. I can't think of any adverse implications when working with instance data from a static method. However, just because something can be done doesn't mean it should be done.
Here is the concrete example you provided:
Class A is instance based and has an instance field called ProductPrice of double. Class B is static and has a static method called PlayAroundWithPrice(double price), and the coder passes in the ProductPrice field.
Obviously, there is nothing technically illegal with this example, but it goes against the grain for me. First of all, the ProductPrice field of Class A is obviously public since Class B can operate on it. For the purposes of encapsulation, I personally always make fields private and use a public property to access them. Second, because ProductPrice is a public field instead of a public property, there's no way for Class A to prevent ProductPrice from being set to invalid values (negative values, for example). Third (as stated above), if this example occurs in a multithreaded program, there could be synchronization issues. Fourth, I guess this is the real rub, why have a static method on Class B to operate on the field of Class A? Why not put the static method on Class A?
I don't know that I'd go as far as making this a hard-and-fast rule (perhaps simply a rule-of-thumb), but I would restrict using static methods for when you do not want to pay for the cost of constructing an object just to use the method.
For example, in the project I work on, I have an IPHeader class that will fully construct an IPHeader instance from a byte buffer. However, in most cases, I only need a couple of values from the IPHeader. So to avoid the costs associated with creating and garbage-collecting an IPHeader instance, I added a couple of static methods that will extract the values from the byte buffer directly.
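Something along these lines, sketched in Java. IpHeaderFields and the two fields picked here are my own assumptions for illustration (the real class extracts whatever values the project needs); the byte offsets follow the standard IPv4 header layout.

```java
// Hypothetical helper; the offsets follow the standard IPv4 header layout.
final class IpHeaderFields {
    private IpHeaderFields() {}

    // Total length is a 16-bit big-endian value stored in bytes 2 and 3.
    static int totalLength(byte[] packet) {
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }

    // Protocol number (6 = TCP, 17 = UDP) is the single byte at offset 9.
    static int protocol(byte[] packet) {
        return packet[9] & 0xFF;
    }
}
```

No object is allocated per packet; the static methods read straight from the buffer the caller already has.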
I hope I've understood your question correctly.
Basically, instance methods have a hidden this parameter which is used to pass the instance the method is supposed to work on. This is the reason static methods cannot access instance data without an explicit reference (as they cannot know which instance of the object they should access).
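A small sketch of that equivalence, in Java (Account, deposit and balance are made-up names): the same operation written once as an instance method and once as a static method that receives the instance explicitly.

```java
// Hypothetical class: the same operation as an instance method and as a static method.
final class Account {
    private int balance;

    // Instance method: the receiver is passed implicitly as `this`.
    void deposit(int amount) { this.balance += amount; }

    // Static method: no `this`, so the instance must be passed explicitly.
    static void deposit(Account target, int amount) { target.balance += amount; }

    int balance() { return balance; }
}
```

Calling account.deposit(5) and Account.deposit(account, 5) has the same effect on the object; the only difference is how the instance reaches the method.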
Considering this, I don't see any special difference between static and instance methods. They are both methods and have more similarities than differences.
Both static data and instance data are prone to threading issues; however, instances are much less likely to be shared between threads. As a consequence, accessing static fields might require more care (regarding synchronization issues).
There should be no problem. Recently I had a scenario where I needed each instance of a class to have a different, but reproducible random seed. I kept a private static int in the class, and incremented it for every instantiation, and used that as the seed.
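Roughly like this, sketched in Java. Particle and its members are made-up names, and I've swapped the plain static int for an AtomicInteger so the counter also behaves if instances are created from several threads; that substitution is mine, not part of the original scenario.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class: a static counter hands each instance its own seed.
final class Particle {
    // Shared by all instances; AtomicInteger is an assumption made for thread safety.
    private static final AtomicInteger nextSeed = new AtomicInteger();

    final int seed;
    private final Random random;

    Particle() {
        seed = nextSeed.getAndIncrement(); // instance code reading and updating static data
        random = new Random(seed);
    }

    int nextValue() { return random.nextInt(100); }
}
```

Each instance gets a different seed, and because the seed is stored, the same sequence can be reproduced later with new Random(seed).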
It worked fine.
I don't think there is anything inherently wrong with using static data in an instance method, but I think you need to really limit the types of data you use. The advantage / disadvantage of this approach is that a single data change can alter the behavior of all objects of a particular type. This makes changing that type of data very risky. I tend to constrain uses of this to the following scenarios:
Immutable values - Nothing can change so there is nothing to worry about
Global Objects - I strongly refrain from doing this but I've found that occasionally the evils of having a global object outweigh the risks. These objects are carefully monitored though and heavily tested.