Why does a recursive constructor call make invalid C# code compile? - c#

After watching webinar Jon Skeet Inspects ReSharper, I've started to play a little with
recursive constructor calls and found, that the following code is valid C# code (by valid I mean it compiles).
class Foo
{
int a = null;
int b = AppDomain.CurrentDomain;
int c = "string to int";
int d = NonExistingMethod();
int e = Invalid<Method>Name<<Indeeed();
Foo() :this(0) { }
Foo(int v) :this() { }
}
As we all probably know, field initialization is moved into constructor by the compiler. So if you have a field like int a = 42;, you will have a = 42 in all constructors. But if you have constructor calling another constructor, you will have initialization code only in called one.
For example if you have constructor with parameters calling default constructor, you will have assignment a = 42 only in the default constructor.
To illustrate second case, next code:
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) { }
}
Compiles into:
internal class Foo
{
private int a;
private Foo()
{
this.ctor(60);
}
private Foo(int v)
{
this.a = 42;
base.ctor();
}
}
So the main issue, is that my code, given at the start of this question, is compiled into:
internal class Foo
{
private int a;
private int b;
private int c;
private int d;
private int e;
private Foo()
{
this.ctor(0);
}
private Foo(int v)
{
this.ctor();
}
}
As you can see, the compiler can't decide where to put field initialization and, as result, doesn't put it anywhere. Also note, there are no base constructor calls. Of course, no objects can be created, and you will always end up with StackOverflowException if you will try to create an instance of Foo.
I have two questions:
Why does compiler allow recursive constructor calls at all?
Why we observe such behavior of the compiler for fields, initialized within such class?
Some notes: ReSharper warns you with Possible cyclic constructor calls. Moreover, in Java such constructor calls won't event compile, so the Java compiler is more restrictive in this scenario (Jon mentioned this information at the webinar).
This makes these questions more interesting, because with all respect to Java community, the C# compiler is at least more modern.
This was compiled using C# 4.0 and C# 5.0 compilers and decompiled using dotPeek.

Interesting find.
It appears that there are really only two kinds of instance constructors:
An instance constructor which chains another instance constructor of the same type, with the : this( ...) syntax.
An instance constructor which chains an instance constructor of the base class. This includes instance constructors where no chainig is specified, since : base() is the default.
(I disregarded the instance constructor of System.Object which is a special case. System.Object has no base class! But System.Object has no fields either.)
The instance field initializers that might be present in the class, need to be copied into the beginning of the body of all instance constructors of type 2. above, whereas no instance constructors of type 1. need the field assignment code.
So apparently there's no need for the C# compiler to do an analysis of the constructors of type 1. to see if there are cycles or not.
Now your example gives a situation where all instance constructors are of type 1.. In that situation the field initaializer code does not need to be put anywhere. So it is not analyzed very deeply, it seems.
It turns out that when all instance constructors are of type 1., you can even derive from a base class that has no accessible constructor. The base class must be non-sealed, though. For example if you write a class with only private instance constructors, people can still derive from your class if they make all instance constructors in the derived class be of type 1. above. However, an new object creation expression will never finish, of course. To create instances of the derived class, one would have to "cheat" and use stuff like the System.Runtime.Serialization.FormatterServices.GetUninitializedObject method.
Another example: The System.Globalization.TextInfo class has only an internal instance constructor. But you can still derive from this class in an assembly other than mscorlib.dll with this technique.
Finally, regarding the
Invalid<Method>Name<<Indeeed()
syntax. According to the C# rules, this is to be read as
(Invalid < Method) > (Name << Indeeed())
because the left-shift operator << has higher precedence than both the less-than operator < and the greater-than operator >. The latter two operarors have the same precedence, and are therefore evaluated by the left-associative rule. If the types were
MySpecialType Invalid;
int Method;
int Name;
int Indeed() { ... }
and if the MySpecialType introduced an (MySpecialType, int) overload of the operator <, then the expression
Invalid < Method > Name << Indeeed()
would be legal and meaningful.
In my opinion, it would be better if the compiler issued a warning in this scenario. For example, it could say unreachable code detected and point to the line and column number of the field initializer that is never translated into IL.

I think because the language specification only rules out directly invoking the same constructor that is being defined.
From 10.11.1:
All instance constructors (except those for class object) implicitly include an invocation of another instance constructor immediately before the constructor-body. The constructor to implicitly invoke is determined by the constructor-initializer
...
An instance constructor initializer of the form this(argument-listopt) causes an instance constructor from the class itself to be invoked ... If an instance constructor declaration includes a constructor initializer that invokes the constructor itself, a compile-time error occurs
That last sentence seems to only preclude direct calling itself as producing a compile time error, e.g.
Foo() : this() {}
is illegal.
I admit though - I can't see a specific reason for allowing it. Of course, at the IL level such constructs are allowed because different instance constructors could be selected at runtime, I believe - so you could have recursion provided it terminates.
I think the other reason it doesn't flag or warn on this is because it has no need to detect this situation. Imagine chasing through hundreds of different constructors, just to see if a cycle does exist - when any attempted usage will quickly (as we know) blow up at runtime, for a fairly edge case.
When it's doing code generation for each constructor, all it considers is constructor-initializer, the field initializers, and the body of the constructor - it doesn't consider any other code:
If constructor-initializer is an instance constructor for the class itself, it doesn't emit the field initializers - it emits the constructor-initializer call and then the body.
If constructor-initializer is an instance constructor for the direct base class, it emits the field initializers, then the constructor-initializer call, and then then body.
In neither case does it need to go looking elsewhere - so it's not a case of it being "unable" to decide where to place the field initializers - it's just following some simple rules that only consider the current constructor.

Your example
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) { }
}
will work fine, in the sense that you can instantiate that Foo object without problems. However, the following would be more like the code that you're asking about
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) : this() { }
}
Both that and your code will create a stackoverflow (!), because the recursion never bottoms out. So your code is ignored because it never gets to execute.
In other words, the compiler can't decide where to put the faulty code because it can tell that the recursion never bottoms out. I think this is because it has to put it where it will only be called once, but the recursive nature of the constructors makes that impossible.
Recursion in the sense of a constructor creating instances of itself within the body of the constructor makes sense to me, because e.g. that could be used to instantiate trees where each node points to other nodes. But recursion via the pre-constructors of the sort illustrated by this question can't ever bottom out, so it would make sense for me if that was disallowed.

I think this is allowed because you can (could) still catch the Exception and do something meaningfull with it.
The initialisation will never be run, and it will almost certaintly throw a StackOverflowException. But this can still be wanted behaviour, and didn't always mean the process should crash.
As explained here https://stackoverflow.com/a/1599236/869482

Related

C# initial value in constructor OR class variable declaration? [duplicate]

I've been programming in C# and Java recently and I am curious where the best place is to initialize my class fields.
Should I do it at declaration?:
public class Dice
{
private int topFace = 1;
private Random myRand = new Random();
public void Roll()
{
// ......
}
}
or in a constructor?:
public class Dice
{
private int topFace;
private Random myRand;
public Dice()
{
topFace = 1;
myRand = new Random();
}
public void Roll()
{
// .....
}
}
I'm really curious what some of you veterans think is the best practice. I want to be consistent and stick to one approach.
My rules:
Don't initialize with the default values in declaration (null, false, 0, 0.0…).
Prefer initialization in declaration if you don't have a constructor parameter that changes the value of the field.
If the value of the field changes because of a constructor parameter put the initialization in the constructors.
Be consistent in your practice (the most important rule).
In C# it doesn't matter. The two code samples you give are utterly equivalent. In the first example the C# compiler (or is it the CLR?) will construct an empty constructor and initialise the variables as if they were in the constructor (there's a slight nuance to this that Jon Skeet explains in the comments below).
If there is already a constructor then any initialisation "above" will be moved into the top of it.
In terms of best practice the former is less error prone than the latter as someone could easily add another constructor and forget to chain it.
I think there is one caveat. I once committed such an error: Inside of a derived class, I tried to "initialize at declaration" the fields inherited from an abstract base class. The result was that there existed two sets of fields, one is "base" and another is the newly declared ones, and it cost me quite some time to debug.
The lesson: to initialize inherited fields, you'd do it inside of the constructor.
The semantics of C# differs slightly from Java here. In C# assignment in declaration is performed before calling the superclass constructor. In Java it is done immediately after which allows 'this' to be used (particularly useful for anonymous inner classes), and means that the semantics of the two forms really do match.
If you can, make the fields final.
Assuming the type in your example, definitely prefer to initialize fields in the constructor. The exceptional cases are:
Fields in static classes/methods
Fields typed as static/final/et al
I always think of the field listing at the top of a class as the table of contents (what is contained herein, not how it is used), and the constructor as the introduction. Methods of course are chapters.
In Java, an initializer with the declaration means the field is always initialized the same way, regardless of which constructor is used (if you have more than one) or the parameters of your constructors (if they have arguments), although a constructor might subsequently change the value (if it is not final). So using an initializer with a declaration suggests to a reader that the initialized value is the value that the field has in all cases, regardless of which constructor is used and regardless of the parameters passed to any constructor. Therefore use an initializer with the declaration only if, and always if, the value for all constructed objects is the same.
There are many and various situations.
I just need an empty list
The situation is clear. I just need to prepare my list and prevent an exception from being thrown when someone adds an item to the list.
public class CsvFile
{
private List<CsvRow> lines = new List<CsvRow>();
public CsvFile()
{
}
}
I know the values
I exactly know what values I want to have by default or I need to use some other logic.
public class AdminTeam
{
private List<string> usernames;
public AdminTeam()
{
usernames = new List<string>() {"usernameA", "usernameB"};
}
}
or
public class AdminTeam
{
private List<string> usernames;
public AdminTeam()
{
usernames = GetDefaultUsers(2);
}
}
Empty list with possible values
Sometimes I expect an empty list by default with a possibility of adding values through another constructor.
public class AdminTeam
{
private List<string> usernames = new List<string>();
public AdminTeam()
{
}
public AdminTeam(List<string> admins)
{
admins.ForEach(x => usernames.Add(x));
}
}
What if I told you, it depends?
I in general initialize everything and do it in a consistent way. Yes it's overly explicit but it's also a little easier to maintain.
If we are worried about performance, well then I initialize only what has to be done and place it in the areas it gives the most bang for the buck.
In a real time system, I question if I even need the variable or constant at all.
And in C++ I often do next to no initialization in either place and move it into an Init() function. Why? Well, in C++ if you're initializing something that can throw an exception during object construction you open yourself to memory leaks.
The design of C# suggests that inline initialization is preferred, or it wouldn't be in the language. Any time you can avoid a cross-reference between different places in the code, you're generally better off.
There is also the matter of consistency with static field initialization, which needs to be inline for best performance. The Framework Design Guidelines for Constructor Design say this:
✓ CONSIDER initializing static fields inline rather than explicitly using static constructors, because the runtime is able to optimize the performance of types that don’t have an explicitly defined static constructor.
"Consider" in this context means to do so unless there's a good reason not to. In the case of static initializer fields, a good reason would be if initialization is too complex to be coded inline.
Being consistent is important, but this is the question to ask yourself:
"Do I have a constructor for anything else?"
Typically, I am creating models for data transfers that the class itself does nothing except work as housing for variables.
In these scenarios, I usually don't have any methods or constructors. It would feel silly to me to create a constructor for the exclusive purpose of initializing my lists, especially since I can initialize them in-line with the declaration.
So as many others have said, it depends on your usage. Keep it simple, and don't make anything extra that you don't have to.
Consider the situation where you have more than one constructor. Will the initialization be different for the different constructors? If they will be the same, then why repeat for each constructor? This is in line with kokos statement, but may not be related to parameters. Let's say, for example, you want to keep a flag which shows how the object was created. Then that flag would be initialized differently for different constructors regardless of the constructor parameters. On the other hand, if you repeat the same initialization for each constructor you leave the possibility that you (unintentionally) change the initialization parameter in some of the constructors but not in others. So, the basic concept here is that common code should have a common location and not be potentially repeated in different locations. So I would say always put it in the declaration until you have a specific situation where that no longer works for you.
There is a slight performance benefit to setting the value in the declaration. If you set it in the constructor it is actually being set twice (first to the default value, then reset in the ctor).
When you don't need some logic or error handling:
Initialize class fields at declaration
When you need some logic or error handling:
Initialize class fields in constructor
This works well when the initialization value is available and the
initialization can be put on one line. However, this form of
initialization has limitations because of its simplicity. If
initialization requires some logic (for example, error handling or a
for loop to fill a complex array), simple assignment is inadequate.
Instance variables can be initialized in constructors, where error
handling or other logic can be used.
From https://docs.oracle.com/javase/tutorial/java/javaOO/initial.html .
I normally try the constructor to do nothing but getting the dependencies and initializing the related instance members with them. This will make you life easier if you want to unit test your classes.
If the value you are going to assign to an instance variable does not get influenced by any of the parameters you are going to pass to you constructor then assign it at declaration time.
Not a direct answer to your question about the best practice but an important and related refresher point is that in the case of a generic class definition, either leave it on compiler to initialize with default values or we have to use a special method to initialize fields to their default values (if that is absolute necessary for code readability).
class MyGeneric<T>
{
T data;
//T data = ""; // <-- ERROR
//T data = 0; // <-- ERROR
//T data = null; // <-- ERROR
public MyGeneric()
{
// All of the above errors would be errors here in constructor as well
}
}
And the special method to initialize a generic field to its default value is the following:
class MyGeneric<T>
{
T data = default(T);
public MyGeneric()
{
// The same method can be used here in constructor
}
}
"Prefer initialization in declaration", seems like a good general practice.
Here is an example which cannot be initialized in the declaration so it has to be done in the constructor.
"Error CS0236 A field initializer cannot reference the non-static field, method, or property"
class UserViewModel
{
// Cannot be set here
public ICommand UpdateCommad { get; private set; }
public UserViewModel()
{
UpdateCommad = new GenericCommand(Update_Method); // <== THIS WORKS
}
void Update_Method(object? parameter)
{
}
}

Why do I get this compile error trying to call a base constructor/method that takes a dynamic argument?

While refactoring some code, I came across this strange compile error:
The constructor call needs to be dynamically dispatched, but cannot be because it is part of a constructor initializer. Consider casting the dynamic arguments.
It seems to occur when trying to call base methods/constructors that take dynamic arguments. For example:
class ClassA
{
public ClassA(dynamic test)
{
Console.WriteLine("ClassA");
}
}
class ClassB : ClassA
{
public ClassB(dynamic test)
: base(test)
{
Console.WriteLine("ClassB");
}
}
It works if I cast the argument to object, like this:
public ClassB(dynamic test)
: base((object)test)
So, I'm a little confused. Why do I have to put this nasty cast in - why can't the compiler figure out what I mean?
The constructor chain has to be determined for certain at compile-time - the compiler has to pick an overload so that it can create valid IL. Whereas normally overload resolution (e.g. for method calls) can be deferred until execution time, that doesn't work for chained constructor calls.
EDIT: In "normal" C# code (before C# 4, basically), all overload resolution is performed at compile-time. However, when a member invocation involves a dynamic value, that is resolved at execution time. For example consider this:
using System;
class Program
{
static void Foo(int x)
{
Console.WriteLine("int!");
}
static void Foo(string x)
{
Console.WriteLine("string!");
}
static void Main(string[] args)
{
dynamic d = 10;
Foo(d);
}
}
The compiler doesn't emit a direct call to Foo here - it can't, because in the call Foo(d) it doesn't know which overload it would resolve to. Instead it emits code which does a sort of "just in time" mini-compilation to resolve the overload with the actual type of the value of d at execution time.
Now that doesn't work for constructor chaining, as valid IL has to contain a call to a specific base class constructor. (I don't know whether the dynamic version can't even be expressed in IL, or whether it can, but the result would be unverifiable.)
You could argue that the C# compiler should be able to tell that there's only actually one visible constructor which can be called, and that constructor will always be available... but once you start down that road, you end up with a language which is very complicated to specify. The C# designers usually take the position of having simpler rules which occasionally aren't as powerful as you'd like them to be.

Why can't non-static fields be initialized inside structs?

Consider this code block:
struct Animal
{
public string name = ""; // Error
public static int weight = 20; // OK
// initialize the non-static field here
public void FuncToInitializeName()
{
name = ""; // Now correct
}
}
Why can we initialize a static field inside a struct but not a non-static field?
Why do we have to initialize non-static in methods bodies?
Have a look at Why Can't Value Types have Default Constructors?
The CLI expects to be able to allocate and create new instances of any value type that would require 'n' bytes of memory, by simply allocating 'n' bytes and filling them with zero. There's no reason the CLI "couldn't" provide a means of specifying either that before any entity containing structs is made available to outside code, a constructor must be run on every struct therein, or that a whenever an instance of a particular n-byte struct is created, the compiler should copy a 'template instance'. As it is, however, the CLI doesn't allow such a thing. Consequently, there's no reason for a compiler to pretend it has a means of assuring that structs will be initialized to anything other than the memory-filled-with-zeroes default.
You cannot write a custom default constructor in a structure. The instance field initializers will eventually need to get moved to the constructor which you can't define.
Static field initializers are moved to a static constructor. You can write a custom static constructor in a struct.
You can do exactly what you're trying. All you're missing is a custom constructor that calls the default constructor:
struct Animal
{
public string name = "";
public static int weight = 20;
public Animal(bool someArg) : this() { }
}
The constructor has to take at least one parameter, and then it has to forward to this() to get the members initialised.
The reason this works is that the compiler now has a way to discover the times when the code should run to initialise the name field: whenever you write new Animal(someBool).
With any struct you can say new Animal(), but "blank" animals can be created implicitly in many circumstances in the workings of the CLR, and there isn't a way to ensure custom code gets run every time that happens.

Proper way to accomplish this construction using constructor chaining?

I have an assignment for my first OOP class, and I understand all of it including the following statement:
You should create a class called ComplexNumber. This class will contain the real and imaginary parts of the complex number in private data members defined as doubles. Your class should contain a constructor that allows the data members of the imaginary number to be specified as parameters of the constructor. A default (non-parameterized) constructor should initialize the data members to 0.0.
Of course I know how to create these constructors without chaining them together, and the assignment does not require chaining them, but I want to as I just like to.
Without chaining them together, my constructors look like this:
class ComplexNumber
{
private double realPart;
private double complexPart;
public ComplexNumber()
{
realPart = 0.0;
complexPart = 0.0
}
public ComplexNumber(double r, double c)
{
realPart = r;
complexPart = c;
}
// the rest of my class code...
}
Is this what you're looking for?
public ComplexNumber()
: this(0.0, 0.0)
{
}
public ComplexNumber(double r, double c)
{
realPart = r;
complexPart = c;
}
#Rex has the connect answer for chaining.
However in this case chaining or any initialization is not necessary. The CLR will initialize fields to their default value during object constructor. For doubles this will cause them to be initialized to 0.0. So the assignment in the default constructor case is not strictly necessary.
Some people prefer to explicitly initialize their fields for documentation or readability though.
I am still trying to grasp the concept of constructor-chaining, so it works, but why/how?
The 'how' of constructor chaining by using the 'this' keyword in the constructor definition, and shown in Rex M's example.
The 'why' of constructor chaining is to reuse the implementation of a constructor. If the implementation (body) of the 2nd constructor were long and complicated, then you'd want to reuse it (i.e. chain to it or invoke it) instead of copy-and-pasting it into other constructors. An alternative might be to put that code, which is shared between several constructors, into a common subroutine which is invoked from several constructors: however, this subroutine wouldn't be allowed to initialize readonly fields (which can only be initialized from a constructor and not from a subroutine), so constructor chaining is a work-around for that.

Calling constructor overload when both overload have same signature

Consider the following class,
class Foo
{
public Foo(int count)
{
/* .. */
}
public Foo(int count)
{
/* .. */
}
}
Above code is invalid and won't compile. Now consider the following code,
class Foo<T>
{
public Foo(int count)
{
/* .. */
}
public Foo(T t)
{
/* .. */
}
}
static void Main(string[] args)
{
Foo<int> foo = new Foo<int>(1);
}
Above code is valid and compiles well. It calls Foo(int count).
My question is, if the first one is invalid, how can the second one be valid? I know class Foo<T> is valid because T and int are different types. But when it is used like Foo<int> foo = new Foo<int>(1), T is getting integer type and both constructor will have same signature right? Why don't compiler show error rather than choosing an overload to execute?
There is no ambiguity, because the compiler will choose the most specific overload of Foo(...) that matches. Since a method with a generic type parameter is considered less specific than a corresponding non-generic method, Foo(T) is therefore less specific than Foo(int) when T == int. Accordingly, you are invoking the Foo(int) overload.
Your first case (with two Foo(int) definitions) is an error because the compiler will allow only one definition of a method with precisely the same signature, and you have two.
Your question was hotly debated when C# 2.0 and the generic type system in the CLR were being designed. So hotly, in fact, that the "bound" C# 2.0 specification published by A-W actually has the wrong rule in it! There are four possibilities:
1) Make it illegal to declare a generic class that could POSSIBLY be ambiguous under SOME construction. (This is what the bound spec incorrectly says is the rule.) So your Foo<T> declaration would be illegal.
2) Make it illegal to construct a generic class in a manner which creates an ambiguity. declaring Foo<T> would be legal, constructing Foo<double> would be legal, but constructing Foo<int> would be illegal.
3) Make it all legal and use overload resolution tricks to work out whether the generic or nongeneric version is better. (This is what C# actually does.)
4) Do something else I haven't thought of.
Rule #1 is a bad idea because it makes some very common and harmless scenarios impossible. Consider for example:
class C<T>
{
public C(T t) { ... } // construct a C that wraps a T
public C(Stream state) { ... } // construct a C based on some serialized state from disk
}
You want that to be illegal just because C<Stream> is ambiguous? Yuck. Rule #1 is a bad idea, so we scrapped it.
Unfortunately, it is not as simple as that. IIRC the CLI rules say that an implementation is allowed to reject as illegal constructions that actually do cause signature ambiguities. That is, the CLI rules are something like Rule #2, whereas C# actually implements Rule #3. Which means that there could in theory be legal C# programs that translate into illegal code, which is deeply unfortunate.
For some more thoughts on how these sorts of ambiguities make our lives wretched, here are a couple of articles I wrote on the subject:
http://blogs.msdn.com/ericlippert/archive/2006/04/05/569085.aspx
http://blogs.msdn.com/ericlippert/archive/2006/04/06/odious-ambiguous-overloads-part-two.aspx
Eric Lippert blogged about this recently.
The fact is that they do not both have the same signature - one is using generics while this other is not.
With those methods in place you could also call it using a non-int object:
Foo<string> foo = new Foo<string>("Hello World");

Categories