After watching the webinar Jon Skeet Inspects ReSharper, I started to play a little with
recursive constructor calls and found that the following code is valid C# code (by valid I mean it compiles).
class Foo
{
int a = null;
int b = AppDomain.CurrentDomain;
int c = "string to int";
int d = NonExistingMethod();
int e = Invalid<Method>Name<<Indeeed();
Foo() :this(0) { }
Foo(int v) :this() { }
}
As we all probably know, the compiler moves field initialization into the constructor. So if you have a field like int a = 42;, you will have a = 42 in all constructors. But if you have a constructor calling another constructor of the same class, the initialization code appears only in the called one.
For example, if you have a constructor with parameters calling the default constructor, the assignment a = 42 appears only in the default constructor.
To illustrate the second case, the following code:
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) { }
}
Compiles into:
internal class Foo
{
private int a;
private Foo()
{
this.ctor(60);
}
private Foo(int v)
{
this.a = 42;
base.ctor();
}
}
So the main issue is that my code, given at the start of this question, is compiled into:
internal class Foo
{
private int a;
private int b;
private int c;
private int d;
private int e;
private Foo()
{
this.ctor(0);
}
private Foo(int v)
{
this.ctor();
}
}
As you can see, the compiler can't decide where to put the field initialization and, as a result, doesn't put it anywhere. Also note that there are no base constructor calls. Of course, no objects can be created, and you will always end up with a StackOverflowException if you try to create an instance of Foo.
I have two questions:
Why does the compiler allow recursive constructor calls at all?
Why do we observe such behavior of the compiler for fields initialized within such a class?
Some notes: ReSharper warns you with Possible cyclic constructor calls. Moreover, in Java such constructor calls won't even compile, so the Java compiler is more restrictive in this scenario (Jon mentioned this at the webinar).
This makes these questions more interesting because, with all respect to the Java community, the C# compiler is at least more modern.
This was compiled using C# 4.0 and C# 5.0 compilers and decompiled using dotPeek.
Interesting find.
It appears that there are really only two kinds of instance constructors:
An instance constructor which chains another instance constructor of the same type, with the : this( ...) syntax.
An instance constructor which chains an instance constructor of the base class. This includes instance constructors where no chaining is specified, since : base() is the default.
(I disregarded the instance constructor of System.Object which is a special case. System.Object has no base class! But System.Object has no fields either.)
The instance field initializers that might be present in the class need to be copied into the beginning of the body of all instance constructors of type 2 above, whereas no instance constructor of type 1 needs the field assignment code.
So apparently there's no need for the C# compiler to analyze the constructors of type 1 to see whether there are cycles.
Now your example gives a situation where all instance constructors are of type 1. In that situation the field initializer code does not need to be put anywhere. So it is not analyzed very deeply, it seems.
It turns out that when all instance constructors are of type 1, you can even derive from a base class that has no accessible constructor. The base class must be non-sealed, though. For example, if you write a class with only private instance constructors, people can still derive from your class if they make all instance constructors in the derived class be of type 1 above. However, a new object creation expression will never finish, of course. To create instances of the derived class, one would have to "cheat" and use stuff like the System.Runtime.Serialization.FormatterServices.GetUninitializedObject method.
Another example: The System.Globalization.TextInfo class has only an internal instance constructor. But you can still derive from this class in an assembly other than mscorlib.dll with this technique.
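The "cheat" mentioned above can be sketched with an ordinary class, which also shows where field initializers end up. This is a minimal illustration, assuming a hypothetical Widget class that is not part of the question. FormatterServices.GetUninitializedObject allocates the object without running any instance constructor, so the initializer that the compiler folded into the constructor never executes. Newer .NET versions may flag FormatterServices as obsolete.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical class used only for illustration.
class Widget
{
    public int A = 42;   // the compiler emits this assignment at the start of the constructor
    public Widget() { }
}

static class UninitializedDemo
{
    // Normal construction: the constructor runs, so the field initializer runs too.
    public static int ViaConstructor()
    {
        return new Widget().A;
    }

    // Bypasses every instance constructor; fields keep their zeroed defaults.
    public static int ViaUninitializedObject()
    {
        var w = (Widget)FormatterServices.GetUninitializedObject(typeof(Widget));
        return w.A;
    }
}
```

The first method sees the initialized value, while the second sees the default value of int, because no constructor (and therefore no field initializer) ever ran.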
Finally, regarding the
Invalid<Method>Name<<Indeeed()
syntax. According to the C# rules, this is to be read as
(Invalid < Method) > (Name << Indeeed())
because the left-shift operator << has higher precedence than both the less-than operator < and the greater-than operator >. The latter two operators have the same precedence and, being left-associative, are evaluated from left to right. If the types were
MySpecialType Invalid;
int Method;
int Name;
int Indeeed() { ... }
and if MySpecialType introduced a (MySpecialType, int) overload of the operator <, then the expression
Invalid < Method > Name << Indeeed()
would be legal and meaningful.
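A compilable sketch of that reading follows. It assumes a hypothetical MySpecialType whose < and > overloads both return MySpecialType (C# requires the two operators to be declared as a pair) and record a trace string so the grouping becomes visible. The values 1, 2 and 3 are arbitrary.

```csharp
using System;

// Hypothetical type: its comparison operators build a trace instead of comparing.
sealed class MySpecialType
{
    public readonly string Trace;
    public MySpecialType(string trace) { Trace = trace; }

    public static MySpecialType operator <(MySpecialType x, int y)
    {
        return new MySpecialType("(" + x.Trace + " < " + y + ")");
    }

    // The matching > overload is mandatory once < is declared.
    public static MySpecialType operator >(MySpecialType x, int y)
    {
        return new MySpecialType("(" + x.Trace + " > " + y + ")");
    }
}

static class ParseDemo
{
    static int Indeeed() { return 3; }

    public static string Evaluate()
    {
        var Invalid = new MySpecialType("Invalid");
        int Method = 1;
        int Name = 2;

        // Parsed as (Invalid < Method) > (Name << Indeeed()), not as a generic call.
        MySpecialType result = Invalid < Method > Name << Indeeed();
        return result.Trace;
    }
}
```

The trace shows the shift being evaluated as the right operand of >, which matches the precedence rules described above.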
In my opinion, it would be better if the compiler issued a warning in this scenario. For example, it could say unreachable code detected and point to the line and column number of the field initializer that is never translated into IL.
I think because the language specification only rules out directly invoking the same constructor that is being defined.
From 10.11.1:
All instance constructors (except those for class object) implicitly include an invocation of another instance constructor immediately before the constructor-body. The constructor to implicitly invoke is determined by the constructor-initializer
...
An instance constructor initializer of the form this(argument-listopt) causes an instance constructor from the class itself to be invoked ... If an instance constructor declaration includes a constructor initializer that invokes the constructor itself, a compile-time error occurs
That last sentence seems to make only a direct call from a constructor to itself a compile-time error, e.g.
Foo() : this() {}
is illegal.
I admit though - I can't see a specific reason for allowing it. Of course, at the IL level such constructs are allowed because different instance constructors could be selected at runtime, I believe - so you could have recursion provided it terminates.
I think the other reason it doesn't flag or warn on this is because it has no need to detect this situation. Imagine chasing through hundreds of different constructors, just to see if a cycle does exist - when any attempted usage will quickly (as we know) blow up at runtime, for a fairly edge case.
When it's doing code generation for each constructor, all it considers is constructor-initializer, the field initializers, and the body of the constructor - it doesn't consider any other code:
If constructor-initializer is an instance constructor for the class itself, it doesn't emit the field initializers - it emits the constructor-initializer call and then the body.
If constructor-initializer is an instance constructor for the direct base class, it emits the field initializers, then the constructor-initializer call, and then the body.
In neither case does it need to go looking elsewhere - so it's not a case of it being "unable" to decide where to place the field initializers - it's just following some simple rules that only consider the current constructor.
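The two rules can be observed at runtime with a small sketch. The Log helper and the Base class are assumptions introduced only for this illustration; the order in which the events are recorded reflects where the compiler placed the field initializer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records the order of construction steps.
static class Log
{
    public static readonly List<string> Events = new List<string>();

    public static int Note(string what, int value)
    {
        Events.Add(what);
        return value;
    }
}

class Base
{
    protected Base() { Log.Note("Base ctor", 0); }
}

class Foo : Base
{
    private int a = Log.Note("field initializer", 42);

    // Chains to this class: no field initializers are emitted here.
    public Foo() : this(60) { Log.Note("Foo() body", 0); }

    // Chains (implicitly) to the base class: initializers, then base call, then body.
    public Foo(int v) { Log.Note("Foo(int) body", 0); }

    public int A { get { return a; } }
}

static class ConstructionOrder
{
    public static string Run()
    {
        Log.Events.Clear();
        new Foo();
        return string.Join(", ", Log.Events);
    }
}
```

Calling ConstructionOrder.Run() lists the field initializer exactly once, before the base constructor, even though two constructors of Foo were involved.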
Your example
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) { }
}
will work fine, in the sense that you can instantiate that Foo object without problems. However, the following would be more like the code that you're asking about
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) : this() { }
}
Both that and your code will cause a stack overflow (!), because the recursion never bottoms out. So your code is ignored because it never gets to execute.
In other words, the compiler can't decide where to put the faulty code because it can tell that the recursion never bottoms out. I think this is because it has to put it where it will only be called once, but the recursive nature of the constructors makes that impossible.
Recursion in the sense of a constructor creating instances of itself within the body of the constructor makes sense to me, because e.g. that could be used to instantiate trees where each node points to other nodes. But recursion via the pre-constructors of the sort illustrated by this question can't ever bottom out, so it would make sense for me if that was disallowed.
I think this is allowed because you can (could) still catch the Exception and do something meaningful with it.
The initialisation will never be run, and it will almost certainly throw a StackOverflowException. But this can still be wanted behaviour, and doesn't always mean the process should crash.
As explained here https://stackoverflow.com/a/1599236/869482
I am working on a large project where a base class has thousands of classes derived from it (multiple developers are working on them). Each class is expected to override a set of methods. I first generated these thousands of class files with a code template that conforms to an acceptable pattern. I am now writing unit tests to ensure that developers have not deviated from this pattern. Here is a sample generated class:
// Base class.
public abstract partial class BaseClass
{
protected abstract bool OnTest ();
}
// Derived class. DO NOT CHANGE THE CLASS NAME!
public sealed partial class DerivedClass_00000001: BaseClass
{
/// <summary>
/// Do not modify the code template in any way.
/// Write code only in the try and finally blocks in this method.
/// </summary>
protected override void OnTest ()
{
bool result = false;
ComObject com = null;
// Declare ALL value and reference type variables here. NOWHERE ELSE!
// Variables that would otherwise be narrowly scoped should also be declared here.
// Initialize all reference types to [null]. [object o;] does not conform. [object o = null;] conforms.
// Initialize all value types to their default values. [int x;] does not conform. [int x = 0;] conforms.
try
{
com = new ComObject();
// Process COM objects here.
// Do NOT return out of this function yourself!
}
finally
{
// Release all COM objects.
System.Runtime.InteropServices.Marshal.ReleaseComObject(com);
// Set all COM objects to [null].
// The base class will take care of explicit garbage collection.
com = null;
}
return (result);
}
}
In the unit tests, I have been able to verify the following via reflection:
The class derives from [BaseClass] and does not implement any interfaces.
The class name conforms to a pattern.
The catch block has not been filtered.
No other catch blocks have been added.
No class level fields or properties have been declared.
All method value type variables have been manually initialized upon declaration.
No other methods have been added to the derived classes.
The above is easily achieved via reflection but I am struggling with asserting the following list:
The catch block re-throws the caught exception rather than wrapping it or throwing some other exception.
The [return (result);] line at the end has not been modified and no other [return (whatever);] calls have been added. No idea how to achieve this.
Verify that all reference types implementing IDisposable have been disposed.
Verify that all reference types of type [System.__ComObject] have been manually de-referenced and set to [null] in the finally block.
I have thought about parsing the source code but I don't like that solution unless absolutely necessary. It is messy and unless I have expression trees, almost impossible to guarantee success.
Any tips would be appreciated.
Some thoughts:
If the methods need to be overridden, why are they virtual instead of abstract?
Code that should not be changed doesn't belong in the derived class. It belongs in the base class.
catch { throw; } is useless. Remove it.
Returning a boolean value from a void method causes a compiler error.
Setting local variables to null is useless.
Not all reference types implement IDisposable.
Generally: Most of your requirements seem to have no business value.
Why prohibit implementation of an interface?
Why prohibit declaration of other methods?
Why prohibit catch clauses?
etc.
You should really think about what your actual business requirements are and model your classes after them. If the classes need to fulfill a certain contract, model that contract. Leave the implementation to the implementor.
About the actual questions raised:
You can't use reflection here. You can either analyze the original source code or the IL code of the compiled assembly.
Both options are pretty tricky and most likely impossible to achieve within your limited time. I am positive that fixing the architecture would take less time than implementing one of those options.
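To make the boundary concrete, here is a rough sketch of what reflection does expose about a method body: the exception-handling clauses, via MethodBody.ExceptionHandlingClauses. The Sample class and the MethodBodyInspector helper are assumptions for illustration. Anything at statement level, such as whether a catch block re-throws or how many return statements exist, is not visible this way and needs IL or source analysis.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class following the try/finally shape of the template.
class Sample
{
    public bool OnTest()
    {
        bool result = false;
        try
        {
            result = true;
        }
        finally
        {
            GC.KeepAlive(this);
        }
        return result;
    }
}

static class MethodBodyInspector
{
    // Counts the clauses of one kind (catch = Clause, finally = Finally, ...).
    public static int CountClauses(MethodInfo method, ExceptionHandlingClauseOptions kind)
    {
        MethodBody body = method.GetMethodBody();
        if (body == null) return 0;   // abstract or extern methods have no body
        return body.ExceptionHandlingClauses.Count(c => c.Flags == kind);
    }
}
```

A unit test could assert that each generated method has exactly one finally clause and no catch clauses, but it could not tell from this metadata what the code inside those blocks does.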
You could try to use the Roslyn CTP here if fully automated code analysis is what you really need. It offers more advanced syntactic and semantic analysis than reflection does. But it is still a lot of work. Working directly with the developers rather than with their code, and preparing templates and guidelines, may be more time efficient.
While I'm sure you have a very good reason for such rigid requirements... have you considered passing lambdas/delegates/Actions to the Test function instead?
It can't solve everything, but it would more logically give you some of the behaviours you want (e.g. can't return, can't have class level variables, can't write code anywhere but where specified).
The biggest concern with it would be captured variables... but there may be workarounds for that.
Example Code:
//I'd make a few signatures....
bool OnTest<T1, T2> (Action<ComObject, T1, T2> logic, T1 first, T2 second)
{
bool result = false;
ComObject com = null;
//no checks needed re parameters
//Can add reflection tests here if wanted before code is run.
try
{
com = new ComObject();
//can't return
logic(com, first,second);
}
finally
{
// Release all COM objects.
System.Runtime.InteropServices.Marshal.ReleaseComObject(com);
// Set all COM objects to [null].
// The base class will take care of explicit garbage collection.
com = null;
//If you want, we can check each argument and if it is disposable dispose.
if (first is IDisposable && first != null) ((IDisposable) first).Dispose();
...
}
return (result); //can't be changed
}
No idea if this'll work, but it's just a thought. Oh, and as a thought it's not thorough or tested - I'd expect you to develop it drastically.
I've been programming in C# and Java recently and I am curious where the best place is to initialize my class fields.
Should I do it at declaration?:
public class Dice
{
private int topFace = 1;
private Random myRand = new Random();
public void Roll()
{
// ......
}
}
or in a constructor?:
public class Dice
{
private int topFace;
private Random myRand;
public Dice()
{
topFace = 1;
myRand = new Random();
}
public void Roll()
{
// .....
}
}
I'm really curious what some of you veterans think is the best practice. I want to be consistent and stick to one approach.
My rules:
Don't initialize with the default values in declaration (null, false, 0, 0.0…).
Prefer initialization in declaration if you don't have a constructor parameter that changes the value of the field.
If the value of the field changes because of a constructor parameter put the initialization in the constructors.
Be consistent in your practice (the most important rule).
In C# it doesn't matter. The two code samples you give are utterly equivalent. In the first example the C# compiler (or is it the CLR?) will construct an empty constructor and initialise the variables as if they were in the constructor (there's a slight nuance to this that Jon Skeet explains in the comments below).
If there is already a constructor then any initialisation "above" will be moved into the top of it.
In terms of best practice the former is less error prone than the latter as someone could easily add another constructor and forget to chain it.
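The "forgot to chain it" risk can be sketched with two hypothetical variants of the Dice class. The class names and the HasRandom property are assumptions added for the illustration.

```csharp
using System;

// Variant 1: initialization lives in the parameterless constructor.
class DiceCtorInit
{
    private int topFace;
    private Random myRand;

    public DiceCtorInit()
    {
        topFace = 1;
        myRand = new Random();
    }

    // Added later without ": this()", so myRand is never assigned here.
    public DiceCtorInit(int startFace)
    {
        topFace = startFace;
    }

    public int TopFace { get { return topFace; } }
    public bool HasRandom { get { return myRand != null; } }
}

// Variant 2: initialization at declaration applies to every constructor.
class DiceInlineInit
{
    private int topFace = 1;
    private Random myRand = new Random();

    public DiceInlineInit() { }

    public DiceInlineInit(int startFace)
    {
        topFace = startFace;
    }

    public int TopFace { get { return topFace; } }
    public bool HasRandom { get { return myRand != null; } }
}
```

With the first variant, an instance created through the new constructor has no Random and would fail on the first roll. With the second, the field is initialized regardless of which constructor is used.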
I think there is one caveat. I once committed such an error: Inside of a derived class, I tried to "initialize at declaration" the fields inherited from an abstract base class. The result was that there existed two sets of fields, one is "base" and another is the newly declared ones, and it cost me quite some time to debug.
The lesson: to initialize inherited fields, you'd do it inside of the constructor.
The semantics of C# differs slightly from Java here. In C# assignment in declaration is performed before calling the superclass constructor. In Java it is done immediately after which allows 'this' to be used (particularly useful for anonymous inner classes), and means that the semantics of the two forms really do match.
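The C# half of that statement can be observed when a base constructor calls a virtual method. In this sketch (the Shape and Square classes are hypothetical), the field initialized at declaration is already set when the base constructor runs, while the field assigned in the derived constructor body is not.

```csharp
using System;

abstract class Shape
{
    public readonly string Description;

    protected Shape()
    {
        // Virtual call from a base constructor: runs before Square's constructor body.
        Description = Describe();
    }

    protected abstract string Describe();
}

class Square : Shape
{
    private readonly string fromDeclaration = "square";   // runs before the base constructor
    private readonly string fromConstructor;

    public Square()
    {
        fromConstructor = "set in ctor";                   // runs after the base constructor
    }

    protected override string Describe()
    {
        return fromDeclaration + "/" + (fromConstructor ?? "<unset>");
    }
}
```

Calling virtual methods from constructors is generally discouraged; it is used here only to make the initialization order visible.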
If you can, make the fields final.
Assuming the type in your example, definitely prefer to initialize fields in the constructor. The exceptional cases are:
Fields in static classes/methods
Fields typed as static/final/et al
I always think of the field listing at the top of a class as the table of contents (what is contained herein, not how it is used), and the constructor as the introduction. Methods of course are chapters.
In Java, an initializer with the declaration means the field is always initialized the same way, regardless of which constructor is used (if you have more than one) or the parameters of your constructors (if they have arguments), although a constructor might subsequently change the value (if it is not final). So using an initializer with a declaration suggests to a reader that the initialized value is the value that the field has in all cases, regardless of which constructor is used and regardless of the parameters passed to any constructor. Therefore use an initializer with the declaration only if, and always if, the value for all constructed objects is the same.
There are many and various situations.
I just need an empty list
The situation is clear. I just need to prepare my list and prevent an exception from being thrown when someone adds an item to the list.
public class CsvFile
{
private List<CsvRow> lines = new List<CsvRow>();
public CsvFile()
{
}
}
I know the values
I exactly know what values I want to have by default or I need to use some other logic.
public class AdminTeam
{
private List<string> usernames;
public AdminTeam()
{
usernames = new List<string>() {"usernameA", "usernameB"};
}
}
or
public class AdminTeam
{
private List<string> usernames;
public AdminTeam()
{
usernames = GetDefaultUsers(2);
}
}
Empty list with possible values
Sometimes I expect an empty list by default with a possibility of adding values through another constructor.
public class AdminTeam
{
private List<string> usernames = new List<string>();
public AdminTeam()
{
}
public AdminTeam(List<string> admins)
{
admins.ForEach(x => usernames.Add(x));
}
}
What if I told you, it depends?
I in general initialize everything and do it in a consistent way. Yes it's overly explicit but it's also a little easier to maintain.
If we are worried about performance, well then I initialize only what has to be done and place it in the areas it gives the most bang for the buck.
In a real time system, I question if I even need the variable or constant at all.
And in C++ I often do next to no initialization in either place and move it into an Init() function. Why? Well, in C++ if you're initializing something that can throw an exception during object construction you open yourself to memory leaks.
The design of C# suggests that inline initialization is preferred, or it wouldn't be in the language. Any time you can avoid a cross-reference between different places in the code, you're generally better off.
There is also the matter of consistency with static field initialization, which needs to be inline for best performance. The Framework Design Guidelines for Constructor Design say this:
✓ CONSIDER initializing static fields inline rather than explicitly using static constructors, because the runtime is able to optimize the performance of types that don’t have an explicitly defined static constructor.
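One observable difference behind that guideline is the beforefieldinit type attribute: the C# compiler sets it on types that have no explicitly written static constructor, which gives the runtime more freedom in deciding when to run the static initializers. A small sketch using reflection, with hypothetical InlineInit and CtorInit classes:

```csharp
using System;
using System.Reflection;

// Static field initialized inline: no explicit static constructor.
class InlineInit
{
    public static readonly int Value = 42;
}

// Same field, initialized in an explicit static constructor.
class CtorInit
{
    public static readonly int Value;

    static CtorInit()
    {
        Value = 42;
    }
}

static class BeforeFieldInitDemo
{
    // True when the type carries the beforefieldinit flag in its metadata.
    public static bool HasBeforeFieldInit(Type type)
    {
        return (type.Attributes & TypeAttributes.BeforeFieldInit) != 0;
    }
}
```

Both classes end up with the same field value; only the type metadata, and therefore the timing guarantees for initialization, differs.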
"Consider" in this context means to do so unless there's a good reason not to. In the case of static initializer fields, a good reason would be if initialization is too complex to be coded inline.
Being consistent is important, but this is the question to ask yourself:
"Do I have a constructor for anything else?"
Typically, I am creating models for data transfers that the class itself does nothing except work as housing for variables.
In these scenarios, I usually don't have any methods or constructors. It would feel silly to me to create a constructor for the exclusive purpose of initializing my lists, especially since I can initialize them in-line with the declaration.
So as many others have said, it depends on your usage. Keep it simple, and don't make anything extra that you don't have to.
Consider the situation where you have more than one constructor. Will the initialization be different for the different constructors? If they will be the same, then why repeat for each constructor? This is in line with kokos's statement, but may not be related to parameters. Let's say, for example, you want to keep a flag which shows how the object was created. Then that flag would be initialized differently for different constructors regardless of the constructor parameters. On the other hand, if you repeat the same initialization for each constructor you leave the possibility that you (unintentionally) change the initialization parameter in some of the constructors but not in others. So, the basic concept here is that common code should have a common location and not be potentially repeated in different locations. So I would say always put it in the declaration until you have a specific situation where that no longer works for you.
There is a slight performance benefit to setting the value in the declaration. If you set it in the constructor it is actually being set twice (first to the default value, then reset in the ctor).
When you don't need some logic or error handling:
Initialize class fields at declaration
When you need some logic or error handling:
Initialize class fields in constructor
This works well when the initialization value is available and the
initialization can be put on one line. However, this form of
initialization has limitations because of its simplicity. If
initialization requires some logic (for example, error handling or a
for loop to fill a complex array), simple assignment is inadequate.
Instance variables can be initialized in constructors, where error
handling or other logic can be used.
From https://docs.oracle.com/javase/tutorial/java/javaOO/initial.html .
I normally try to have the constructor do nothing but get the dependencies and initialize the related instance members with them. This will make your life easier if you want to unit test your classes.
If the value you are going to assign to an instance variable is not influenced by any of the parameters you are going to pass to your constructor, then assign it at declaration time.
Not a direct answer to your question about best practice, but an important and related refresher: in a generic class definition, either leave it to the compiler to initialize fields with their default values, or use a special method to initialize them to their default values (if that is absolutely necessary for code readability).
class MyGeneric<T>
{
T data;
//T data = ""; // <-- ERROR
//T data = 0; // <-- ERROR
//T data = null; // <-- ERROR
public MyGeneric()
{
// All of the above errors would be errors here in constructor as well
}
}
And the special method to initialize a generic field to its default value is the following:
class MyGeneric<T>
{
T data = default(T);
public MyGeneric()
{
// The same method can be used here in constructor
}
}
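As a quick usage sketch of default(T), assuming a hypothetical Holder<T> class that mirrors the one above, the value the field receives depends on the type argument:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic class mirroring the example above.
class Holder<T>
{
    public T Data = default(T);

    // True while Data still equals the default value of T.
    public bool HoldsDefault
    {
        get { return EqualityComparer<T>.Default.Equals(Data, default(T)); }
    }
}

static class DefaultDemo
{
    public static string Describe()
    {
        var number = new Holder<int>();
        var text = new Holder<string>();
        var flag = new Holder<bool>();

        // int defaults to 0, string (a reference type) to null, bool to false.
        return number.Data + "|" + (text.Data == null ? "null" : text.Data) + "|" + flag.Data;
    }
}
```

The same field declaration therefore yields zero for numeric types, null for reference types, and false for bool, without the class needing to know T in advance.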
"Prefer initialization in declaration", seems like a good general practice.
Here is an example which cannot be initialized in the declaration so it has to be done in the constructor.
"Error CS0236 A field initializer cannot reference the non-static field, method, or property"
class UserViewModel
{
// Cannot be set here
public ICommand UpdateCommad { get; private set; }
public UserViewModel()
{
UpdateCommad = new GenericCommand(Update_Method); // <== THIS WORKS
}
void Update_Method(object? parameter)
{
}
}
I'm reading a book (VS 2010) which says that commands (statements) in .NET C# cannot exist outside of a method.
I am wondering: field declarations etc. are commands, are they not? And they exist at class level. Can somebody elaborate on this a bit?
If you mean:
class Foo
{
int count = 0;
StringBuilder buffer = new StringBuilder();
}
The count and buffer are declarations using initializer expressions. But this code contains no statements.
A field initialiser is written with the code outside a method, but the compiler puts that code inside the constructor.
So a field initialiser like this:
class Foo {
int Bar = 42;
}
is basically a field and an initialiser in the constructor:
class Foo {
int Bar;
Foo() {
Bar = 42;
}
}
There's no such concept as a "command" in C#.
And a static / instance variable declaration isn't categorized as a statement within C# - it's a field-declaration (which is a type of class-member-declaration) as per the C# spec. See section 10.5 of the C# 4 spec for example.
Now the statements which declare local variables are statements, as defined by declaration-statement in the spec (section 8.5). They're only used for locals though. See section B.2.5 for a complete list of statement productions within C# 4.
Basically, the C# spec defines the terminology involved - so while you might think informally of "commands" and the like, in a matter of correctness the C# spec is the source of authority. (Except for where it doesn't say what the language designers meant to say, of course. That's pretty rare.)
As you said they're declarations, a statement is one which actually gets something done.
No, they're declarations. Class member declarations, to be precise.
And it's perfectly legal for those to exist outside of a method. Otherwise, you couldn't declare a method in the first place!
By "statements", the book is telling you that you can't have things like method calls outside of a method. For example, the following code is illegal:
public void DoSomething()
{
// Do something here...
}
MessageBox.Show("This statement is not allowed because it is outside a method.");
Class, namespace, and field declarations are not declaration statements.
A field can be initialised outside a method with an expression, but while certain expressions can be used as statements, there are lots of statements that are not expressions (e.g. if).
It all comes down to how the language grammar defines the terms, and the way C# does it is pretty common (eg. very similar to C and C++).
I've decided to use this.variableName when referring to string/int etc.. fields.
Would that include ArrayList, ListBox etc too?
Like:
private ListBox usersListBox;
private void PopulateListBox()
{
this.usersListBox.Items.Add(...);
}
...Or not?
And what about classes?
MyClass myClass;
private void PlayWithMyClass()
{
this.myClass = new MyClass();
this.myClass.Name = "Bob";
}
?
This looks kind of odd to me.
And I don't know if I should use this.PublicProperty or only private fields.
I'm not 100% with the C# terminology, but hopefully what I said makes sense.
I used to do that sort of thing, but now I find that IDEs are pretty smart about giving me a visual indication that I'm dealing with a member variable. I only use "this" when it's necessary to distinguish the member variable from a parameter of the same name.
The this. prefix lets you access any member of the current instance from within the class. You can access both private and public members, and since everything in C# is an object, accessing a member of a class type works the same way as accessing a string.
You don't have to use this in your code if you don't want to, as it is implied in C# unless a method parameter and a field have the same name.
Less is more. Less text to parse is more readable.
I use this in the constructors since my parameters and member variables have the same names (I don't like marking member variables with _).
public class A
{
int a;
public A(int a)
{
this.a = a;
}
}
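For contrast, here is a sketch of what happens when the names collide and this is left out. The WithThis and WithoutThis classes are hypothetical. Inside the constructor the bare name refers to the parameter, so the assignment never reaches the field (the compiler typically reports a warning about assigning a variable to itself).

```csharp
using System;

class WithThis
{
    private int a;

    public WithThis(int a)
    {
        this.a = a;   // field = parameter
    }

    public int A { get { return a; } }
}

class WithoutThis
{
    private int a;

    public WithoutThis(int a)
    {
        a = a;        // parameter = parameter; the field stays at its default
    }

    public int A { get { return a; } }
}
```

Constructing both with the same argument leaves the first object holding that value and the second holding the default value of int.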
If your class is small enough and does one thing well, then usually you wouldn't need to add this for the sake of readability.
If you can read the whole class easily, what would be the point? It'd be more typing and clutter the code, thus possibly degrade the readability
The 'this' keyword can be used with any instance member. So this means you can use it to reference a field that holds a class instance (e.g. usersListBox, myClass, etc).
It's perfectly fine.
Some people use it to clearly explain what they are referencing so people understand that the instances are in the scope of the code and not external or part of another instance or static member elsewhere.
Finally, you can use it to reference both private and/or public properties and fields and members.
This is nothing more than a keyword pointing to the current instance. In a function, this.foo is generally the same as foo.
As msdn tells you:
The this keyword refers to the current instance of the class.
The page about the this keyword contains a lot more info.
As the this. is implicit you only need to actually use it when disambiguating between class variables and local variables of the same name.
The examples you've given would work how you've written them or like this:
private ListBox usersListBox;
private void PopulateListBox()
{
usersListBox.Items.Add(...);
}
MyClass myClass;
private void PlayWithMyClass()
{
myClass = new MyClass();
myClass.Name = "Bob";
}
It's just a matter of personal preference. If you do choose one over the other, try to be consistent.