Detect compiler generated default constructor using reflection in C# - c#

I'm targeting .NET 3.5 SP1 and using CommentChecker to validate my XML documentation. Everything works until I get to a class like this:
/// <summary>
/// documentation
/// </summary>
public sealed class MyClass {
/// <summary>
/// documentation
/// </summary>
public void Method() {
}
}
In the example above, as I understand it, the compiler generates a default constructor for my class. The problem is that CommentChecker then warns that the constructor is missing its comments.
I tried to modify the program to detect this special case and ignore it, but I'm stuck. I already tried IsDefined(typeof(CompilerGeneratedAttribute), true), but that did not work.
So in short, how can I detect the default constructor using reflection?

If you're willing to dig a little into the IL, then you can get most of the way there.
First, assuming you have a ConstructorInfo instance that you know to be parameterless, you can get the method body and its bytes like so (we'll start building an extension method to do this):
public static bool MightBeCSharpCompilerGenerated(
this ConstructorInfo constructor)
{
// Validate parameters.
if (constructor == null) throw new ArgumentNullException("constructor");
// If the method is static, throw an exception.
if (constructor.IsStatic)
throw new ArgumentException("The constructor parameter must be an " +
"instance constructor.", "constructor");
// Get the body.
byte[] body = constructor.GetMethodBody().GetILAsByteArray();
You can reject any method bodies that don't have exactly seven bytes.
// Feel free to put this in a constant.
if (body.Length != 7) return false;
The reason will be obvious in the code that follows.
In section I.8.9.6.6 of ECMA-335 (Common Language Infrastructure (CLI) Partitions I to VI), it states CLS rule 21:
CLS Rule 21: An object constructor shall call some instance
constructor of its base class before any access occurs to inherited
instance data. (This does not apply to value types, which need not
have constructors.)
This means that before anything else is done, a base constructor must be called. We can check for this in the IL, which would look like this (I've put the byte values in parentheses before each IL instruction):
// Loads "this" on the stack, as the first argument on an instance
// method is always "this".
(0x02) ldarg.0
// No parameters are loaded, but metadata token will be explained.
(0x28) call <metadata token>
We can now start checking the bytes for this:
// Check the first two bytes, if they are not the loading of
// the first argument and then a call, it's not
// a call to a constructor.
if (body[0] != 0x02 || body[1] != 0x28) return false;
Now comes the metadata token. The call instruction requires a method descriptor to be passed in the form of a metadata token along with the constructor. This is a four-byte value which is exposed through the MetadataToken property on the MemberInfo class (from which ConstructorInfo derives).
We could check to see that the metadata token was valid, but because we've already checked the length of byte array for the method body (at seven bytes), and we know that there's only one byte left to check (first two op codes + four byte metadata token = six bytes), we don't have to check to see that it's to a parameterless constructor; if there were parameters, there would be other op codes to push the parameters on the stack, expanding the byte array.
Finally, if nothing else is done in the constructor (indicating that the compiler generated a constructor that does nothing but call the base), a ret instruction would be emitted after the call and its metadata token:
(0x2A) ret
Which we can check like so:
return body[6] == 0x2a;
}
It needs to be noted why the method is called MightBeCSharpCompilerGenerated, with an emphasis on Might.
Let's say you have the following classes:
public class Base { }
public class Derived : Base { public Derived() { } }
When compiling without optimizations (typically DEBUG mode), the C# compiler will insert a few nop codes (presumably to assist the debugger) for the Derived class which would cause a call to MightBeCSharpCompilerGenerated to return false.
However, when optimizations are turned on (typically, RELEASE mode), the C# compiler will emit the seven-byte method body without the nop opcodes, so it will look like Derived has a compiler-generated constructor, even though it does not.
This is why the method is named Might instead of Is or Has; it indicates that there might be a method that you need to look at, but can't say for sure. In other words, you'll never get a false negative but you still have to investigate if you get a positive result.
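Assembled into one piece, the fragments above might look like the following sketch. The class name ConstructorInfoExtensions is made up for illustration, and the null check on GetMethodBody() is an addition that is not in the original fragments: it covers constructors that have no IL body at all, for which GetMethodBody() returns null.

```csharp
using System;
using System.Reflection;

// Hypothetical container class; an extension method needs a static class to live in.
public static class ConstructorInfoExtensions
{
    public static bool MightBeCSharpCompilerGenerated(this ConstructorInfo constructor)
    {
        // Validate parameters.
        if (constructor == null) throw new ArgumentNullException("constructor");
        if (constructor.IsStatic)
            throw new ArgumentException(
                "The constructor parameter must be an instance constructor.",
                "constructor");

        // Added guard (assumption): no IL body means nothing to inspect.
        MethodBody methodBody = constructor.GetMethodBody();
        if (methodBody == null) return false;

        byte[] body = methodBody.GetILAsByteArray();

        // ldarg.0 (1 byte) + call (1 byte) + metadata token (4 bytes) + ret (1 byte).
        if (body == null || body.Length != 7) return false;

        // 0x02 = ldarg.0, 0x28 = call, 0x2A = ret.
        return body[0] == 0x02 && body[1] == 0x28 && body[6] == 0x2A;
    }
}
```

A caller would typically loop over the type's constructors, skip the ones that take parameters, and treat a true result only as a hint, for the reasons given above.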

There is no way to detect automatically generated default constructors through metadata. You can test this by creating a class library with two classes, one with an explicit default constructor, and one without. Then run ildasm on the assembly: the metadata of the two constructors is identical.
Rather than try to detect generated constructors, I would simply change the program to allow missing documentation on any default constructor. Most documentation generation programs, like NDoc and SandcastleGUI, have an option to add standard documentation to all default constructors; so it's really not necessary to document them at all. If you have an explicit default constructor in your code, you can put three slashes (///) above the constructor - nothing else - to disable the Visual Studio warning about missing documentation.

The following code will return information on any parameterless constructors in your type:
var info = typeof(MyClass).GetConstructor(new Type[] {});
I do not know of a way of differentiating between a default constructor and an explicitly specified parameterless constructor.
A possible workaround for your CommentChecker issue would be to explicitly create the parameterless constructor where one is required and comment it appropriately.
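Note that the single-argument GetConstructor overload only finds public constructors. If non-public parameterless constructors matter as well, a lookup along these lines may help; the class and method names are hypothetical, and this is only a sketch.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, not part of CommentChecker.
public static class ParameterlessConstructorFinder
{
    // Returns the parameterless instance constructor of the given type,
    // public or not, or null if the type does not declare one.
    public static ConstructorInfo Find(Type type)
    {
        if (type == null) throw new ArgumentNullException("type");

        return type.GetConstructor(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
            null,             // default binder
            Type.EmptyTypes,  // no parameters
            null);            // no parameter modifiers
    }
}
```

As stated above, the returned ConstructorInfo looks the same whether the constructor was written by hand or supplied by the compiler.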

Related

c# custom attribute for a method parameter - how it works?

I would like to understand how this particular case works. Here is the shot from msdn article where INotifyPropertyChanged interface is explained (https://msdn.microsoft.com/query/dev12.query?appId=Dev12IDEF1&l=EN-US&k=k%28System.ComponentModel.INotifyPropertyChanged%29;k%28TargetFrameworkMoniker-.NETFramework,Version%3Dv4.5%29;k%28DevLang-csharp%29&rd=true)
As the marked lines say, is there a way of intercepting a method call to substitute a value for what is actually passed as a parameter?
I would like to get an idea of what the code to do this looks like. I know how to work with attributes set on properties and other class members, but this use case is not clear to me.
Thanks.
It seems to be a feature implemented in the compiler: it knows about this special attribute and it substitutes the name of the caller into the optional argument when it has its default value.
If you want you can check the Roslyn implementation. Although it is not always very straightforward to navigate there seems to be something here in the GetDefaultParameterValue function (starting at line 844, at least in the current revision as of the time of writing -- 0db946b):
if the optional parameter is annotated with <see cref="CallerLineNumberAttribute"/>, <see cref="CallerFilePathAttribute"/> or <see cref="CallerMemberNameAttribute"/>, and there is no explicit argument corresponding to it, we will provide caller information as a value of this parameter.
At line 912 there is an else if clause that handles this case (the if and else if clauses before that handle the similar new features CallerLineNumberAttribute and CallerFilePathAttribute):
...
else if (parameter.IsCallerMemberName && ((callerSourceLocation = GetCallerLocation(syntax, enableCallerInfo)) != null))
...
which is eventually used to bind the parameter:
BoundExpression memberNameLiteral = MakeLiteral(syntax, ConstantValue.Create(memberName), _compilation.GetSpecialType(SpecialType.System_String));
defaultValue = MakeConversion(memberNameLiteral, parameterType, false);
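From the caller's side, the substitution can be observed with a small sketch like the one below. It assumes C# 5 and .NET 4.5 or later; the names CallerInfoDemo, WhoCalled and FromHere are invented for the example.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class CallerInfoDemo
{
    // The parameter must be optional; the compiler replaces the default
    // with the calling member's name when no argument is supplied.
    public static string WhoCalled([CallerMemberName] string caller = "")
    {
        return caller;
    }

    // No argument is passed, so the compiler emits WhoCalled("FromHere").
    public static string FromHere()
    {
        return WhoCalled();
    }
}
```

Nothing is intercepted at run time: the string literal is already in the compiled call, which is why an explicitly passed argument always wins over the caller information.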

Interface implementation with optional arguments

Take this interface:
interface ILogger
{
void Store(string payload);
}
And this class implementation of ILogger:
class Logger : ILogger
{
void Store(string payload, bool swallowException = true)
{
...
}
}
I would anticipate the compiler would recognize swallowException as an optional argument, and thus satisfy the requirements of the interface. Instead, what happens is the compiler complains that Logger does not implement interface member Store.
Another interesting thing I tried was implementing the interface explicitly, like so:
class Logger : ILogger
{
void ILogger.Store(string payload, bool swallowException = true)
{
...
}
}
The compiler gives a warning "The default value specified for parameter 'swallowException' will have no effect because it applies to a member that is used in contexts that do not allow optional arguments." It seems to suggest optional arguments are somehow incompatible with explicit interface definitions, but why?
I can work around the problem by overloading Store with two separate function definitions (the way of doing things before optional arguments existed). However I like optional arguments for their syntactic clarity and would prefer that this just worked the way I expect.
I understand there's probably a reasonable (historical or otherwise) explanation for why this is the way it is, but I can't seem to figure it out.
Because optional arguments in C# are just syntactic sugar.
The method definition in your case is
void Store(string payload, bool swallowException)
rather than
void Store(string payload)
Which obviously doesn't match the interface.
The way default arguments work is that the compiler injects the default values into the call of the method. So if you do Store(payload), the compiler will actually emit Store(payload, true). This is extremely important for understanding default arguments: the substitution happens when the caller is compiled. So if you change the default argument in the callee without recompiling the caller, the caller is still going to use the old default argument.
This also explains the warning you got - since the default value is passed by the compiler explicitly, and you can't call an explicit implementation of an interface without casting to the interface, you're not going to get an opportunity to use the default value, ever.
You don't actually want to use default arguments at all. Simply define two methods like this:
void Store(string payload, bool swallowException)
{
// Do your job
}
void Store(string payload)
{
Store(payload, true);
}
This avoids both of the problems above - the interface contract is satisfied, and the default argument is now part of the callee, not the caller.
Personally, I don't use optional arguments in public API methods at all - they're just aching to cause trouble when you decide that you want to change them at some point. Unless you can make sure they will stay the same forever, don't use them. The same applies to const and enum - both are also determined at compile-time, rather than run-time.
Remember, the reasoning for including default arguments is to allow you to not pass some argument. That makes sense for things like COM API calls (which would otherwise require you to pass all the arguments you don't want to pass as Type.Missing), or null values. Even using false is just asking for trouble when someone decides that a better default would be true - suddenly, some callers are using true and some false, although all think they're using the "default". For a case like yours, I'd use bool? instead, with a default value of null (or default(bool?), whichever you prefer). In the method code itself, you can then easily handle the default at the proper point - say, by doing swallowException.GetValueOrDefault(true).
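Combining the two suggestions above (a separate overload for the interface, plus a nullable flag resolved inside the callee) could look roughly like this. The LastSwallowException property is hypothetical and exists only so the effect can be observed.

```csharp
using System;

public interface ILogger
{
    void Store(string payload);
}

public class Logger : ILogger
{
    // Hypothetical property, present only to make the chosen value visible.
    public bool LastSwallowException { get; private set; }

    // Satisfies the interface contract and defers the decision to the callee.
    public void Store(string payload)
    {
        Store(payload, null);
    }

    public void Store(string payload, bool? swallowException)
    {
        // The default lives here, so changing it never requires
        // recompiling the callers.
        LastSwallowException = swallowException.GetValueOrDefault(true);

        // Do the actual work here.
    }
}
```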

Why can't I give a default value as optional parameter except null?

I want to have an optional parameter and set it to a default value that I determine. When I do this:
private void Process(Foo f = new Foo())
{
}
I'm getting the following error (Foo is a class):
'f' is type of Foo, A default parameter of a reference type other than string can only be initialized with null.
If I change Foo to a struct then it works, but only with the default parameterless constructor.
I read the documentation and it clearly states that I cannot do this, but it doesn't mention why. Why does this restriction exist, and why is string excluded from it? Why does the value of an optional parameter have to be a compile-time constant? If it weren't a constant, what would the side effects be?
A starting point is that the CLR has no support for this. It must be implemented by the compiler. Something you can see from a little test program:
class Program {
static void Main(string[] args) {
Test();
Test(42);
}
static void Test(int value = 42) {
}
}
Which decompiles to:
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 15 (0xf)
.maxstack 8
IL_0000: ldc.i4.s 42
IL_0002: call void Program::Test(int32)
IL_0007: ldc.i4.s 42
IL_0009: call void Program::Test(int32)
IL_000e: ret
} // end of method Program::Main
.method private hidebysig static void Test([opt] int32 'value') cil managed
{
.param [1] = int32(0x0000002A)
// Code size 1 (0x1)
.maxstack 8
IL_0000: ret
} // end of method Program::Test
Note how there is no difference whatsoever between the two call statements after the compiler is done with it. It was the compiler that applied the default value and did so at the call site.
Also note that this still needs to work when the Test() method actually lives in another assembly. Which implies that the default value needs to be encoded in the metadata. Note how the .param directive did this. The CLI spec (Ecma-335) documents it in section II.15.4.1.4
This directive stores in the metadata a constant value associated with method parameter number Int32,
see §II.22.9. While the CLI requires that a value be supplied for the parameter, some tools can use the
presence of this attribute to indicate that the tool rather than the user is intended to supply the value of
the parameter. Unlike CIL instructions, .param uses index 0 to specify the return value of the method,
index 1 to specify the first parameter of the method, index 2 to specify the second parameter of the
method, and so on.
[Note: The CLI attaches no semantic whatsoever to these values—it is entirely up to compilers to
implement any semantic they wish (e.g., so-called default argument values). end note]
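Because the default lives in metadata, it can also be read back through reflection. A minimal sketch, with the invented names DefaultValueProbe and ReadDefault:

```csharp
using System;
using System.Reflection;

public static class DefaultValueProbe
{
    // Same shape as the Test method in the example above.
    public static void Test(int value = 42)
    {
    }

    // Reads the constant that the .param directive stored for parameter 1.
    public static object ReadDefault()
    {
        ParameterInfo parameter =
            typeof(DefaultValueProbe).GetMethod("Test").GetParameters()[0];

        // IsOptional reflects the [opt] flag; DefaultValue is the stored constant.
        return parameter.IsOptional ? parameter.DefaultValue : null;
    }
}
```

Here ReadDefault() returns a boxed 42, which is the same metadata a compiler consults when it fills in the argument at a call site in another assembly.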
The quoted section II.22.9 goes into the detail of what a constant value means. The most relevant part:
Type shall be exactly one of: ELEMENT_TYPE_BOOLEAN, ELEMENT_TYPE_CHAR,
ELEMENT_TYPE_I1, ELEMENT_TYPE_U1, ELEMENT_TYPE_I2, ELEMENT_TYPE_U2,
ELEMENT_TYPE_I4, ELEMENT_TYPE_U4, ELEMENT_TYPE_I8, ELEMENT_TYPE_U8,
ELEMENT_TYPE_R4, ELEMENT_TYPE_R8, or ELEMENT_TYPE_STRING; or
ELEMENT_TYPE_CLASS with a Value of zero
So that's where the buck stops, no good way to even reference an anonymous helper method so some kind of code hoisting trick cannot work either.
Notable is that it just isn't a problem, you can always implement an arbitrary default value for an argument of a reference type. For example:
private void Process(Foo f = null)
{
if (f == null) f = new Foo();
}
Which is quite reasonable. And the kind of code you want in the method instead of the call site.
Because there's no other compile-time constant than null. For strings, string literals are such compile-time constants.
I think that some of the design decisions behind it may have been:
Simplicity of implementation
Elimination of hidden / unexpected behavior
Clarity of method contract, esp. in cross-assembly scenarios
Let's elaborate on these three a bit more to get some insight under the hood of the problem:
1. Simplicity of implementation
When limited to constant values, both the compiler's and the CLR's jobs are pretty easy. Constant values can be easily stored in assembly metadata, and the compiler can easily copy them into each call site. How this is done was outlined in Hans Passant's answer.
But what could the CLR and compiler do to implement non-constant default values? There are two options:
(1) Store the initialization expressions themselves, and compile them there:
// seen by the developer in the source code
Process();
// actually done by the compiler
Process(new Foo());
(2) Generate thunks:
// seen by the developer in the source code
Process();
…
void Process(Foo arg = new Foo())
{
…
}
// actually done by the compiler
Process_Thunk();
…
void Process_Thunk()
{
Process(new Foo());
}
void Process(Foo arg)
{
…
}
Both solutions introduce a lot more new metadata into assemblies and require complex handling by the compiler. Also, while solution (2) can be seen as a hidden technicality (as well as (1)), it has consequences in respect to the perceived behavior. The developer expects that arguments are evaluated at call site, not somewhere else. This may impose extra problems to be solved (see part related to method contract).
2. Elimination of hidden / unexpected behavior
The initialization expression could have been arbitrarily complex. Hence a simple call like this:
Process();
would unroll into a complex calculation performed at call site. For example:
Process(new Foo(HorriblyComplexCalculation(SomeStaticVar) * Math.Pow(GetCoefficient(), 17)));
That can be rather unexpected from the point of view of a reader who does not inspect the declaration of Process thoroughly. It clutters the code and makes it less readable.
3. Clarity of method contract, esp. in cross-assembly scenarios
The signature of a method together with default values imposes a contract. This contract lives in a particular context. If the initialization expression required bindings to some other assemblies, what would that require from the caller? How about this example, where the method 'CalculateInput' is from 'Other.Assembly':
void Process(Foo arg = new Foo(Other.Assembly.Namespace.CalculateInput()))
This is the point where the way it would be implemented plays a critical role in deciding whether this is a problem or not. In the "simplicity" section I've outlined implementation methods (1) and (2). If (1) were chosen, it would require the caller to bind to 'Other.Assembly'. On the other hand, if (2) were chosen, there's far less need, from the implementation point of view, for such a rule, because the compiler-generated Process_Thunk is declared at the same place as Process and hence naturally has a reference to Other.Assembly. However, a sane language designer would nonetheless impose such a rule, because multiple implementations of the same thing are possible, and for the sake of stability and clarity of the method contract.
Nevertheless, these cross-assembly scenarios would impose assembly references that are not clearly visible from the plain source code at the call site. And that's a usability and readability problem, again.
It is just the way the language works, I can't say why they do it (and this site is not a site for discussions like that, if you want to discuss it take it to chat).
I can show you how to work around it, just make two methods and overload it (modified your example slightly to show how you would return results too).
private Bar Process()
{
return Process(new Foo());
}
private Bar Process(Foo f)
{
//Whatever.
}
Default parameters affect the caller: when you rely on a default parameter, the compiler inserts the default value into the call at compile time. Because of that you need to supply a constant value, which "new Foo()" in your case is not.
That is why you need a constant.

Why does a recursive constructor call make invalid C# code compile?

After watching the webinar Jon Skeet Inspects ReSharper, I started to play a little with recursive constructor calls and found that the following code is valid C# code (by valid I mean it compiles).
class Foo
{
int a = null;
int b = AppDomain.CurrentDomain;
int c = "string to int";
int d = NonExistingMethod();
int e = Invalid<Method>Name<<Indeeed();
Foo() :this(0) { }
Foo(int v) :this() { }
}
As we all probably know, field initialization is moved into constructor by the compiler. So if you have a field like int a = 42;, you will have a = 42 in all constructors. But if you have constructor calling another constructor, you will have initialization code only in called one.
For example if you have constructor with parameters calling default constructor, you will have assignment a = 42 only in the default constructor.
To illustrate the second case, the following code:
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) { }
}
Compiles into:
internal class Foo
{
private int a;
private Foo()
{
this.ctor(60);
}
private Foo(int v)
{
this.a = 42;
base.ctor();
}
}
So the main issue, is that my code, given at the start of this question, is compiled into:
internal class Foo
{
private int a;
private int b;
private int c;
private int d;
private int e;
private Foo()
{
this.ctor(0);
}
private Foo(int v)
{
this.ctor();
}
}
As you can see, the compiler can't decide where to put the field initialization and, as a result, doesn't put it anywhere. Also note that there are no base constructor calls. Of course, no objects can be created, and you will always end up with a StackOverflowException if you try to create an instance of Foo.
I have two questions:
Why does compiler allow recursive constructor calls at all?
Why do we observe such behavior of the compiler for fields initialized within such a class?
Some notes: ReSharper warns you with Possible cyclic constructor calls. Moreover, in Java such constructor calls won't even compile, so the Java compiler is more restrictive in this scenario (Jon mentioned this at the webinar).
This makes these questions more interesting, because with all respect to Java community, the C# compiler is at least more modern.
This was compiled using C# 4.0 and C# 5.0 compilers and decompiled using dotPeek.
Interesting find.
It appears that there are really only two kinds of instance constructors:
An instance constructor which chains another instance constructor of the same type, with the : this( ...) syntax.
An instance constructor which chains an instance constructor of the base class. This includes instance constructors where no chaining is specified, since : base() is the default.
(I disregarded the instance constructor of System.Object which is a special case. System.Object has no base class! But System.Object has no fields either.)
The instance field initializers that might be present in the class, need to be copied into the beginning of the body of all instance constructors of type 2. above, whereas no instance constructors of type 1. need the field assignment code.
So apparently there's no need for the C# compiler to do an analysis of the constructors of type 1. to see if there are cycles or not.
Now your example gives a situation where all instance constructors are of type 1. In that situation the field initializer code does not need to be put anywhere. So it is not analyzed very deeply, it seems.
It turns out that when all instance constructors are of type 1., you can even derive from a base class that has no accessible constructor. The base class must be non-sealed, though. For example, if you write a class with only private instance constructors, people can still derive from your class if they make all instance constructors in the derived class be of type 1. above. However, a new object creation expression will never finish, of course. To create instances of the derived class, one would have to "cheat" and use stuff like the System.Runtime.Serialization.FormatterServices.GetUninitializedObject method.
Another example: The System.Globalization.TextInfo class has only an internal instance constructor. But you can still derive from this class in an assembly other than mscorlib.dll with this technique.
Finally, regarding the
Invalid<Method>Name<<Indeeed()
syntax. According to the C# rules, this is to be read as
(Invalid < Method) > (Name << Indeeed())
because the left-shift operator << has higher precedence than both the less-than operator < and the greater-than operator >. The latter two operators have the same precedence, and are therefore evaluated by the left-associative rule. If the types were
MySpecialType Invalid;
int Method;
int Name;
int Indeeed() { ... }
and if the MySpecialType introduced an (MySpecialType, int) overload of the operator <, then the expression
Invalid < Method > Name << Indeeed()
would be legal and meaningful.
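To make that concrete, here is a sketch in which the expression compiles and evaluates. The operator bodies and all values are arbitrary choices for the demonstration; C# requires < and > to be declared as a matching pair, so both are defined, and both return int so that the outer > is the built-in integer comparison.

```csharp
using System;

public struct MySpecialType
{
    public int Value;

    // Arbitrary semantics, chosen only so the result is easy to follow.
    public static int operator <(MySpecialType left, int right)
    {
        return left.Value - right;
    }

    // Required counterpart of operator <; not used by the expression below.
    public static int operator >(MySpecialType left, int right)
    {
        return right - left.Value;
    }
}

public static class PrecedenceDemo
{
    static MySpecialType Invalid = new MySpecialType { Value = 10 };
    static int Method = 3;
    static int Name = 1;

    static int Indeeed()
    {
        return 2;
    }

    public static bool Evaluate()
    {
        // Parsed as (Invalid < Method) > (Name << Indeeed()):
        // (10 - 3) > (1 << 2), that is 7 > 4.
        return Invalid < Method > Name << Indeeed();
    }
}
```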
In my opinion, it would be better if the compiler issued a warning in this scenario. For example, it could say unreachable code detected and point to the line and column number of the field initializer that is never translated into IL.
I think because the language specification only rules out directly invoking the same constructor that is being defined.
From 10.11.1:
All instance constructors (except those for class object) implicitly include an invocation of another instance constructor immediately before the constructor-body. The constructor to implicitly invoke is determined by the constructor-initializer
...
An instance constructor initializer of the form this(argument-list_opt) causes an instance constructor from the class itself to be invoked ... If an instance constructor declaration includes a constructor initializer that invokes the constructor itself, a compile-time error occurs
That last sentence seems to only preclude direct calling itself as producing a compile time error, e.g.
Foo() : this() {}
is illegal.
I admit though - I can't see a specific reason for allowing it. Of course, at the IL level such constructs are allowed because different instance constructors could be selected at runtime, I believe - so you could have recursion provided it terminates.
I think the other reason it doesn't flag or warn on this is because it has no need to detect this situation. Imagine chasing through hundreds of different constructors, just to see if a cycle does exist - when any attempted usage will quickly (as we know) blow up at runtime, for a fairly edge case.
When it's doing code generation for each constructor, all it considers is constructor-initializer, the field initializers, and the body of the constructor - it doesn't consider any other code:
If constructor-initializer is an instance constructor for the class itself, it doesn't emit the field initializers - it emits the constructor-initializer call and then the body.
If constructor-initializer is an instance constructor for the direct base class, it emits the field initializers, then the constructor-initializer call, and then the body.
In neither case does it need to go looking elsewhere - so it's not a case of it being "unable" to decide where to place the field initializers - it's just following some simple rules that only consider the current constructor.
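The two rules can be observed at run time with a small sketch that records the order of events. All names here (CtorLog, Base, Derived, Marker) are invented for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class CtorLog
{
    public static readonly List<string> Entries = new List<string>();

    // Records an event and returns a dummy value usable in a field initializer.
    public static int Note(string entry)
    {
        Entries.Add(entry);
        return 0;
    }
}

public class Base
{
    public Base()
    {
        CtorLog.Note("Base ctor");
    }
}

public class Derived : Base
{
    public readonly int Marker = CtorLog.Note("field initializer");

    // Chains to this(...): no field initializers are emitted here.
    public Derived() : this(0)
    {
        CtorLog.Note("Derived()");
    }

    // Chains (implicitly) to base(): field initializers, base call, then body.
    public Derived(int v)
    {
        CtorLog.Note("Derived(int)");
    }
}
```

Creating an instance with new Derived() records "field initializer", "Base ctor", "Derived(int)", "Derived()" in that order, so the initializer runs exactly once, inside the constructor that chains to the base class.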
Your example
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) { }
}
will work fine, in the sense that you can instantiate that Foo object without problems. However, the following would be more like the code that you're asking about
class Foo
{
int a = 42;
Foo() :this(60) { }
Foo(int v) : this() { }
}
Both that and your code will create a stackoverflow (!), because the recursion never bottoms out. So your code is ignored because it never gets to execute.
In other words, the compiler can't decide where to put the faulty code because it can tell that the recursion never bottoms out. I think this is because it has to put it where it will only be called once, but the recursive nature of the constructors makes that impossible.
Recursion in the sense of a constructor creating instances of itself within the body of the constructor makes sense to me, because e.g. that could be used to instantiate trees where each node points to other nodes. But recursion via the pre-constructors of the sort illustrated by this question can't ever bottom out, so it would make sense for me if that was disallowed.
I think this is allowed because you can (or could) still catch the exception and do something meaningful with it.
The initialisation will never be run, and it will almost certainly throw a StackOverflowException. But this can still be wanted behaviour, and it didn't always mean the process would crash.
As explained here https://stackoverflow.com/a/1599236/869482

DefaultMemberAttribute - what does it do?

I've already read the MSDN article about it. It seems that internally it is the way C# marks which member is going to work as the indexer (am I right?). Now, I've seen the following example:
[DefaultMemberAttribute("Main")]
public class Program {
public static void Main() {
...
}
}
Now, I don't get what it means.
Thanks all. But I still can't get its usefulness, apart from the indexer thing. When are we going to call InvokeMember?
No, the DefaultMemberAttribute is used by languages such as VB.NET to find out the member that is acted on by default if no member is referenced from an object, i.e. the member invoked by InvokeMember. This is often used in conjunction with indexers, as you noted, but it is not used by C# directly (unless you use InvokeMember explicitly).
However, for the benefit of other .NET languages, C# does emit the DefaultMemberAttribute for the indexer of a class (if it has one), as indicated by MSDN:
The C# compiler emits the
DefaultMemberAttribute on any type
containing an indexer. In C# it is an
error to manually attribute a type
with the DefaultMemberAttribute if the
type also declares an indexer.
I think MSDN confuses things by referring to indexers a lot in the remarks but then giving an example that does not use an indexer. To clarify, the default member can be anything, but C# gives special behavior for indexers by emitting the attribute for you (if an indexer exists) to the exception of all other use cases.
I personally have never used it, but as far as I can tell you are defining the default method to be invoked when calling InvokeMember. So, using the code snippet you provided if I was to say:
Program prog = new Program();
typeof(Program).InvokeMember("", BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Static, null, prog, null);
Because I left the first argument of the InvokeMember call empty, it would use the attribute to determine what the default member of your class is; in your case it is Main.
The DefaultMemberAttribute attribute defines the default member to be called on a type when InvokeMember is called with an empty string as the first argument.
If you read the MSDN docs for InvokeMember, it explicitly says:
Parameters
name
Type: System.String
The String containing the name of the constructor, method, property, or field member to invoke.
-or-
An empty string ("") to invoke the default member.
The default member will be the one declared by the DefaultMemberAttribute attribute.
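A self-contained sketch of that behaviour follows. Greeter, Greet and DefaultMemberDemo are made-up names; the type has no indexer, so applying the attribute by hand is allowed in C#.

```csharp
using System;
using System.Reflection;

// Marks Greet as the member to use when no member name is given.
[DefaultMember("Greet")]
public class Greeter
{
    public string Greet()
    {
        return "hello";
    }
}

public static class DefaultMemberDemo
{
    public static object InvokeDefault()
    {
        // The empty name tells InvokeMember to look up the default member.
        return typeof(Greeter).InvokeMember(
            "",
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null,            // default binder
            new Greeter(),   // target instance
            null);           // no arguments
    }
}
```

InvokeDefault() returns "hello" here, because the empty name is resolved to Greet through the attribute.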
