DotNet Reflector - why can't I disassemble XmlHierarchicalEnumerable?


Note that the following are examples of the rare cases where dotNet reflector does not disassemble correctly. In the vast majority of cases it works perfectly, and I am not suggesting this is necessarily a bug in reflector. It may be a result of protection or obfuscation or unmanaged code on the assemblies in question.
I am trying to disassemble System.Web.UI.WebControls.XmlHierarchicalEnumerable in dotNet Reflector. The generics seem all screwed up, e.g.:
// Nested Types
[CompilerGenerated]
private sealed class <GetEnumerator>d__0 : IEnumerator<object>,
IEnumerator, IDisposable
{
// Fields
private int <>1__state;
private object <>2__current;
public XmlHierarchicalEnumerable <>4__this;
public IEnumerator <>7__wrap2;
public IDisposable <>7__wrap3;
public XmlNode <node>5__1;
In other assemblies I sometimes get little squares (I know these usually stand for 'unknown symbol') in place of class names, eg:
dictionary1.Add("autopostbackonselect", 0x34);
ᜀ.ᜌ = dictionary1;
}
if (ᜀ.ᜌ.TryGetValue(key, out num))
{
switch (num)
What gives ? Anyone know ?

In the first example, this is completely expected. These classes are used to implement IEnumerable<T> when a method uses yield return statements. The compiler generates classes that store the state and fetch the next value when MoveNext is called on the IEnumerator<T> instance returned by the IEnumerable<T>.GetEnumerator implementation (you will note they are one and the same).
It should be noted that what you are seeing is completely legal naming syntax from a CLR perspective. From a C# perspective, though, it is not legal. However, since these classes are internal and you will never need to access them directly (only through interface implementations), there is no need for them to have legal C# names.
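To see this for yourself, here is a minimal sketch (the names IteratorDemo and Numbers are made up for illustration). It writes a small iterator and inspects the type of the object it returns. The exact generated name is a compiler implementation detail, so treat whatever it prints as an example rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class IteratorDemo
{
    // An iterator block: the compiler rewrites this method into a hidden
    // state-machine class that stores the current state and value.
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    public static void Main()
    {
        IEnumerable<int> sequence = Numbers();
        Type generated = sequence.GetType();

        // The name is not a legal C# identifier; compilers typically embed
        // the method name between angle brackets.
        Console.WriteLine(generated.Name);

        // The hidden class carries [CompilerGenerated], as in the Reflector output.
        Console.WriteLine(generated.IsDefined(typeof(CompilerGeneratedAttribute), false));

        // The same object serves as both the enumerable and the enumerator.
        Console.WriteLine(sequence is IEnumerator<int>);
    }
}
```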
As for the second, I haven't seen that behavior. It's possible that the assembly is obfuscated, but I've not seen that in any version of the .NET Framework itself. It would help if you clarified which assemblies you are looking at (.NET Framework or not, and which version of the framework), as well as which version of Reflector you are using.

I have seen this before when looking at assemblies that have been obfuscated. Quite often this process renames identifiers to characters that are unreadable to the human eye, which then show up as an unknown character.

This assembly might have been obfuscated; you can check these links: http://cooprotector.com/ http://intelliside.com/

There are quite a few things being autogenerated by the compiler: auto properties, anonymous types/methods, and enumerators based on iterator blocks. They all need a name, and it should be one that does not clash with something named by the developer. Since <>_ is a perfectly legal name in CLR terms but not in C#, prefixing anything autogenerated with <>_ ensures that the compiler won't accidentally choose a name already used by the developer.

Related

Why doesn't a local variable of type CancellationToken need initialization? [duplicate]

I notice that the following will compile and execute even though the local variables are not initialized. Is this a feature of Span?
void Uninitialized()
{
Span<char> s1;
var l1 = s1.Length;
Span<char> s2;
UninitializedOut(out s2);
var l2 = s2.Length;
}
void UninitializedOut(out Span<char> s)
{}

This looks like an issue caused by reference assemblies, required because of the way that Span<T> has framework-specific internals.
This means that in the reference assembly: there are no fields (edit: this isn't quite true - see footnote).
A struct is considered assigned (for the purposes of "definite assignment") if all fields are assigned, and in this case the compiler is seeing "all zero of zero fields have been assigned: all good - this variable is assigned". But the compiler doesn't seem to know about the actual fields, so it is being misled into allowing something that is not technically valid.
You definitely shouldn't rely on this behaving nicely! Although in most cases .locals init should mean you don't actually get anything too horrible. However, there is currently some work in progress to allow people to suppress .locals init in some cases - I dread to think what could happen in that scenario here - especially since Span<T> works much like a ref T - that could get very very dangerous if the field really isn't initialized to zero.
Interestingly, it might already be fixed: see this example on sharplab. Alternatively, maybe sharplab is using a concrete target framework, rather than reference assemblies.
Edit: very oddly, if I load the reference assembly into ildasm or reflector, I can see:
.field private initonly object _dummy
which is the spoofed field in the reference assembly that is meant to stop this from happening, but... it looks like it isn't working very reliably right now!
Update: apparently the difference here is a subtle but known compiler issue that remains for compatibility reasons; definite assignment of structs considers private fields of types that are known locally, but does not consider private reference-type fields of types in external assemblies.
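Since relying on this quirk is discouraged, the simplest defensive habit is to assign the local explicitly. The sketch below (class and method names are hypothetical) shows that an explicit default gives the same all-zero state without depending on how the compiler treats reference assemblies.

```csharp
using System;
using System.Threading;

public static class ExplicitInit
{
    public static int DefaultSpanLength()
    {
        // Explicitly assigned, so definite assignment holds by the rules of
        // the language rather than by accident.
        Span<char> s = default;
        return s.Length;
    }

    public static bool DefaultTokenCanBeCanceled()
    {
        // CancellationToken has the same shape of problem; default is
        // equivalent to CancellationToken.None.
        CancellationToken token = default;
        return token.CanBeCanceled;
    }

    public static void Main()
    {
        Console.WriteLine(DefaultSpanLength());         // prints 0
        Console.WriteLine(DefaultTokenCanBeCanceled()); // prints False
    }
}
```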

Marc has a great answer. I wanted to elaborate a bit on the history / context.
First of all, this is definitely a compiler bug. By the rules of definite assignment this local is not definitely assigned and any usage should be an error. Unfortunately this bug is hard to fix for a number of reasons:
This bug is old and goes back to at least C# 4.0. That gives customers 7+ years to inadvertently take a dependency on it
There are a number of structs in the BCL which have this basic structure. For example CancellationToken.
Those taken together mean fixing this would likely break a large amount of existing code. Despite this the C# team attempted to fix the bug in C# 6.0 when the bug was much younger. But an attempt at compiling the Visual Studio source with this fix showed that the fears around customers taking a dependency on this bug were well founded: there were a number of build breaks. Enough to convince us it would have a negative impact on a significant amount of code. Hence the fix was undone.
The second problem here is that this bug wasn't known to all the compiler team members (before today at least). It has been ~3 years since the fix was undone, and we've had a bit of turnover since then. The team members who verified how we were generating reference assemblies for Span<T> weren't aware of this bug and recommended the current design based on the language spec. I'm one of those developers :(
Still discussing this but most likely we're going to update the reference assembly strategy for Span<T>, and other types, so that it avoids this compiler bug.
Thanks for reporting this. Sorry about the confusion caused :(

More or less this is by design, since it depends heavily on whether the underlying struct holds any fields itself.
This code compiles for example:
public struct MySpan<T>
{
public int Length => 1;
}
static class Program
{
static void Main(string[] args)
{
MySpan<char> s1;
var l1 = s1.Length;
}
}
But this code doesn't:
public struct MySpan<T>
{
public int Length { get; }
}
static class Program
{
static void Main(string[] args)
{
MySpan<char> s1;
var l1 = s1.Length;
}
}
It seems that in that case, the struct is defaulted, and that is why it doesn't complain about a missing assignment. That it doesn't detect any fields is a bug, as explained in Marc's answer.

Why can't Mono support generic interface instantiation with AOT?

The Mono documentation has a code example about full AOT not supporting generic interface instantiation:
interface IFoo<T> {
...
void SomeMethod();
}
It says:
Since Mono has no way of determining from the static analysis what method will implement the interface IFoo<int>.SomeMethod this particular pattern is not supported.
So I think the compiler can't work with this method under type inference. But I still can't understand the underlying reason about the full AOT limitation.
There is still a similar problem with the Unity AOT script restrictions. In the following code:
using UnityEngine;
using System;
public class AOTProblemExample : MonoBehaviour, IReceiver
{
public enum AnyEnum {
Zero,
One,
}
void Start() {
// Subtle trigger: The type of manager *must* be
// IManager, not Manager, to trigger the AOT problem.
IManager manager = new Manager();
manager.SendMessage(this, AnyEnum.Zero);
}
public void OnMessage<T>(T value) {
Debug.LogFormat("Message value: {0}", value);
}
}
public class Manager : IManager {
public void SendMessage<T>(IReceiver target, T value) {
target.OnMessage(value);
}
}
public interface IReceiver {
void OnMessage<T>(T value);
}
public interface IManager {
void SendMessage<T>(IReceiver target, T value);
}
I am confused by this:
The AOT compiler does not realize that it should generate code for the generic method OnMessage with a T of AnyEnum, so it blissfully continues, skipping this method. When that method is called, and the runtime can’t find the proper code to execute, it gives up with this error message.
Why does the AOT not know the type when the JIT can infer the type? Can anyone offer a detailed answer?

Before describing the issues, consider this excerpt from another answer of mine that describes the generics situation on platforms that do support dynamic code generation:
In C# generics, the generic type definition is maintained in memory at runtime. Whenever a new concrete type is required, the runtime environment combines the generic type definition and the type arguments and creates the new type (reification). So we get a new type for each combination of the type arguments, at runtime.
The phrase at runtime is key to this, because it brings us to another point:
This implementation technique depends heavily on runtime support and JIT-compilation (which is why you often hear that C# generics have some limitations on platforms like iOS, where dynamic code generation is restricted).
So is it possible for a full AOT compiler to do that as well? It most certainly is possible. But is it easy?
There is a paper from Microsoft Research on pre-compiling .NET generics that describes the interaction of generics with AOT compilation, highlights some potential problems and proposes solutions. In this answer, I will use that paper to try to demonstrate why .NET generics aren't widely pre-compiled (yet).
Everything must be instantiated
Consider your example :
IManager manager = new Manager();
manager.SendMessage(this, AnyEnum.Zero);
Clearly we're calling the method IManager.SendMessage<AnyEnum> here, so the fully AOT compiler needs to compile that method.
But this is an interface call, which is effectively a virtual call, which means that we can't know ahead of time which implementation of the interface method will be called.
The JIT compiler doesn't care about this problem. When someone attempts to run a method that hasn't been compiled yet, the JIT will be notified and it will compile the method lazily.
On the contrary, a fully AOT compiler doesn't have access to all this runtime type information. So it has to pessimistically compile all possible instantiations of the generic method on all implementations of the interface. (Or just give up and not offer that feature.)
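A workaround commonly described for this restriction is to give the AOT compiler a static hint: a method that is never called at run time but explicitly mentions the instantiations that are needed, using the concrete types rather than the interfaces. The sketch below is a plain C# adaptation of the question's example without the Unity types; the Receiver class, its Last property, and the method name UsedOnlyForAotCodeGeneration are assumptions made for illustration. Whether a given AOT toolchain needs both hint lines is also an assumption.

```csharp
using System;

public enum AnyEnum { Zero, One }

public interface IReceiver { void OnMessage<T>(T value); }
public interface IManager { void SendMessage<T>(IReceiver target, T value); }

public class Manager : IManager
{
    public void SendMessage<T>(IReceiver target, T value)
    {
        target.OnMessage(value);
    }
}

public class Receiver : IReceiver
{
    public string Last { get; private set; } = "";

    public void OnMessage<T>(T value)
    {
        Last = "Message value: " + value;
    }

    // Never called. Its only job is to make OnMessage<AnyEnum> and
    // Manager.SendMessage<AnyEnum> visible to static analysis.
    public void UsedOnlyForAotCodeGeneration()
    {
        OnMessage(AnyEnum.Zero);
        new Manager().SendMessage(this, AnyEnum.Zero);
        throw new InvalidOperationException(
            "This method exists only as an AOT hint. Do not call it.");
    }
}

public static class Program
{
    public static void Main()
    {
        var receiver = new Receiver();
        IManager manager = new Manager(); // interface call, as in the question
        manager.SendMessage(receiver, AnyEnum.Zero);
        Console.WriteLine(receiver.Last); // prints "Message value: Zero"
    }
}
```

Under a JIT the hint method is dead code; it matters only to a compiler that must decide ahead of time which instantiations to emit.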
Generics can be infinitely recursive
object M<T>(long n)
{
if (n == 1)
{
return new T[0];
}
else
{
return M<T[]>(n - 1);
}
}
To instantiate M<int>(), the compiler needs to instantiate int[] and M<int[]>(). To instantiate M<int[]>(), the compiler needs to instantiate int[][] and M<int[][]>(). To instantiate M<int[][]>(), the compiler needs to instantiate int[][][] and M<int[][][]>().
This can be solved by using representative instantiations (just like the JIT compiler uses). This means that all generic arguments that are reference types can share their code. So:
int[][], int[][][], int[][][][] (and so on) can all share the same code, because they are arrays of references.
M<int[]>, M<int[][]>, M<int[][][]> (and so on) can all share the same code, because they operate on references.
Assemblies need to own their generics...
Since C# programs are compiled in assemblies, it's hard to tell exactly who should "own" which instantiation of each type.
Assembly1 declares the type G<T>.
Assembly2 (references Assembly1) instantiates the type G<int>.
Assembly3 (references Assembly1) instantiates the type G<int> as well.
AssemblyX (references all the above) wants to use G<int>.
Which assembly gets to compile the actual G<int>? If they happen to be standalone libraries, neither Assembly2 nor Assembly3 can be compiled without each owning a copy of G<int>. So we're already looking at duplicated native code.
...and those generics must still be compatible with each other
But then, when AssemblyX is compiled, which copy of G<int> should it use? Clearly, it has to be able to handle both, because it may need to receive a G<int> from or send a G<int> to either assembly.
But more importantly, in C# you can't have two types with identical fully qualified names that turn out to be incompatible. In other words:
G<int> obj = new G<int>();
The above can never fail on the grounds that G<int> (the variable's type) is the G<int> from Assembly2 while G<int> (the constructor's type) is the G<int> from Assembly3. If it fails for a reason like that, we're not in C# anymore!
So both types need to exist and they need to be made transparently compatible, even though they are compiled separately. For this to happen, the type handles need to be manipulated at link time in such a way that the semantics of the language are retained, including the fact that they should be assignable to each other, their type handles should compare as equal (e.g. when using typeof), and so on.

Unity is using an old version of Mono's Full-AOT that does not support generic interface methods.
This is due to how generics are represented in the JIT vs in native code. (I would like to elaborate, but frankly, I do not trust myself to be accurate)
Newer versions of Mono's AOT compiler address this issue (of course, with other limitations), but Unity keeps an old version of Mono. (I think I remember hearing that they changed their approach from AOT to something a bit different, but I'm not sure how it works anymore).
WARNING: I don't fully understand the topic.
The way "generics" are handled in C++ (for example), which compiles to assembly and then to a binary, is through a language mechanism called templates. These templates are more like glorified macros, and different code is actually generated for each type used. (Edit: Actually there are more differences between C# generics and C++ templates, but for the purpose of this answer I'll treat them as equivalent).
For example; for the following code:
template<typename T>
class Foo
{
public:
T GetValue() { return value; }
void SetValue(T a) {value = a;}
private:
T value;
};
int main()
{
Foo<int> a;
Foo<char *> b;
a.SetValue(0);
b.SetValue((char*)0);
a.GetValue();
b.GetValue();
return 0;
}
The following functions will be generated (got this by running nm --demangle)
00000000004005e4 W Foo<int>::GetValue()
00000000004005b2 W Foo<int>::SetValue(int)
00000000004005f4 W Foo<char*>::GetValue()
00000000004005ca W Foo<char*>::SetValue(char*)
This means that for every type you use this class with, another instance of practically the same code will be generated (although I'm sure that GCC is smart enough to optimize some of the obvious cases, like getters and setters, and maybe more).
C#'s generics are a bit more complex.
Here's a very interesting article by Eric Lippert. The summary is that compiled C# generic code has only one instance that is, well, generic, and whatever depends on the type is calculated at runtime.
When translating C# code to native/machine code (which is essentially what AOT does), there's a problem translating generics.
This is where the subject gets a bit fuzzy to me. I can only assume that AOT'd code does not retain runtime type information, so it needs code-per-type for generic cases.
When receiving an object of type IFooable, it is possible that the native virtual table format is not verbose enough to enable finding the correct implementation; although I admit I have no idea why that would be, or the exact details of the virtual table of AOT'd code (is it identical to that of C++?)

C# Conditional Compilation Variables Based on OS Version

First, yes, I have seen these posts:
Is there an easy way in C# to have conditional compilation symbols based on OS version
Conditional compilation depending on the framework version in C#
but they do not target the direction I am looking for.
What I am looking for specifically is variable _type_ qualification via OS version:
public struct REPASTESPECIAL
{
UInt32 dwAspect;
#if WINVER >= 6.0
UIntPtr dwParam;
#else
UInt32 dwParam;
#endif
}
I do not wish to revert to something like this:
public struct REPASTESPECIAL<T>
{
UInt32 dwAspect;
T dwParam;
}
// return type by OS version
public static Type GetRePasteSpecial_ParamType()
{
if (Environment.OSVersion.Version.Major >= 5) return typeof(IntPtr);
else return typeof(UInt32);
}
...as that would permit programmers to use any object of Type T when I desire dwParam to be an IntPtr or an UInt32 object only, but if I must then so be it, and make this a reference for others looking for the same.

No - conditional compilation works at compile time, whereas it looks like you're after something which works at execution time.
EDIT: Just thinking about it, you could use conditional compilation in one sense: you could encapsulate this as far as possible in its own assembly, and compile it two ways, producing two separate assemblies. Then at install time (or whatever) install the right assembly. It's hard to know how feasible that is without knowing what kind of app you're writing though.
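To make the compile-time versus execution-time distinction concrete, here is a rough sketch. The symbol TARGET_VISTA_OR_LATER is hypothetical; it would be supplied by the build (for example through DefineConstants in the project file or the compiler's -define option), once per assembly you produce. The field type is frozen when the assembly is compiled, while the OS version is only known when it runs.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct REPASTESPECIAL
{
    public uint dwAspect;
#if TARGET_VISTA_OR_LATER
    public UIntPtr dwParam; // chosen only if the build defines the symbol
#else
    public uint dwParam;    // chosen otherwise
#endif
}

public static class ConditionalDemo
{
    // Reports which branch was compiled into this particular assembly.
    public static Type ParamType()
    {
        return typeof(REPASTESPECIAL).GetField("dwParam").FieldType;
    }

    public static void Main()
    {
        // Decided when the assembly was built:
        Console.WriteLine(ParamType());

        // Known only on the machine where the code executes:
        Console.WriteLine(Environment.OSVersion.Version);
    }
}
```

Compiling the same source twice, with and without the symbol, yields the two separate assemblies mentioned above.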

Answer to: "I have some ugly logic that I'm afraid will be misused when I expose it to others"
Consider exposing a nice, usable API that does not allow any misuse. Declare all interop types as inner classes and structures of the implementation of your nice API; there is not much value in making innermost types like REPASTESPECIAL publicly visible/usable.
This way you can hide ugly class/struct types and dynamically pick OS specific implementation if needed.
If this is an educational project, then it would be a good place to learn about dependency injection to configure the correct implementation at run time.
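One possible shape for such an API is sketched below. Everything here (PasteSpecial, IImpl, ModernImpl, LegacyImpl, DescribeParam) is a hypothetical name, and the major-version threshold of 6 simply mirrors the WINVER >= 6.0 check in the question. The point is only that the two struct layouts stay private while callers see a single entry point.

```csharp
using System;

public static class PasteSpecial
{
    // The only public surface: callers never see the interop structs.
    public static string DescribeParam()
    {
        return Create(Environment.OSVersion.Version).ParamType.Name;
    }

    // Picks the implementation at run time (threshold is an assumption).
    internal static IImpl Create(Version osVersion)
    {
        if (osVersion.Major >= 6) return new ModernImpl();
        return new LegacyImpl();
    }

    internal interface IImpl
    {
        Type ParamType { get; }
    }

    private sealed class ModernImpl : IImpl
    {
        private struct REPASTESPECIAL { public uint dwAspect; public UIntPtr dwParam; }
        public Type ParamType { get { return new REPASTESPECIAL().dwParam.GetType(); } }
    }

    private sealed class LegacyImpl : IImpl
    {
        private struct REPASTESPECIAL { public uint dwAspect; public uint dwParam; }
        public Type ParamType { get { return new REPASTESPECIAL().dwParam.GetType(); } }
    }

    public static void Main()
    {
        Console.WriteLine(DescribeParam());
    }
}
```

A dependency injection container could replace the Create method; the hand-written factory is used here only to keep the sketch self-contained.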

Decompiling VB.Net assembly produces code with invalid member variable names; names starting with $STATIC$

I am doing work for a client who has lost the source code for one of their VB.Net WinForms applications. The assembly they have is not obfuscated at all. I am trying to recover as much of the source as I can as C# source and have tried several tools for decompiling assemblies, including Reflector, ILSpy and JustDecompile (all the latest versions), but they all produce code with a huge number of errors in them. Because of the large number of errors in the generated code, I am going to ask about the specific errors (in different questions), hopefully to get more directed answers and in this way try shed some light on why all the tools are having difficulty decompiling this assembly.
This question pertains to the fact that the code generated by all these tools always has a large number of invalid member variables (fields) such as the following:
private short $STATIC$Report_Print$20211C1280B1$nHeight;
private ArrayList $STATIC$Report_Print$20211C1280B1$oColumnLefts;
private StaticLocalInitFlag $STATIC$Report_Print$20211C1280B1$oColumnLefts$Init;
Can someone explain why the generated code has these invalid member variables and how I can resolve these?

Those are identifiers generated by the VB.NET compiler to implement the Static keyword. For example:
Class Example
Public Sub test()
Static lookhere As Integer = 42
End Sub
End Class
generates this IL:
.field private specialname int32 $STATIC$test$2001$lookhere
.field private specialname class [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.StaticLocalInitFlag $STATIC$test$2001$lookhere$Init
By using reserved characters in the field name, the compiler can be sure there will never be an accidental collision with another field. There's no direct equivalent to Static in the C# language. You can leave them as private fields in the class, but you have to watch out for initialization. That is the purpose of the $Init flag and of rather a lot of IL: to ensure the variable is correctly initialized. You'll need to rename them by hand.
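As a rough illustration of what the hand-renamed C# might look like for the Example class above, here is a sketch. It is a simplification, not a faithful copy of the IL the VB.NET compiler emits: a bool and a lock stand in for the StaticLocalInitFlag machinery, and the increment and return value are hypothetical additions so that the preserved state is observable.

```csharp
using System;

public class Example
{
    private int lookhere;             // was $STATIC$test$2001$lookhere
    private bool lookhereInitialized; // stands in for $STATIC$test$2001$lookhere$Init
    private readonly object lookhereLock = new object();

    public int Test()
    {
        lock (lookhereLock)
        {
            // Run the initializer once, on the first call, as VB's Static does.
            if (!lookhereInitialized)
            {
                lookhere = 42;
                lookhereInitialized = true;
            }

            lookhere++; // hypothetical use of the preserved value
            return lookhere;
        }
    }

    public static void Main()
    {
        var example = new Example();
        Console.WriteLine(example.Test()); // prints 43
        Console.WriteLine(example.Test()); // prints 44
    }
}
```

If the initializer has no side effects, a plain field initializer (private int lookhere = 42;) is usually enough, at the cost of running at construction time rather than on first call.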

In short, what's valid in IL isn't necessarily the same as what's valid in the source language. It's fairly common to give compiler-generated (aka synthetic in some circles) members names that are invalid in the language, as it avoids any possible clashes. These are sometimes called unspeakable names as they can't be "spoken" in the source language. For example, the C# compiler usually includes <> in such names.
As for resolving the issue - some decompilers will work out where such names have come from automatically, but you can usually simply change the name everywhere. You won't end up with the original source code, but if you look at what you do end up with, you may be able to then work out more easily what the original source did look like.
Note that the compiler may generate more than just invalid names: in C#, for example, iterator blocks generate IL which in some cases can't be expressed directly in "normal" C# itself. This may not be a problem for you, but it's worth being aware of.

Those aren't variables, they are fields (they have access modifiers).
They will be compiler generated fields which will be generated in a number of different circumstances. The names are purposely invalid to avoid conflicts with "normal" fields.
If you can provide a little more context someone clever can probably figure out what the source originally looked like for the compiler to emit those fields.

What problems does reflection solve?

I went through all the posts on reflection but couldn't find the answer to my question.
What were the problems in the programming world before .NET reflection came along, and how did it solve those problems?
Please explain with an example.

It should be stated that .NET reflection isn't revolutionary - the concepts have been around in other frameworks.
Reflection in .NET has 2 facets:
Investigating type information
Without some kind of reflection / introspection API, it becomes very hard to perform things like serialization. Rather than having this provided at runtime (by inspecting the properties/fields/etc), you often need code-generation instead, i.e. code that explicitly knows how to serialize each of your types. Tedious, and painful if you want to serialize something that doesn't have a twin.
Likewise, there is nowhere to store additional metadata about properties etc, so you end up having lots of additional code, or external configuration files. Something as simple as being able to associate a friendly name with a property (via an attribute) is a huge win for UI code.
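As a small sketch of both points, the code below walks an object's public properties at run time and picks up a friendly name from an attribute where one is present. Customer and Describer are made-up names; DisplayNameAttribute is the existing System.ComponentModel attribute, used here as one possible way to attach such metadata.

```csharp
using System;
using System.ComponentModel;
using System.Linq;
using System.Reflection;

public class Customer
{
    [DisplayName("Full name")]
    public string Name { get; set; }

    public int Age { get; set; }
}

public static class Describer
{
    // Knows nothing about Customer: it discovers properties by reflection.
    public static string Describe(object obj)
    {
        var parts = obj.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.Name, StringComparer.Ordinal) // stable output order
            .Select(p =>
            {
                var attribute = p.GetCustomAttribute<DisplayNameAttribute>();
                string label = attribute != null ? attribute.DisplayName : p.Name;
                return label + "=" + p.GetValue(obj);
            });
        return string.Join(", ", parts);
    }

    public static void Main()
    {
        var customer = new Customer { Name = "Ada", Age = 30 };
        Console.WriteLine(Describe(customer)); // prints "Age=30, Full name=Ada"
    }
}
```

The same Describe method works for any other type without per-type code, which is the part that would otherwise need code generation.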
Metaprogramming
.NET reflection also provides a mechanism to create types (etc) at runtime, which is hugely powerful for some specific scenarios; the alternatives are:
essentially running a parser/logic tree at runtime (rather than compiling the logic at runtime into executable code) - much slower
yet more code generation - yay!

I think to understand the need for reflection in .NET, we need to go back to before .NET. After all, modern languages like Java and C# do not have a history BF (before reflection).
C++ arguably has had the most influence on C# and Java. But C++ did not originally have reflection and we coded without it and we managed to get by. Occasionally we had a void pointer and would use a cast to force it into whatever type we wanted. The problem here was that the cast could fail with terrible consequences:
double CalculateSize(void* rectangle) {
return ((Rect*)rectangle)->getWidth() * ((Rect*)rectangle)->getHeight();
}
Now there are plenty of arguments why you shouldn't have coded yourself into this problem in the first place. But the problem is not much different from .NET 1.1 with C# when we didn't have generics:
Hashtable shapes = new Hashtable();
....
double CalculateSize(object shape) {
return ((Rect)shape).Width * ((Rect)shape).Height;
}
However, when the C# example fails it does so with an exception rather than a potential core dump.
When reflection was added to C++ (known as Run Time Type Identification or RTTI), it was hotly debated. In Stroustrup's book The Design and Evolution of C++, he lists the following arguments against RTTI, in that some people:
Declared the support unnecessary
Declared the new style inherently evil ("against the spirit of C++")
Deemed it too expensive
Thought it too complicated and confusing
Saw it as the beginning of an avalanche of new features
But it did allow us to query the type of objects, or features of objects. For example (using C#)
Hashtable shapes = new Hashtable();
....
double CalculateSize(object shape) {
if(shape is Rect) {
return ((Rect)shape).Width * ((Rect)shape).Height;
}
else if(shape is Circle) {
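return Math.Pow(((Circle)shape).Radius, 2.0) * Math.PI;
}
// Every code path must return or throw for this to compile.
throw new ArgumentException("Unsupported shape type.", "shape");
}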
Of course, with proper planning this example should never need to occur.
So, real world situations where I've needed it include:
Accessing objects from shared memory, all I have is a pointer and I need to decide what to do with it.
Dynamically loading assemblies, think about NUnit where it loads every assembly and uses reflection to determine which classes are test fixtures.
Having a mixed bag of objects in a Hashtable and wanting to process them differently in an enumerator.
Many others...
So, I would go as far as to argue that reflection has not enabled anything that couldn't be done before. However, it does make some types of problems easier to code, clearer to the reader, shorter to write, etc.
Of course that's just my opinion, I could be wrong.

I once wanted, in C++, to keep unit tests in a text file that a non-technical user could modify, in this format:
MyObj Function args //textfile.txt
But I couldn't find a way to read in a string and then have the code create an object instance of the type represented by the string without reflection which C++ doesn't support.
char *str; //read in some type from a text file say the string is "MyObj"
str *obj; //cast a pointer as type MyObj
obj = new str; //create a MyObj
Another use might be to have a generic copy function that could copy the members of a class without knowing them in advance.
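For comparison, here is a sketch of how the text-file idea can be done with .NET reflection. MyObj, Function, and TextDrivenRunner are hypothetical names, and the sketch assumes each line has exactly three space-separated parts (type, method, one string argument) and that the named type lives in the same assembly as the runner.

```csharp
using System;
using System.Reflection;

public class MyObj
{
    public string Function(string args)
    {
        return "MyObj.Function(" + args + ")";
    }
}

public static class TextDrivenRunner
{
    // Runs one line such as "MyObj Function hello".
    public static object Run(string line)
    {
        string[] parts = line.Split(' ');

        // Look the type up by its name; throws if it does not exist.
        Type type = typeof(TextDrivenRunner).Assembly.GetType(parts[0], true);

        // Create an instance without naming the type in source code.
        object instance = Activator.CreateInstance(type);

        MethodInfo method = type.GetMethod(parts[1]);
        if (method == null)
            throw new MissingMethodException(parts[0], parts[1]);

        return method.Invoke(instance, new object[] { parts[2] });
    }

    public static void Main()
    {
        Console.WriteLine(Run("MyObj Function hello")); // prints "MyObj.Function(hello)"
    }
}
```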

It helps a lot when you are using C# attributes like [Obsolete] or [Serializable] in your code. Frameworks like NUnit use reflection on classes and containing methods to understand which methods are tests, setup, teardown, etc.
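The discovery step can be sketched in a few lines. This is not how NUnit is actually implemented; MyTestAttribute, SampleTests, and MiniRunner are invented for illustration, and the code only shows the general idea of finding methods by the attributes they carry.

```csharp
using System;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method)]
public sealed class MyTestAttribute : Attribute { }

public class SampleTests
{
    [MyTest] public void Addition() { }
    [MyTest] public void Subtraction() { }
    public void Helper() { } // no attribute, so it is not a test
}

public static class MiniRunner
{
    // Returns the names of the methods marked with [MyTest].
    public static string[] FindTests(Type fixture)
    {
        return fixture
            .GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
            .Where(m => m.IsDefined(typeof(MyTestAttribute), false))
            .Select(m => m.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        foreach (string name in FindTests(typeof(SampleTests)))
            Console.WriteLine(name); // prints "Addition" then "Subtraction"
    }
}
```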
