Can you perform code indirection in .NET? - c#

I remember that with InterSystems Cache code, you can use indirection to take a string and turn it into real executable code by preceding the string variable with "@". Can this be done in C#.NET or VB.NET code? So I'd like to have a method that takes an array of strings as arguments (with one or multiple lines of code) and runs that code, assuming it doesn't throw an exception, of course. Where am I going with this? I'm trying to write a compiler within .NET code.
SET x="set a=3" XECUTE x ; sets the public variable a to 3
OR
SET x="tag1" d #x ; do/call the public subroutine tag1
OR
Set Y = "B",#Y = 6 ; sets public variable B = 6

I assume that you want to compile at runtime.
The System.CodeDom and System.CodeDom.Compiler namespaces contain the types relevant to runtime compilation.
For your own language you would need to implement your own class derived from CodeDomProvider.

For .NET you can either programmatically build up code using System.CodeDom, which is a language-independent object model for code, or you can use System.CodeDom.Compiler to get an object that compiles a string (or file) into an executable or DLL using a C# or VB.NET compiler.
Compiling the string is more like the InterSystems Cache way of doing it, but it's still more work, because you must provide all the information the compiler needs. If you look at the CompilerParameters class you will see the added complexity. The compiled code will be in its own assembly. An assembly can't be unloaded unless it's in its own AppDomain, and setting that up when dynamically compiling is difficult enough that most people don't bother if they can avoid it.
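As a rough illustration of the string-compiling route, here is a minimal sketch (the source string, class name, and error handling are placeholders, not a production recipe):

using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

class RuntimeCompileDemo
{
    static void Main()
    {
        // The code to run, supplied as an ordinary string.
        string source = @"
            public class Greeter
            {
                public static string Greet() { return ""Hello from compiled code""; }
            }";

        var provider = new CSharpCodeProvider();
        var parameters = new CompilerParameters { GenerateInMemory = true };

        CompilerResults results = provider.CompileAssemblyFromSource(parameters, source);
        if (results.Errors.HasErrors)
            throw new InvalidOperationException("Compilation failed.");

        // Call into the freshly compiled assembly via reflection.
        Type greeter = results.CompiledAssembly.GetType("Greeter");
        Console.WriteLine(greeter.GetMethod("Greet").Invoke(null, null));
    }
}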
Various approaches to your problem are proposed on this very site.
Some source code for one solution to what you've described can be found here if the link stays alive.

Related

Decompile C# DLL without method bodies

Is there an easy way to decompile a C# DLL (like ILSpy does, for example), but instead of method bodies, have the methods return default values (or throw runtime exceptions, for that matter)?
Why do I need this? I am trying to replace some classes in a .dll library. I can't decompile the entire library, since it contains many lambda functions and iterators that cause problems when decompiled.
I had, however, success doing this: Copy the decompiled source of the class I want to change, paste it into a new project and include the original .dll as a library. Then, change the class to my liking (changing method implementations for now), compile the project, and inject the compiled IL code into the original .dll via a disassembler.
This has worked so far; however, now I have run into a problem. The class (let's say A) that I'm trying to change now passes this as an argument to other classes (let's say B), and this generates a compile error (since the original B class expects the original A class, and not the "fake" A class that I'm editing).
This, of course, would not be a problem if I had the complete source code of the .dll library, but I don't. Fortunately, I don't have to. If I had a "structure-only" source code (declarations of classes, fields, methods, interfaces and what not, but no method implementations), I could copy-paste the source of class A into this structure-only source code, make the changes I need, compile, and then inject the IL code as before.
So, can I easily get the structure-only source code (again, method implementations can either return default values or throw runtime exceptions), or is there a better way to replace a class in a .dll library like that?

How can I get a list of every assembly, namespace, and class resolved through reflection in my application?

I need to use an iOS build setting in Unity3d that strips unused classes from bytecode, but it uses static analysis to discover which classes to remove, so any classes resolved through reflection will not be excluded from removal unless explicitly added to an exclusion list. I managed to remove all uses of reflection in my own code, but Mono itself seems to use a reflection-based configuration to do a bunch of stuff. I've already added about a dozen classes to the exclusion list, but now I'm at the point where exceptions are not giving any clues as to what class needs to be excluded for them to work.
My question is: is it possible to get a precise list of all the classes (with source assembly and namespace) resolved through reflection throughout every assembly that the application uses, and how would you go about it? I have Visual Studio 2012, and while I know it has powerful debugging tools, I don't know how I would use them to this end.
Thanks.
The short version
You can't, as there is no way to find all lookups made via reflection using static analysis.
The long version
Just think of the following example: I write code that selects a class depending on user input, e.g. in pseudo code:
string action = ... ; // get some user input here, e.g. "Fire"
string clazz = "Do" + action;
var obj = Activator.CreateInstance("MyActions", clazz);
As you can see, the actual full class name does not occur anywhere in the code. So you would need to execute the code in every possible way to find out which values the clazz variable could assume. Therefore you cannot find out which classes this code would access via reflection.
Further Questions
What exact API from Mono are you using and what kind of exceptions are you getting? Maybe there is some alternative that could be used for your purpose.

How is compiler dealing with these Generic Plugin Interface instance methods?

I'm working with some, unfortunately largely undocumented, existing code and I'm having trouble understanding how it calls upon the methods of the plugins it loads.
My aim at the moment is simply to step into one of the methods loaded via the plugin manager, as it's causing an exception. However, I had to rebuild the pluginManager from source to get debug symbols, and when I reference this new DLL version the compiler throws up its arms.
The code appears to load the plugin into plug.Instance and then access the specific methods like so: plug.Instance.ReturnLeaNumber();
This compiler error makes sense, because it doesn't know the details of the plugins. What confuses me is how the compiler ever knew these were valid before run time, when no plugins are initialized. I can step through the code that doesn't work now with the older DLL!
This is an example of where the program loads up a plugin.
plug = GenericServicePlugins.AvailablePlugins.Find(Application.StartupPath + "\\Dlls\\SchoolInterface.dll");
// Compiler doesn't like this next line anymore though
plug.Instance.Initialize(null, null);
If there are any differences between my rebuilt library and the previously working one, I can't tell what they are, as the versions match up with the ones in our source control. Would appreciate some advice on where to start looking!
public interface IGenericPluginMasterInterface
{
    String returnName();
    void Initialize(ExceptionStringResources.Translate ExceptionStrings);
    Object ExecuteFunction(String macAddress, bool log, String functionName, LoginCredentials logonCredentials, WebConfiguration webConfig,
                           Int64 dataLinkId, DataLinkParam[] dataLinkParams, String dataName,
                           DataParam[] dataParams, Object[] additionalParams);
}
Rest of Manager code on PasteBin
How does the compiler know about these plug.Instance.Method() methods before runtime?
Edit:
I've not quite worked this out yet, but there was a "PluginsService" file I missed which partly mirrors the "GenericPluginServices".
I think this error could have been caused when I removed parts of this class that related to a now-defunct plugin, which I am looking into. However, I figured posting this other code snippet would help the question.
PluginService.cs code
GenericPluginService code
Find returns AvailablePlugin, so .Instance is of type IGenericPluginMasterInterface; if so, then indeed .Instance.ReturnLeaNumber() can't possibly work...
The only way that could work (without introducing some generics etc.) is if .Instance actually returned dynamic. With dynamic, the name/method resolution happens at runtime. The compiler treats dynamic very deliberately so as to defer all resolution to runtime, based on either reflection (for simple cases) or IDynamicMetaObjectProvider (for more sophisticated cases).
However, if the code you have doesn't match what was compiled, then: we can't tell you what it was. IMO, the best option is to get hold of the working dll, and look at it in reflector to see what it is actually doing, and how it is different to the source code that you have.
Actually, strictly speaking it could still do that with the code you've pasted, but only if plug is typed as dynamic, i.e. dynamic plug = ...
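To see the dynamic route in action, here is a self-contained sketch (SchoolPlugin and the values are made up for illustration; only the interface name comes from the question):

using System;

public interface IGenericPluginMasterInterface
{
    string returnName();
}

// Hypothetical concrete plugin: ReturnLeaNumber is NOT on the interface.
public class SchoolPlugin : IGenericPluginMasterInterface
{
    public string returnName() { return "School"; }
    public int ReturnLeaNumber() { return 42; }
}

class Demo
{
    static void Main()
    {
        IGenericPluginMasterInterface instance = new SchoolPlugin();

        // instance.ReturnLeaNumber();  // compile-time error: not on the interface

        dynamic d = instance;           // defers member lookup to runtime
        Console.WriteLine(d.ReturnLeaNumber()); // resolved (or fails) at runtime
    }
}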

How does C# compilation get around needing header files?

I've spent my professional life as a C# developer. As a student I occasionally used C but did not deeply study its compilation model. Recently I jumped on the bandwagon and have begun studying Objective-C. My first steps have only made me aware of holes in my pre-existing knowledge.
From my research, C/C++/ObjC compilation requires all encountered symbols to be pre-declared. I also understand that building is a two-step process. First you compile each individual source file into individual object files. These object files might have undefined "symbols" (which generally correspond to the identifiers declared in the header files). Second you link the object files together to form your final output. This is a pretty high-level explanation but it satisfies my curiosity enough. But I'd also like to have a similar high-level understanding of the C# build process.
Q: How does the C# build process get around the need for header files? I'd imagine perhaps the compilation step does two passes?
(Edit: Follow up question here How do C/C++/Objective-C compare with C# when it comes to using libraries?)
UPDATE: This question was the subject of my blog for February 4th 2010. Thanks for the great question!
Let me lay it out for you. In the most basic sense the compiler is a "two-pass compiler", because the phases that the compiler goes through are:
Generation of metadata.
Generation of IL.
Metadata is all the "top level" stuff that describes the structure of the code. Namespaces, classes, structs, enums, interfaces, delegates, methods, type parameters, formal parameters, constructors, events, attributes, and so on. Basically, everything except method bodies.
IL is all the stuff that goes in a method body -- the actual imperative code, rather than metadata about how the code is structured.
The first phase is actually implemented via a great many passes over the sources. It's way more than two.
The first thing we do is take the text of the sources and break it up into a stream of tokens. That is, we do lexical analysis to determine that
class c : b { }
is class, identifier, colon, identifier, left curly, right curly.
We then do a "top level parse" where we verify that the token streams define a grammaticaly-correct C# program. However, we skip parsing method bodies. When we hit a method body, we just blaze through the tokens until we get to the matching close curly. We'll come back to it later; we only care about getting enough information to generate metadata at this point.
We then do a "declaration" pass where we make notes about the location of every namespace and type declaration in the program.
We then do a pass where we verify that all the types declared have no cycles in their base types. We need to do this first because in every subsequent pass we need to be able to walk up type hierarchies without having to deal with cycles.
We then do a pass where we verify that all generic parameter constraints on generic types are also acyclic.
We then do a pass where we check whether every member of every type -- methods of classes, fields of structs, enum values, and so on -- is consistent. No cycles in enums, every overriding method overrides something that is actually virtual, and so on. At this point we can compute the "vtable" layouts of all interfaces, classes with virtual methods, and so on.
We then do a pass where we work out the values of all "const" fields.
At this point we have enough information to emit almost all the metadata for this assembly. We still do not have information about the metadata for iterator/anonymous function closures or anonymous types; we do those later.
We can now start generating IL. For each method body (and properties, indexers, constructors, and so on), we rewind the lexer to the point where the method body began and parse the method body.
Once the method body is parsed, we do an initial "binding" pass, where we attempt to determine the types of every expression in every statement. We then do a whole pile of passes over each method body.
We first run a pass to transform loops into gotos and labels.
(The next few passes look for bad stuff.)
Then we run a pass to look for use of deprecated types, for warnings.
Then we run a pass that searches for uses of anonymous types that we haven't emitted metadata for yet, and emit those.
Then we run a pass that searches for bad uses of expression trees. For example, using a ++ operator in an expression tree.
Then we run a pass that looks for all local variables in the body that are defined, but not used, to report warnings.
Then we run a pass that looks for illegal patterns inside iterator blocks.
Then we run the reachability checker, to give warnings about unreachable code, and tell you when you've done something like forgotten the return at the end of a non-void method.
Then we run a pass that verifies that every goto targets a sensible label, and that every label is targeted by a reachable goto.
Then we run a pass that checks that all locals are definitely assigned before use, notes which local variables are closed-over outer variables of an anonymous function or iterator, and which anonymous functions are in reachable code. (This pass does too much. I have been meaning to refactor it for some time now.)
At this point we're done looking for bad stuff, but we still have way more passes to go before we sleep.
Next we run a pass that detects missing ref arguments to calls on COM objects and fixes them. (This is a new feature in C# 4.)
Then we run a pass that looks for stuff of the form "new MyDelegate(Foo)" and rewrites it into a call to CreateDelegate.
Then we run a pass that transforms expression trees into the sequence of factory method calls necessary to create the expression trees at runtime.
Then we run a pass that rewrites all nullable arithmetic into code that tests for HasValue, and so on. (There's a sketch of this lowering just after this walkthrough.)
Then we run a pass that finds all references of the form base.Blah() and rewrites them into code which does the non-virtual call to the base class method.
Then we run a pass which looks for object and collection initializers and turns them into the appropriate property sets, and so on.
Then we run a pass which looks for dynamic calls (in C# 4) and rewrites them into dynamic call sites that use the DLR.
Then we run a pass that looks for calls to removed methods. (That is, partial methods with no actual implementation, or conditional methods that don't have their conditional compilation symbol defined.) Those are turned into no-ops.
Then we look for unreachable code and remove it from the tree. No point in codegenning IL for it.
Then we run an optimization pass that rewrites trivial "is" and "as" operators.
Then we run an optimization pass that looks for switch(constant) and rewrites it as a branch directly to the correct case.
Then we run a pass which turns string concatenations into calls to the correct overload of String.Concat.
(Ah, memories. These last two passes were the first things I worked on when I joined the compiler team.)
Then we run a pass which rewrites uses of named and optional parameters into calls where the side effects all happen in the correct order.
Then we run a pass which optimizes arithmetic; for example, if we know that M() returns an int, and we have 1 * M(), then we just turn it into M().
Then we do generation of the code for anonymous types first used by this method.
Then we transform anonymous functions in this body into methods of closure classes.
Finally, we transform iterator blocks into switch-based state machines.
Then we emit the IL for the transformed tree that we've just computed.
Easy as pie!
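To make one of those lowering passes concrete, here is roughly what the nullable-arithmetic rewrite mentioned above does. This is a sketch of the idea, not the compiler's literal output:

using System;

class NullableLoweringSketch
{
    static void Main()
    {
        int? x = 1, y = null;

        // What you write:
        int? sum = x + y;

        // Roughly what the pass lowers it to:
        int? sumLowered = (x.HasValue && y.HasValue)
            ? new int?(x.GetValueOrDefault() + y.GetValueOrDefault())
            : null;

        Console.WriteLine(sum == sumLowered); // True (both are null here)
    }
}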
I see that there are multiple interpretations of the question. I answered the intra-solution interpretation, but let me fill it out with all the information I know.
The "header file metadata" is present in the compiled assemblies, so any assembly you add a reference to will allow the compiler to pull in the metadata from those.
As for things not yet compiled that are part of the current solution, it will do a two-pass compilation, first reading namespaces, type names, member names, i.e. everything but the code. Then, when this checks out, it will read the code and compile that.
This allows the compiler to know what exists and what doesn't exist (in its universe).
To see the two-pass compiler in effect, test the following code, which has three problems: two declaration-related problems and one code problem:
using System;

namespace ConsoleApplication11
{
    class Program
    {
        public static Stringg ReturnsTheWrongType()
        {
            return null;
        }

        static void Main(string[] args)
        {
            CallSomeMethodThatDoesntExist();
        }

        public static Stringg AlsoReturnsTheWrongType()
        {
            return null;
        }
    }
}
Note that the compiler will only complain about the two Stringg types that it cannot find. If you fix those, it then complains about the method name called in the Main method, which it cannot find.
It uses the metadata from the referenced assemblies. That contains full type declarations, the same thing as you'd find in a header file.
It being a two-pass compiler accomplishes something else: you can use a type in one source file before it is declared in another source code file.
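A tiny illustration of that, assuming two files in the same project:

// File1.cs
class A
{
    B b; // uses B with no forward declaration; B lives in another file
}

// File2.cs
class B
{
    A a; // the mutual reference works because the metadata pass sees both types first
}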
It's a 2-pass compiler. http://en.wikipedia.org/wiki/Multi-pass_compiler
All the necessary information can be obtained from the referenced assemblies.
So there are no header files, but the compiler does need access to the DLLs being used.
And yes, it is a 2-pass compiler but that doesn't explain how it gets information about library types.

Justification for Reflection in C#

I have wondered about the appropriateness of reflection in C# code. For example, I have written a function which iterates through the properties of a given source object and creates a new instance of a specified type, then copies the values of properties with the same name from one to the other. I created this to copy data from one auto-generated LINQ object to another, in order to get around the lack of inheritance from multiple tables in LINQ.
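A simplified sketch of that kind of copier (the names here are made up):

using System;
using System.Reflection;

static class PropertyCopier
{
    // Creates a new TTarget and copies every readable source property
    // that has a writable property of the same name on the target.
    public static TTarget CopyTo<TTarget>(object source) where TTarget : new()
    {
        var target = new TTarget();
        foreach (PropertyInfo targetProp in typeof(TTarget).GetProperties())
        {
            PropertyInfo sourceProp = source.GetType().GetProperty(targetProp.Name);
            if (sourceProp != null && sourceProp.CanRead && targetProp.CanWrite)
                targetProp.SetValue(target, sourceProp.GetValue(source, null), null);
        }
        return target;
    }
}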
However, I can't help but think code like this is really 'cheating', i.e. rather than using the provided language constructs to achieve a given end, it allows you to circumvent them.
To what degree is this sort of code acceptable? What are the risks? What are legitimate uses of this approach?
Sometimes using reflection can be a bit of a hack, but a lot of the time it's simply the most fantastic code tool.
Look at the .NET property grid - anyone who's used Visual Studio will be familiar with it. You can point it at any object and it will produce a simple property editor. That uses reflection; in fact most of VS's toolbox does.
Look at unit tests - they're loaded by reflection (at least in NUnit and MSTest).
Reflection allows dynamic-style behaviour from static languages.
The one thing it really needs is duck typing - the C# compiler already supports some of this: you can foreach anything that looks like IEnumerable, whether it implements the interface or not, and you can use the C#3 collection initializer syntax on any class that implements IEnumerable and has a method called Add.
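A small sketch of the foreach half of that, relying on nothing but the enumerator pattern:

using System;

// Countdown implements no interfaces at all, yet foreach accepts it
// because the compiler duck-types on GetEnumerator()/MoveNext()/Current.
class Countdown
{
    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(3);
    }
}

class CountdownEnumerator
{
    private int current;
    public CountdownEnumerator(int start) { current = start + 1; }
    public int Current { get { return current; } }
    public bool MoveNext() { return --current >= 0; }
}

class DuckTypingDemo
{
    static void Main()
    {
        foreach (int i in new Countdown())
            Console.WriteLine(i); // prints 3, 2, 1, 0
    }
}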
Use reflection wherever you need dynamic-style behaviour - for instance you have a collection of objects and you want to check the same property on each.
The risks are similar to those of dynamic types - compile-time exceptions become runtime ones. Your code is not as 'safe' and you have to react accordingly.
The .Net reflection code is very quick, but not as fast as the explicit call would have been.
I agree - it gives me that "it works, but it feels like a hack" feeling. I try to avoid reflection whenever possible. I have been burned many times after refactoring code which had reflection in it. The code compiles fine, tests even run, but under special circumstances (which the tests didn't cover) the program blows up at runtime because of my refactoring in one of the objects the reflection code poked into.
Example 1: Reflection in an OR mapper. You change the name or the type of a property in your object model: it blows up at runtime.
Example 2: You are in a SOA shop. Web services are completely decoupled (or so you think). They have their own set of generated proxy classes, but in the mapping you decide to save some time and you do this:
ExternalColor c = (ExternalColor)Enum.Parse(typeof(ExternalColor),
                                            internalColor.ToString());
Under the covers this is also reflection, but done by the .NET Framework itself. Now what happens if you decide to rename InternalColor.Grey to InternalColor.Gray? Everything looks OK, it builds fine, and even runs fine... until the day some stupid user decides to use the color Gray... at which point the mapper will blow up.
Reflection is a wonderful tool that I could not live without. It can make programming much easier and faster.
For instance, I use reflection in my ORM layer to be able to assign properties with column values from tables. If it wasn't for reflection, I would have had to create a copy class for each table/class mapping.
As for the external color exception above: the problem is not Enum.Parse, but that the coder didn't catch the proper exception. Since a string is being parsed, the coder should always assume that the string can contain an incorrect value.
The same problem applies to all advanced programming in .Net. "With great power, comes great responsibility". Using reflection gives you much power. But make sure that you know how to use it properly. There are dozens of examples on the web.
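In that spirit, the mapping above can be written defensively; a sketch (the Unknown fallback member is hypothetical):

ExternalColor c;
try
{
    c = (ExternalColor)Enum.Parse(typeof(ExternalColor),
                                  internalColor.ToString());
}
catch (ArgumentException)
{
    // The internal name has no external counterpart (e.g. Gray vs. Grey).
    c = ExternalColor.Unknown; // hypothetical fallback value
}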
It may be just me, but the way I'd approach this is by creating a code generator - using reflection at runtime is a bit costly and untyped. Creating classes that are generated according to your latest code and copy everything in a strongly typed manner means that you will catch these errors at build time.
For instance, a generated class may look like this:
static class AtoBCopier
{
    public static B Copy(A item)
    {
        return new B() { Prop1 = item.Prop1, Prop2 = item.Prop2 };
    }
}
If either class doesn't have the properties, or their types change, the code doesn't compile. Plus, there's a huge improvement in execution time.
I recently used reflection in C# for finding implementations of a specific interface. I had written a simple batch-style interpreter that looked up "actions" for each step of the computation based on the class name. Reflecting over the current namespace then pops up the right implementation of my IStep interface, which can be Execute()ed. This way, adding new "actions" is as easy as creating a new derived class - no need to add it to a registry, or even worse: forgetting to add it to a registry...
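That lookup only takes a few lines; a sketch assuming an IStep interface with an Execute() method and action classes named after their step:

using System;
using System.Reflection;

public interface IStep
{
    void Execute();
}

public static class StepFactory
{
    // Finds the concrete IStep implementation whose class name matches
    // the action name and instantiates it via its parameterless constructor.
    public static IStep Create(string actionName)
    {
        foreach (Type type in Assembly.GetExecutingAssembly().GetTypes())
        {
            if (typeof(IStep).IsAssignableFrom(type) && !type.IsAbstract
                && type.Name == actionName)
            {
                return (IStep)Activator.CreateInstance(type);
            }
        }
        return null; // no matching action found
    }
}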
Reflection makes it very easy to implement plugin architectures where plugin DLLs are automatically loaded at runtime (not explicitly linked at compile time).
These can be scanned for classes that implement/extend relevant interfaces/classes. Reflection can then be used to instantiate instances of these on demand.
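A minimal sketch of that pattern (IPlugin and the folder layout are assumptions for illustration):

using System;
using System.IO;
using System.Reflection;

public interface IPlugin
{
    void Run();
}

public static class PluginLoader
{
    // Loads every DLL in the given folder and instantiates each concrete
    // type that implements IPlugin, then runs it.
    public static void RunAll(string pluginFolder)
    {
        foreach (string dll in Directory.GetFiles(pluginFolder, "*.dll"))
        {
            Assembly assembly = Assembly.LoadFrom(dll);
            foreach (Type type in assembly.GetTypes())
            {
                if (typeof(IPlugin).IsAssignableFrom(type) && !type.IsAbstract)
                {
                    var plugin = (IPlugin)Activator.CreateInstance(type);
                    plugin.Run();
                }
            }
        }
    }
}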
