Learning DLR (how to implement a language on top of it) - c#

I am trying to learn how to write a simple scripting language on top of the DLR by playing with a very old DLR example called ToyScript. However, ToyScript does not seem to support the following script structure, which I would like to use in my implementation:
print b()
def b() {
return 1
}
It raises an exception, exactly as in most statically compiled languages.
If the script follows a "static languages paradigm" :
def b() {
return 1
}
print b()
ToyScript works without problems.
My question is: how should the former be done in the DLR?
[Obviously I am looking for a description of a solution, and not for a solution itself :)]

There are a few possible implementations. The first is to create a function only when its declaration is executed. With this approach, you cannot invoke a function before the declaration has run. The second is to create all the functions when you parse the code, before executing the global script. With this approach, a function declaration can appear anywhere in the code, because every function already exists before anything executes. The drawback is that you create all the functions whether or not you ever invoke them. Then there is an in-between way: when you parse the code for the first time, you store the abstract syntax tree (AST) of each function in a function table. When you want to invoke a function, you look up its declaration in the function table and compile or interpret it from the AST. Compare the following two JavaScript snippets and you will have a good idea: the first works because function declarations are created before any code runs, while the second fails because only the variable b is hoisted, not the function assigned to it.
console.log(b());
function b() {
return 1;
}
and
console.log(b());
var b = function() {
return 1;
}
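The in-between approach can be sketched in a few lines. This is only an illustration in Python, not DLR code: the tuple-shaped AST nodes and the run function are made up for the example and stand in for whatever your parser actually produces.

```python
# Hypothetical AST: each node is a tuple whose first element names its kind.
PROGRAM = [
    ("print", ("call", "b")),                  # print b()
    ("def", "b", [("return", ("const", 1))]),  # def b() { return 1 }
]


def run(program):
    """Interpret a toy program in two passes and return what it printed."""
    # Pass 1: store the body AST of every function in a function table.
    functions = {}
    for node in program:
        if node[0] == "def":
            _, name, body = node
            functions[name] = body

    def evaluate(expr):
        kind = expr[0]
        if kind == "const":
            return expr[1]
        if kind == "call":
            name = expr[1]
            if name not in functions:
                raise NameError(f"function {name!r} is not defined")
            # The body is only interpreted now, at the moment of the call.
            return execute_body(functions[name])
        raise ValueError(f"unknown expression {kind!r}")

    def execute_body(body):
        for statement in body:
            if statement[0] == "return":
                return evaluate(statement[1])
        return None

    # Pass 2: execute the global statements in source order.
    output = []
    for node in program:
        if node[0] == "print":
            output.append(evaluate(node[1]))
    return output
```

Because the table is filled before the second pass starts, run(PROGRAM) can resolve b even though the call comes first in the source.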

Related

How does C# call a method before it has been defined in the program? [closed]

I am very surprised that the following code block still executes despite the method not being defined or known yet to the compiler. I first noticed this in another program and then created this test case. Sure enough, it works. In Python or JavaScript, this would not be the case. It would throw an undefined function error. That's why I always make sure to define the function first, then call it.
I did some research, and while I couldn't find anything explicitly referring to this phenomenon, I suspect this is possibly the main difference between scripting languages (Python, JS) and languages built on the Common Language Runtime. Perhaps the compiler runs through everything and converts it to machine code, so that it is already aware of all fields, properties, and methods by the time it starts executing instructions.
Am I on the right track?
using System;
namespace TestingGround
{
    class Program
    {
        static void Main(string[] args)
        {
            // Method calls (they appear before the local function is declared)
            Console.WriteLine(GetSumOfFirstTwoOrDefault(null)); // output: 0
            Console.WriteLine(GetSumOfFirstTwoOrDefault(new int[0])); // output: 0
            Console.WriteLine(GetSumOfFirstTwoOrDefault(new[] { 3, 5, 5 })); // output: 8
            // Local function returning the sum of the first two integers in an array
            int GetSumOfFirstTwoOrDefault(int[] numbers)
            {
                // Normal input: an array of numbers
                // Normal output: a scalar that is the sum of the first two numbers only
                // Edge case: the input array is null or empty
                // Most concise with the null-conditional (?.) and null-coalescing (??) operators:
                // if the array is null or has fewer than two elements, return 0;
                // otherwise return array[0] + array[1]
                if ((numbers?.Length ?? 0) < 2)
                {
                    return 0;
                }
                return numbers[0] + numbers[1];
            }
        }
    }
}
This depends a lot on the language. Since you mentioned a few others, let's compare them on that count:
C# doesn't care about the order in which members or types are declared. As long as a method is visible from the current location in code you can call it. In your case you used a local method. This is effectively the same as declaring the method in the class itself, except that you can only call it from one method. If you run your code through the compiler and a decompiler after that, you can see that the local function is instead declared on the class, just with a compiler-generated name.
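In Python you have to take care where you call the function. In the following snippet, things won't work:
foo()
def foo():
    print('Someone called me!')
This is because the first line is executed before the def statement below it has run, so the name foo does not exist yet. JavaScript is a partial exception: function declarations are hoisted, so the equivalent script with function foo() { ... } placed after the call does work, while a function expression assigned to a variable (var foo = function() { ... }) fails in the same way as the Python snippet. With a more C#-like style of code, the ordering problem goes away: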
def main():
    foo()
def foo():
    print('Someone called me!')
main()
This now works, because the content of the functions is parsed, but not executed immediately. And by the time we end up calling main, the foo function is defined and callable. In a way, this mimics how C# works as well. The compiler assembles everything together and by the time the code is actually executed, things are in place and can be found.
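To make the "parsed, but not executed immediately" point concrete, here is a small Python sketch. The names main, helper, early_result and late_result are just for illustration. It calls main twice, once before and once after helper is defined:

```python
def main():
    # 'helper' is looked up by name only when this line actually runs.
    return helper()


try:
    main()  # helper does not exist yet, so the lookup fails
    early_result = "no error"
except NameError as error:
    early_result = type(error).__name__


def helper():
    return "Someone called me!"


late_result = main()  # helper exists now, so the same call succeeds
```

The body of main is identical in both calls; only the moment of the call relative to the def statement differs.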
Now, in older languages like C or Pascal, there's a requirement for functions to be declared before you can call them. This is because back then computing power was scarce and single-pass compilers – compilers that went just once over the source code and spit out object or assembler code right away – were common. And in order to generate the code for a function call you need to know what arguments the function takes and what it returns – its signature. So in C you'd have something like this:
#include <stdio.h>
void foo(void); // forward declaration
int main(void) {
    foo();
    return 0;
}
void foo(void) { // actual implementation
    printf("Someone called me!\n");
}
The reason here is a bit different, but kinda comparable. While an interpreter executing a line would surely have to know about a function it's supposed to call, the compiler (assuming it can't go back) has to as well.
Coming back to C#: The C# compiler isn't a single-pass compiler and takes multiple passes over the source code. At least once to gather all types and members and their respective visibility (just the declarations, though), and at least once more to generate the actual IL code. And when that happens, the information about what names are visible from where and what they are, is already available to be used.
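As a rough analogy (this is not how the C# compiler is implemented), the two passes can be imitated with Python's standard ast module. The function name unresolved_calls and the sample source are assumptions made for this sketch: the first pass only gathers declarations, the second resolves every call against them.

```python
import ast

SOURCE = """
def main():
    foo()
    bar()

def foo():
    pass
"""


def unresolved_calls(source):
    """Return the names of called functions that are declared nowhere in source."""
    tree = ast.parse(source)

    # Pass 1: collect declarations only, ignoring the function bodies' logic.
    declared = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    }

    # Pass 2: resolve each simple call against the table built in pass 1.
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in declared:
                missing.append(node.func.id)
    return missing
```

Here foo resolves even though it is declared after main, while bar is reported because no declaration exists anywhere in the source.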

Datatype of Methods

Question
How does a delegate store a reference to a function? The source code appears to refer to it as an Object, and the manner in which it invokes the method seems redacted from the source code. Can anyone explain how C# is handling this?
Original Post
It seems I'm constantly fighting the abstractions C# imposes on its programmers. One that's been irking me is the obfuscation of Functions/Methods. As I understand it, all methods are in fact anonymous methods assigned to properties of a class. This is the reason why no function is prefixed by a datatype. For example...
void foo() { ... }
... would be written in Javascript as...
Function foo = function():void { ... };
In my experience, Anonymous functions are typically bad form, but here it's replete throughout the language standard. Because you cannot define a function with its datatype (and apparently the implication/handling is assumed by the compiler), how does one store a reference to a method if the type is never declared?
I'm trying very hard to avoid Delegates and its variants (Action & Func), both because...
it is another abstraction from what's actually happening
the unnecessary overhead required to instantiate these classes (which in turn carry their own pointers to the methods being called).
Looking at the source code for the Delegate.cs, it appears to refer to the reference of a function as simply Object (see lines 23-25).
If these really are objects, how are we calling them? According to the delegate.cs trail, it dead-ends on the following path:
Delegate.cs:DynamicInvoke() > DynamicInvokeImpl() > methodinfo.cs:UnsafeInvoke() > UnsafeInvokeInternal() > RuntimeMethodHandle.InvokeMethod() > runtimehandles.cs:InvokeMethod()
internal extern static object InvokeMethod(object target, object[] arguments, Signature sig, bool constructor);
This really doesn't explain how its invoked if indeed the method is an object. It feels as though this is not code at all, and the actual code called has been redacted from source repository.
Your help is appreciated.
Response to Previous Comments
@Amy: I gave an example immediately after that statement to explain what I meant. If a function were prefixed by a datatype, you could write a true anonymous function, and store it as a property of an Object such as:
private Dictionary<string, Function> ops = new Dictionary<string, Function> {
{"foo", int (int a, int b) { return a + b } }
};
As it stands, C# doesn't allow you to write true anonymous functions, and walls that functionality off behind Delegates and Lambda expressions.
@500 Internal server error: I already explained what I was trying to do. I even bolded it. You assume there's an ulterior motive here; I'm simply trying to understand how C# stores a reference to a method. I even provided links to the source code so that others could read the code for themselves and help answer the question.
@Dialecticus: Obviously if I already found the typical answer on Google, the only other place to find the answer I'm looking for would be here. I realize this is outside the knowledge of most C# developers, and that's why I've provided the source code links. You don't have to reply if you don't know the answer.
While I don't fully understand your points about "true anonymous functions", "not prefixed by a data type" etc., I can explain how applications written in C# call methods.
First of all, there is no such thing as a "function" in C#. Each and every executable entity in C# is in fact a method, which means it belongs to a class. Even if you define lambdas or anonymous functions like this:
collection.Where(item => item > 0);
the C# compiler creates a compiler-generated class behind the scenes and puts the lambda body return item > 0 into a compiler-generated method.
So assuming you have this code:
class Example
{
    public static void StaticMethod() { }
    public void InstanceMethod() { }
    public Action Property { get; } = () => { };
}
static class Program
{
    static void Main()
    {
        Example.StaticMethod();
        var ex = new Example();
        ex.InstanceMethod();
        ex.Property();
    }
}
The C# compiler compiles that to IL code. IL code is not executable right away; it needs to be run in a virtual machine.
The IL code will contain a class Example with two methods (actually, four - a default constructor and the property getter method will be automatically generated) and a compiler-generated class containing a method whose body is the body of the lambda expression.
The IL code of Main will look like this (simplified):
call void Example::StaticMethod()
newobj instance void Example::.ctor()
callvirt instance void Example::InstanceMethod()
callvirt instance class [mscorlib]System.Action Example::get_Property()
callvirt instance void [mscorlib]System.Action::Invoke()
Notice those call and callvirt instructions: these are method calls.
To actually execute the called methods, their IL code needs to be compiled into machine code (CPU instructions). This occurs in the virtual machine called .NET Runtime. There are several of them like .NET Framework, .NET Core, Mono etc.
A .NET Runtime contains a JIT (just-in-time) compiler. It converts the IL code to the actually executable code during the execution of your program.
When the .NET Runtime first encounters the IL code "call method StaticMethod from class Example", it first looks in the internal cache of already compiled methods. When there are no matches (which means this is the first call of that method), the Runtime asks the JIT compiler to create such a compiled-and-ready-to-run method using the IL code. The IL code is converted into a sequence of CPU operations and stored in the process' memory. A pointer to that compiled code is stored in the cache for future reuse.
This all will happen behind the call or callvirt IL instructions (again, simplified).
Once this has happened, the Runtime is ready to execute the method. The CPU gets the address of the compiled code's first instruction as the next one to execute and goes on until the code returns. Then control returns to the caller, which proceeds with the code for the next IL instructions.
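The "look in the cache, compile on first call, reuse afterwards" idea can be modelled in a few lines. This Python sketch is only an analogy, not the actual .NET implementation: the LazyCompiler class is hypothetical, and Python source text stands in for IL.

```python
class LazyCompiler:
    """Compiles a method the first time it is called and caches the result."""

    def __init__(self, sources):
        self.sources = sources  # method name -> source text (stand-in for IL)
        self.cache = {}         # method name -> compiled, callable code
        self.compile_count = 0

    def call(self, name, *args):
        compiled = self.cache.get(name)
        if compiled is None:
            # First call: compile the source and remember the result.
            namespace = {}
            exec(compile(self.sources[name], f"<{name}>", "exec"), namespace)
            compiled = namespace[name]
            self.cache[name] = compiled
            self.compile_count += 1
        # Every call, first or not, ends up running the cached code.
        return compiled(*args)


jit = LazyCompiler({"add": "def add(a, b):\n    return a + b\n"})
first = jit.call("add", 2, 3)     # compiles, then runs
second = jit.call("add", 10, 20)  # runs the cached code, no recompilation
```

After both calls, compile_count is still 1, which is the whole point of the cache.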
The DynamicInvoke method of a delegate does the same thing: it instructs the Runtime to call a method (after some additional argument checks etc.). The "dead end" you mention, RuntimeMethodHandle.InvokeMethod, is a call directly into the Runtime: it is declared extern because its body is implemented by the Runtime itself rather than in C#. The parameters of this method are:
object target - the object on which the delegate invokes the instance method (this parameter).
object[] arguments - the arguments to pass to the method.
Signature sig - the actual method to call, Signature is an internal class that provides the connection between the managed IL code and native executable code.
bool constructor - true if this is a constructor call.
So in summary, methods are not represented as objects in C#. You can of course have a delegate instance, which is an object, but it does not represent the executable method; rather, it provides an invokable reference to it.
Methods are called by the Runtime, the JIT compiler makes the methods executable.
You cannot define a global "function" outside of classes in C#. You could get a direct native pointer to the compiled (jitted) method code and probably even call it manually by directly manipulating own process' memory. But why?
You clearly misunderstand the main differences between scripting languages, C/C++ and C#.
I guess the main difficulty is that there is no such thing as a function in C#. At all.
C# 7 introduced a new feature called "local functions", but a local function is not what a function in JS is.
All pieces of code are methods.
That name is intentionally different from "function" or "procedure" to emphasize the fact that all executable code in C# belongs to a class.
Anonymous methods and lambdas are just syntactic sugar.
The compiler generates a real method in the same class (or a nested one) as the method that contains the anonymous method declaration.
This simple article explains it. You can take the examples, compile them and check the generated IL code yourself.
So all the methods (anonymous or not) do belong to a class. It's impossible to answer your updated question, besides saying It does not store a reference to a function, as there is no such thing in C#.
How does one store a reference to a method?
Depending on what you mean by reference, it can be either
An instance of the MethodInfo class, which holds reflection information for a method,
A RuntimeMethodHandle (obtainable via MethodInfo.MethodHandle), from which the real memory pointer to the JITed method code can be retrieved with GetFunctionPointer(),
A delegate, which is very different from just a memory pointer, but logically can be used to "pass a method reference to another method".
I believe you are looking for the MethodInfo option. It has a MethodInfo.Invoke method, which is very much like the Function.apply function in JS. You have already seen in the Delegate source code how that class is used.
If by "reference" you mean the C-style function pointer, it is in RuntimeMethodHandle struct. You should never use it without solid understanding how a particular .Net platform implementation and a C# compiler work.
Hopefully it clarifies things a bit.
A delegate is simply a pointer (a memory location to jump to) to a method with the specified parameters and return type. Any method that matches the signature (parameters and return type) is eligible to fulfill the role, irrespective of the object that defines it. Anonymous simply means the delegate is not named.
Most times the type is implied (if it is not, you will get a compiler error).
C# is a strongly typed language. That means every expression (including delegates) MUST have a return type (including void) as well as strongly typed parameters (if any). Generics were created to permit explicit types to be used within general contexts, such as Lists.
To put it another way, delegates are the type-safe managed version of C++ callbacks.
Delegates are helpful in eliminating switch statements by allowing the code to jump to the proper handler without testing any conditions.
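A sketch of that switch-elimination idea, written in Python only to keep it short. The HANDLERS table and the function names are invented for the example; in C# the table would be a dictionary of delegates.

```python
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


# The table maps an operation name straight to the code that handles it.
HANDLERS = {
    "add": add,
    "subtract": subtract,
}


def apply_operation(name, a, b):
    # No if/else or switch: look up the handler and jump to it.
    handler = HANDLERS[name]
    return handler(a, b)
```

Adding a new operation means adding one entry to the table; apply_operation itself never changes.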
A delegate is similar to a Closure in Javascript terminology.
In your response to Amy, you are attempting to equate a loosely typed language like JS, and a strongly typed language C#. In C# it is not possible to pass an arbitrary(loosely-typed) function anywhere. Lambdas and delegates are the only way to guarantee type safety.
I would recommend trying F#, if you are looking to pass functions around.
EDIT:
If you are trying to mimic the behavior of JavaScript, I would try looking at using inheritance through interfaces. It can mimic multiple inheritance and be type safe at the same time. But be aware that it cannot fully supplant JavaScript's dependency injection model.
As you probably found out, C# doesn't have the concept of a function as in your JavaScript example.
C# is a statically typed language, and the only way you can use function pointers is by using the built-in types (Func, Action) or custom delegates. (I'm talking about safe, strongly typed pointers.)
JavaScript is a dynamic language; that's why you can do what you describe.
If you are willing to lose type safety, you can use the "dynamic" features of C# or reflection to achieve what you want, as in the following examples. (Don't do this; use Func/Action.)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
namespace ConsoleApp1
{
    class Program
    {
        private static Dictionary<string, Func<int, int, int>> FuncOps = new Dictionary<string, Func<int, int, int>>
        {
            {"add", (a, b) => a + b},
            {"subtract", (a, b) => a - b}
        };
        // There are no anonymous delegates
        //private static Dictionary<string, delegate> DelegateOps = new Dictionary<string, delegate>
        //{
        //    {"add", delegate {} }
        //};
        private static Dictionary<string, dynamic> DynamicOps = new Dictionary<string, dynamic>
        {
            {"add", new Func<int, int, int>((a, b) => a + b)},
            {"subtract", new Func<int, int, int>((a, b) => a - b)},
            {"inverse", new Func<int, int>((a) => -a )} // Different signature: can't be stored in FuncOps
        };
        private static Dictionary<string, MethodInfo> ReflectionOps = new Dictionary<string, MethodInfo>
        {
            {"abs", typeof(Math).GetMethods().Single(m => m.Name == "Abs" && m.ReturnParameter.ParameterType == typeof(int))}
        };
        static void Main(string[] args)
        {
            Console.WriteLine(FuncOps["add"](3, 2)); // 5
            Console.WriteLine(FuncOps["subtract"](3, 2)); // 1
            Console.WriteLine(DynamicOps["add"](3, 2)); // 5
            Console.WriteLine(DynamicOps["subtract"](3, 2)); // 1
            Console.WriteLine(DynamicOps["inverse"](3)); // -3
            Console.WriteLine(ReflectionOps["abs"].Invoke(null, new object[] { -1 })); // 1
            Console.ReadLine();
        }
    }
}
One more example that you shouldn't use:
delegate object CustomFunc(params object[] parameters);
private static Dictionary<string, CustomFunc> CustomParamsOps = new Dictionary<string, CustomFunc>
{
{"add", parameters => (int) parameters[0] + (int) parameters[1]},
{"subtract", parameters => (int) parameters[0] - (int) parameters[1]},
{"inverse", parameters => -((int) parameters[0])}
};
Console.WriteLine(CustomParamsOps["add"](3, 2)); //5
Console.WriteLine(CustomParamsOps["subtract"](3, 2)); //1
Console.WriteLine(CustomParamsOps["inverse"](3)); //-3
I will provide a really short and simplified answer compared to the others. Everything in C# (classes, variables, properties, structs, etc.) has a backend with tons of things your programs can hook into. This network of backend stuff slightly lowers the speed of C# when compared to "deeper" languages like C++, but also gives programmers a lot more tools to work with and makes the language easier to use. This backend includes things like "garbage collection", a feature that automatically deletes objects from memory when there are no variables left that reference them. Speaking of references, the whole system of passing objects by reference, which is the default in C#, is also managed in the backend. In C#, delegates are possible because of features in this backend that allow for something called "reflection".
From Wikipedia:
Reflection is the ability of a computer program to examine,
introspect, and modify its own structure and behavior at runtime.
So when C# compiles and it finds a Delegate, it is just going to make a function, and then store a reflective reference to that function in the variable, allowing you to pass it around and do all sorts of cool stuff with it. You aren't actually storing the function itself in the variable though, you are storing a reference, which is kinda like an address that points you to where the function is stored in RAM.

C# delegate to Java conversion

I am in the process of converting some code from C# to Java. I have never used C# before, but it has been pretty easy up to this point.
I have a line that looks like this in the C# file:
coverage.createMethod = delegate (Gridpoint gp){
    //Some method stuff in here, with a return object
}
What exactly is this trying to do? It seems a little bit like an inline class, but I am not sure how to go about converting this to Java.
Edit: more on the specific problem
In a file called STKDriver.java I have the following:
CoverageDefinitionOnCentralBody coverage = new CoverageDefinitionOnCentralBody(...);
.
.
.
DClass value = new DClass(STKDriver.class, "invoke", CoverageGrid.class);
coverage.setGridPointCreationMethod(value);
In the file DClass.java, which extends CreateCoverageGridPointForAccess, I have the following:
public DClass(Class class1, String string, Class<CoverageGridPoint> class2) {}
.
.
.
public IServiceProvider invoke(CoverageGridPoint gridPoint){
return something; //of the classtype Platform
}
Is this done correctly? The class definitions are linked here:
http://www.agi.com/resources/help/online/AGIComponentsJava/index.html?page=source%2FWhatsNewJava.html
Notice that the class CreateCoverageGridPointForAccess is abstract and extends that Delegate class.
Does the implementation I have created look correct? I can post more code if necessary.
This is an anonymous method in C#. It's technically the same thing as:
coverage.createMethod = new Func<Gridpoint, object>(SampleMethod);
public object SampleMethod(Gridpoint gp)
{
return thingy; // Pseudo for return value
}
It's just a shortcut you can use to code less.
Tejs' answer is correct. However, be careful, because anonymous functions can use closures, which means the anonymous delegate can use an existing local variable declared in the outer function. I'm no Java programmer, so I don't know if Java supports this.
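For anyone unsure what "using a closure" means here, a minimal sketch in Python (make_counter and the other names are invented for the example). The inner function keeps reading and updating a local variable of the outer function after the outer function has returned, which is what a C# anonymous delegate can also do. Java lambdas and anonymous classes can only capture locals that are effectively final, so a direct port of code like this would need a mutable holder object instead.

```python
def make_counter():
    count = 0  # local variable of the outer function

    def increment():
        # The inner function captures 'count' and keeps it alive.
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
other = make_counter()  # a separate closure with its own 'count'
results = [counter(), counter(), other()]
```

Each call to make_counter produces an independent captured variable, so the two counters do not interfere with each other.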
coverage.createMethod is a Delegate.
The following code creates an anonymous method and assigns it to the delegate:
coverage.createMethod = delegate (Gridpoint gb) {
}
so that when somebody calls coverage.createMethod, your anonymous method gets executed.

How to use an IronRuby block with a C# method

I'm using IronRuby and trying to work out how to use a block with a C# method.
This is the basic Ruby code I'm attempting to emulate:
def BlockTest ()
result = yield("hello")
puts result
end
BlockTest { |x| x + " world" }
My attempt to do the same thing with C# and IronRuby is:
string scriptText = "csharp.BlockTest { |arg| arg + 'world'}\n";
ScriptEngine scriptEngine = Ruby.CreateEngine();
ScriptScope scriptScope = scriptEngine.CreateScope();
scriptScope.SetVariable("csharp", new BlockTestClass());
scriptEngine.Execute(scriptText, scriptScope);
The BlockTestClass is:
public class BlockTestClass
{
public void BlockTest(Func<string, string> block)
{
Console.WriteLine(block("hello "));
}
}
When I run the C# code I get an exception of:
wrong number of arguments (0 for 1)
If I change the IronRuby script to the following it works.
string scriptText = "csharp.BlockTest lambda { |arg| arg + 'world'}\n";
But how do I get it to work with the original IronRuby script so that it's the equivalent of my original Ruby example?
string scriptText = "csharp.BlockTest { |arg| arg + 'world'}\n";
Ruby's blocks are not a concept understood by C# (or any of the other .NET languages).
To 'pass one' to the similar C# concept of the delegate, you must 'wrap it' in something that is understandable.
By making a lambda out of the block, it becomes something you can pass to C# code expecting a delegate or expression.
This is a common issue in the 'Alt.Net' community, even for blessed languages like F#, where 'function pointers' are not implemented as delegates but slightly differently (FastFunc instances in F#, for example). Passing one of these to something like your C# example would require wrapping it in a delegate (literally creating a delegate whose invocation passes the parameters to the underlying instance and returns the result).
One could argue that such translation would be nicer if it were automatic, but doing that can lead to complex and strange edge cases or bugs, and many developers prefer to know that such a wrapping operation will occur just by looking at the code. It is also the case that there may not always be a reasonable conversion (or more than one may exist), so making the user decide what happens is a sensible default.
For the most part, IronRuby Procs and lambdas are interchangeable with CLR Actions, Funcs, delegate types, and dynamic objects. However, there's little Ruby syntactic sugar over this, other than some call-site conversions. One place we did sweeten up the syntax was for .NET events; IronRuby allows passing a Ruby block as a CLR event handler: button.on_click {|s,e| ... }.
We have toyed with a bunch of ways to allow blocks to be passed to CLR methods, either by detecting methods whose last argument is a callable object, or by allowing a special named parameter. There's a feature request (though cryptically named) already open for this: http://ironruby.codeplex.com/workitem/4511. It would be a good challenge for anyone willing to contribute.
You can totally use Ruby blocks in C#. In fact, I have used this in an application that is currently in production! Here is how:
In the C# file:
public void BlockTest(dynamic block)
{
    Console.WriteLine(block.call("world"));
}
In IronRuby:
#require the assembly
block = Proc.new {|i| "hello " + i }
Blah::Blah.BlockTest block
Note: tested in C# 4.0 only.

Compiling nested mutually recursive function

I'm creating a toy dynamic language (leaning towards JavaScript), and while my implementation is on top of the DLR, I figured the solution to this problem would be pretty language/platform agnostic.
I have no problem with compiling recursive functions, or mutually recursive functions that exist next to each other. But compiling nested mutually recursive functions turned out to be a lot harder.
The example function I'm using to test is the following
void f(int x) {
    void g(int y) {
        if((x + y) < 100) {
            f(x + y);
        } else {
            print(x + y);
        }
    }
    g(x);
}
I figured that the solution to this has to be pretty general (maybe I'm wrong) and not specific to the DLR. I assume I somehow have to lift the inner definition of g out, define it before f, and still keep the closure context.
Closures are usually represented as a function pointer combined with a list of bound arguments. The first step is indeed to lift all nested functions to global scope, with any bound variables from their environment as parameters. This would be the equivalent of:
void _f(int x)
{
    closure g = closure(_g,x);
    call(g,x);
}
void _g(int x, int y)
{
    ...;
}
Once you have the 'closure' and 'call' primitives, it works. In a static language, closure() would only keep the relevant variables. In a dynamic language, closure() has to keep the entire context stack available in case the function needs it.
I know you are creating a dynamic language but I think the same principles apply as a non-dynamic language - you still have a symbol table and you still have to process the source via multiple passes.
If you are creating a semantic tree before your code generation phase this is easy. The call to the function points to the object (node) which will contain the semantic definition for the function. Or it is just a node that says (semantically) call this function. Since the call to the function does not need to know what the function contains, just a pointer to the symbol table entry works fine.
If you are doing optimization (tail end recursion?) then you need to perform two passes before you can analyze it for this type of optimization. This is normal for all compilers I've seen as this phase happens after the semantic/lexical analysis.
I guess the diagram in this article is ok in showing what I'm talking about (however it has the extra bit of two different input languages.)
What you are trying to accomplish is a Y combinator:
http://blogs.msdn.com/wesdyer/archive/2007/02/02/anonymous-recursion-in-c.aspx
What is a y-combinator?
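For readers who have not met the term: a fixed-point combinator lets an anonymous function call itself without being bound to a name. A minimal sketch in Python, using the strict-evaluation variant (often called the Z combinator); the name fix and the factorial example are chosen here for illustration and do not come from the linked article.

```python
def fix(f):
    """Return the fixed point of f, giving f a way to refer to itself."""
    return (lambda x: x(x))(lambda x: f(lambda *args: x(x)(*args)))


# 'self' is the recursive reference supplied by fix; no function here has a name.
factorial = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
result = factorial(5)
```

The inner lambda *args wrapper delays the self-application until the recursive call actually happens, which is what keeps this from looping forever in a strict language.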
