Discovery of op_Addition OR implementing method for Expression.Add

I'm writing a language using Antlr and Expression trees.
I've defined a standard factory method for my Tree Parser to use when generating addition, and it works very nicely for the built in integral types, now I'm moving on to more general types.
At the moment it's incredibly naive; it simply does this (in-progress TDD code often looks naive, right!?):
protected Expression GenerateAdd(Expression left, Expression right)
{
    if (left.Type.Equals(right.Type))
        return Expression.Add(left, right);
    if (left.Type.IsValueType && right.Type.IsValueType)
        Promote7_2_6_2(ref left, ref right);
    return Expression.Add(left, right);
}
Where Promote7_2_6_2 generates Convert expressions that follow the rules for integral promotion as laid out by the C# spec 7.2.6.2 (The language will be similar to C#, but will cross-over with JScript as well as having other brand new keywords).
Naturally, I moved on to testing string addition - i.e. "a" + "b"; and I get the error:
System.InvalidOperationException: The binary operator Add is not defined for the types 'System.String' and 'System.String'.
Fair enough - I reflect System.String and sure enough that operator is indeed not defined. Generating an expression tree in a test method like this:
Expression<Func<string, string, string>> e = (s1, s2) => s1 + s2;
Shows that an Add BinaryExpression is indeed created, but with the implementing method set to one of the string.Concat methods.
I was aware that I was going to have to look at doing something like this in some cases, but how many other types define addition in this way? Is it just string?
Is it a rule embedded within the C# compiler - or is there some kind of discoverable meta data that I can use to auto-discover such methods on other types?
Thanks in advance!

It seems I do a great line in answering my own questions!
My apologies, this answer could be better formatted, but I'm on my HTC desire and its keyboard doesn't support all the symbols!
It appears that there is no way to 'discover' these rules at runtime; it is the host language's responsibility to decide how to implement such things as string addition. C# adapts to the number of consecutive terms in an addition, calling the string.Concat overload that most appropriately matches that.
So, if I wish my language to support addition of class instances where an operator is not defined for it, for example, I can simply write or find a static method to do it (of the correct signature of course!) and then hardwire the language to use it in that case. A classic example here is whether to support array1 + array2 through the static Array methods.
As for discovery of operators, the Expression.Add method takes care of that, but it doesn't automatically perform any conversions, so, as with the integral/floating point promotion method I reference in the question, again it is up to the language's rules to determine if other conversions are required before attempting to build the expression.
As such, it is probably best to reflect the operator first, to see if one is defined for the two types, before then considering a conversion if one exists.
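The approach described above (check for a defined operator first, otherwise fall back to a method the language chooses) could be sketched roughly as follows. This is only an illustration under assumptions: the AddFactory class and its HasAddOperator helper are hypothetical names, not part of the original code, and the only hardwired rule shown is string + string mapping to string.Concat(string, string), which is what the C# compiler emits for s1 + s2.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper; names are illustrative, not from the original project.
public static class AddFactory
{
    // The two-argument string.Concat overload used as the "implementing method".
    private static readonly MethodInfo StringConcat =
        typeof(string).GetMethod("Concat", new[] { typeof(string), typeof(string) });

    // True if either operand type declares a public static op_Addition(left, right).
    public static bool HasAddOperator(Type left, Type right)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Static;
        var args = new[] { left, right };
        return left.GetMethod("op_Addition", flags, null, args, null) != null
            || right.GetMethod("op_Addition", flags, null, args, null) != null;
    }

    public static Expression GenerateAdd(Expression left, Expression right)
    {
        // Language-level rule: string + string is implemented by string.Concat.
        if (left.Type == typeof(string) && right.Type == typeof(string))
            return Expression.Add(left, right, StringConcat);

        // Otherwise let Expression.Add find a built-in or user-defined operator.
        // It throws InvalidOperationException when none is defined.
        return Expression.Add(left, right);
    }
}
```

HasAddOperator is there for a caller that wants to decide on conversions before building the node, instead of catching the exception from Expression.Add.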

Related

CS0411 The type arguments inferred from the usage [duplicate]

Given:
static TDest Gimme<TSource,TDest>(TSource source)
{
    return default(TDest);
}
Why can't I do:
string dest = Gimme(5);
without getting the compiler error:
error CS0411: The type arguments for method 'Whatever.Gimme<TSource,TDest>(TSource)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
The 5 can be inferred as int, but there's a restriction where the compiler won't/can't resolve the return type as a string. I've read in several places that this is by design but no real explanation. I read somewhere that this might change in C# 4, but it hasn't.
Anyone know why return types cannot be inferred from generic methods? Is this one of those questions where the answer's so obvious it's staring you in the face? I hope not!

The general principle here is that type information flows only "one way", from the inside to the outside of an expression. The example you give is extremely simple. Suppose we wanted to have type information flow "both ways" when doing type inference on a method R G<A, R>(A a), and consider some of the crazy scenarios that creates:
N(G(5))
Suppose there are ten different overloads of N, each with a different argument type. Should we make ten different inferences for R? If we did, should we somehow pick the "best" one?
double x = b ? G(5) : 123;
What should the return type of G be inferred to be? Int, because the other half of the conditional expression is int? Or double, because ultimately this thing is going to be assigned to double? Now perhaps you begin to see how this goes; if you're going to say that you reason from outside to inside, how far out do you go? There could be many steps along the way. See what happens when we start to combine these:
N(b ? G(5) : 123)
Now what do we do? We have ten overloads of N to choose from. Do we say that R is int? It could be int or any type that int is implicitly convertible to. But of those types, which ones are implicitly convertible to an argument type of N? Do we write ourselves a little prolog program and ask the prolog engine to solve what are all the possible return types that R could be in order to satisfy each of the possible overloads on N, and then somehow pick the best one?
(I'm not kidding; there are languages that essentially do write a little prolog program and then use a logic engine to work out what the types of everything are. F# for example, does way more complex type inference than C# does. Haskell's type system is actually Turing Complete; you can encode arbitrarily complex problems in the type system and ask the compiler to solve them. As we'll see later, the same is true of overload resolution in C# - you cannot encode the Halting Problem in the C# type system like you can in Haskell but you can encode NP-HARD problems into overload resolution problems.) (See below)
This is still a very simple expression. Suppose you had something like
N(N(b ? G(5) * G("hello") : 123));
Now we have to solve this problem multiple times for G, and possibly for N as well, and we have to solve them in combination. We have five overload resolution problems to solve and all of them, to be fair, should be considering both their arguments and their context type. If there are ten possibilities for N then there are potentially a hundred possibilities to consider for N(N(...)) and a thousand for N(N(N(...))) and very quickly you would have us solving problems that easily had billions of possible combinations and made the compiler very slow.
This is why we have the rule that type information only flows one way. It prevents these sorts of chicken and egg problems, where you are trying to both determine the outer type from the inner type, and determine the inner type from the outer type and cause a combinatorial explosion of possibilities.
Notice that type information does flow both ways for lambdas! If you say N(x=>x.Length) then sure enough, we consider all the possible overloads of N that have function or expression types in their arguments and try out all the possible types for x. And sure enough, there are situations in which you can easily make the compiler try out billions of possible combinations to find the unique combination that works. The type inference rules that make it possible to do that for generic methods are exceedingly complex and make even Jon Skeet nervous. This feature makes overload resolution NP-HARD.
Getting type information to flow both ways for lambdas so that generic overload resolution works correctly and efficiently took me about a year. It is such a complex feature that we only wanted to take it on if we absolutely positively would have an amazing return on that investment. Making LINQ work was worth it. But there is no corresponding feature like LINQ that justifies the immense expense of making this work in general.
UPDATE: It turns out that you can encode arbitrarily difficult problems in the C# type system. C# has nominal generic subtyping with generic contravariance, and it has been shown that you can build a Turing Machine out of generic type definitions and force the compiler to execute the machine, possibly going into infinite loops. At the time I wrote this answer the undecidability of such type systems was an open question. See https://stackoverflow.com/a/23968075/88656 for details.

You have to do:
string dest = Gimme<int, string>(5);
You need to specify what your types are in the call to the generic method. How could it know that you wanted a string in the output?
System.String is a bad example because it's a sealed class, but say it wasn't. How could the compiler know that you didn't want one of its subclasses instead if you didn't specify the type in the call?
Take this example:
System.Windows.Forms.Control dest = Gimme(5);
How would the compiler know what control to actually make? You'd need to specify it like so:
System.Windows.Forms.Control dest = Gimme<int, System.Windows.Forms.Button>(5);

Calling Gimme(5) and ignoring the return value is a legal statement. How would the compiler know which type to return?

I use this technique when I need to do something like that:
static void Gimme<T>(out T myVariable)
{
    myVariable = default(T);
}
and use it like this:
Gimme(out int myVariable);
Print(myVariable); //myVariable is already declared and usable.
Note that inline declaration of out variables is available since C# 7.0.

This was a design decision I guess. I also find it useful while programming in Java.
Unlike Java, C# seems to evolve towards a functional programming language, and you can get type inference the other way round, so you can have:
var dest = Gimme<int, string>(5);
which will infer the type of dest. I guess mixing this and the Java-style inference could prove to be fairly difficult to implement.

If a function is supposed to return one of a small number of types, you could have it return a class with implicit (widening) conversions defined to those types. I don't think it's possible to do that in a generic way, since a conversion operator can't declare a generic type parameter of its own.
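A minimal sketch of that idea in C#, assuming the only target types needed are int and string. The GimmeResult wrapper and the Gimmes class are hypothetical names introduced for illustration; the conversions are ordinary user-defined implicit operators.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper: the method returns this, and the assignment target
// picks which implicit conversion runs.
public struct GimmeResult
{
    private readonly int _value;

    public GimmeResult(int value) { _value = value; }

    public static implicit operator int(GimmeResult r)
    {
        return r._value;
    }

    public static implicit operator string(GimmeResult r)
    {
        return r._value.ToString(CultureInfo.InvariantCulture);
    }
}

public static class Gimmes
{
    public static GimmeResult Gimme(int source)
    {
        return new GimmeResult(source);
    }
}

// Usage (inside any method):
//   string dest = Gimmes.Gimme(5);   // runs the conversion to string
//   int number = Gimmes.Gimme(5);    // runs the conversion to int
```

The set of supported target types is fixed by the operators the wrapper declares, which is why this only suits a small, known list of types.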

public class ReturnString : IReq<string>
{
}
public class ReturnInt : IReq<int>
{
}
public interface IReq<T>
{
}
public class Handler
{
    public T MakeRequest<T>(IReq<T> requestObject)
    {
        return default(T);
    }
}
var handler = new Handler();
string stringResponse = handler.MakeRequest(new ReturnString());
int intResponse = handler.MakeRequest(new ReturnInt());

How to avoid writing repetitive code for different numeric types in .NET

I am trying to write a generic Vector2 type which would suit float, double, etc. types and use arithmetic operations. Is there any chance to do it in C#, F#, Nemerle or any other more or less mature .NET language?
I need a solution with
(1) good performance (same as I would have writing separate Vector2Float, Vector2Double, etc. classes),
(2) which would allow code to look nice (I do not want to emit code for each class at run-time),
(3) and which would do as much compile-time checking as possible.
For reasons 1 and 3 I would not like to use dynamics. Right now I am checking F# and Nemerle.
UPD: I expect to have a lot of mathematical code for this type. However, I would prefer to put the code in extension methods if it is possible.
UPD2: 'etc' types include int (which I actually doubt I would use) and decimal (which I think I might use, but not now). Using extension methods is just a matter of taste; if there are good reasons not to, please tell.

As mentioned by Daniel, F# has a feature called statically resolved type parameters which goes beyond what you can do with normal .NET generics in C#. The trick is that if you mark a function as inline, F# generates specialized code automatically (a bit like C++ templates) and then you can use more powerful features of the F# type system to write generic math.
For example, if you write a simple add function and make it inline:
let inline add x y = x + y;;
The type inference prints the following type:
val inline add :
  x: ^a -> y: ^b -> ^c
    when ( ^a or ^b) : (static member ( + ) : ^a * ^b -> ^c)
You can see that the inferred type is fairly complex: it specifies a member constraint that requires one of the two arguments to define a + member (and this is also supported by standard .NET types). The good thing is that this can be fully inferred, so you will rarely have to write the ugly type definitions.
As mentioned in the comments, I wrote an article Writing generic numeric code that goes into more details of how to do this in F#. I don't think this can be easily done in C# and the inline functions that you write in F# should only be called from F# (calling them from C# would essentially use dynamic). But you can definitely write your generic numerical computations in F#.

This more directly addresses your previous question. You can't put a static member constraint on a struct, but you can put it on a static Create method.
[<Struct>]
type Vector2D<'a> private (x: 'a, y: 'a) =
    static member inline Create<'a when 'a : (static member (+) : 'a * 'a -> 'a)>(x, y) = Vector2D<'a>(x, y)

C# alone will not help you in achieving that, unfortunately. Emitting structs at run-time wouldn't help you much either since your program couldn't statically refer to them.
If you really can't afford to duplicate the code, then as far as I know, "offline" code generation is the only way to go about this. Instead of generating the code at runtime, use AssemblyBuilder and friends to create an on-disk assembly with your Vector2 types, or generate a string of C# code to be fed to the compiler. I believe some of the native library wrappers take this route (e.g. OpenTK, SharpDX). You can then use ilmerge if you want to merge those types into one of your hand-coded libraries.
I'm assuming you must be coming from a C++ background where this is easily achieved using templates. However, you should ask yourself whether you actually need Vector2 types based on integral, decimal and other "exotic" numeric types. You probably won't be able to parameterize the rest of your code based on a specific Vector2 either so the effort might not be worth it.

Look into inline functions and Statically Resolved Type Parameters.

As I understand it, you want strict typing at compile time, but you don't care what happens at runtime.
Nemerle doesn't currently support this construct the way you want it.
But it supports macros and allows you to write DSLs that generate arbitrary code.
For instance, you can write a macro which analyzes this code and transforms it to the correct type.
def vec = vector { [1,2] };
Assuming we have or create a type VectorInt the code could be translated to
def vec = VectorInt(1,2);
Of course you can write any code inside and transform it to any code you want :)
Operators can be implemented as usual operators of the class.
Nemerle also allows you to define any operators like F#.

Make use of generics; this also makes it type safe.
More info on generics: http://msdn.microsoft.com/en-us/library/512aeb7t.aspx
But you also have data structures such as List and Dictionary available.
It sounds like you want operator overloading; there are a lot of examples of this. There is not really a good way to allow only decimal, float and such. The only thing you can do is restrict to struct, but that's not exactly what you want.

System.Predicate<T>: Is this really what it means, or is this what it is used for?

MSDN defines System.Predicate this way:
Represents the method that defines a set of criteria and determines whether the specified object meets those criteria.
Is this really what it means, or just what it is typically used for? Because to me it just looks like a predefined delegate whose method must take an object of type T and return a bool - and nothing more.
Am I missing something?

The CLR doesn't enforce semantics of type names.
So yes, Predicate<T> is just a delegate taking a T and returning a bool, but it's meant to be used in places where a predicate (a test for a certain condition) is expected. It's up to the programmer to respect that convention. If you need a similar delegate without a predefined semantic meaning, you could use Func<T, bool>, for example.
To the compiler, there is no functional difference between a Predicate<T> or a Func<T, bool>. But to another developer reading your code, it provides an important hint as to what your code is supposed to do, provided you used it correctly.
Similarly, there's nothing to stop me from using System.DayOfWeek to store an arbitrary value between 1 and 7 that doesn't actually represent a day of the week. It would be a stupid thing to do, but the compiler will certainly let me. It's up to you to make sure your code makes sense, the compiler can't do that for you.
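A short sketch of how the two delegate types relate in practice. Although Predicate<T> and Func<T, bool> have the same shape, they are separate delegate types, so one cannot be assigned directly to the other; a method group such as Invoke bridges them. The PredicateDemo class and its method names are hypothetical; List<T>.FindAll and Enumerable.Where are the real framework methods that take each type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class showing the same test used through both delegate types.
public static class PredicateDemo
{
    private static readonly Predicate<int> IsEven = n => n % 2 == 0;

    // List<T>.FindAll is declared with Predicate<T>.
    public static List<int> EvensViaPredicate(List<int> numbers)
    {
        return numbers.FindAll(IsEven);
    }

    // Enumerable.Where is declared with Func<T, bool>.
    // "Func<int, bool> f = IsEven;" would not compile, so the predicate's
    // Invoke method group is converted to the other delegate type instead.
    public static List<int> EvensViaFunc(List<int> numbers)
    {
        Func<int, bool> isEvenFunc = IsEven.Invoke;
        return numbers.Where(isEvenFunc).ToList();
    }
}
```

Both methods apply the same condition; the choice of delegate type mostly signals intent and matches whichever API is being called.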

That is what a predicate is; the term is borrowed from predicate logic.
Then it is of course possible to make other functions, which are not strictly predicates, that have the same function signature.

You're spot on in thinking it's a predefined delegate, but its most common usage is in extensibility, to determine outside a method call whether a condition is met.
Many now prefer to use Func<T,bool> rather than Predicate<T>, given its more common usage within the framework itself.

I think the definition forces a particular prototype for a specific usage: while a delegate in general can be used for any function, only a System.Predicate can be used where a collection predicate function is expected (I would say it is like a C++ function pointer prototype).

Limitations of the dynamic type in C#

Could you give me some reasons for limitations of the dynamic type in C#? I read about them in "Pro C# 2010 and the .NET 4 platform". Here is an excerpt (if quoting books is illegal here, tell me and I will remove the excerpt):
While a great many things can be defined using the dynamic keyword, there are some limitations regarding its usage. While they are not show stoppers, do know that a dynamic data item cannot make use of lambda expressions or C# anonymous methods when calling a method. For example, the following code will always result in errors, even if the target method does indeed take a delegate parameter which takes a string value and returns void.
dynamic a = GetDynamicObject();
// Error! Methods on dynamic data can’t use lambdas!
a.Method(arg => Console.WriteLine(arg));
To circumvent this restriction, you will need to work with the underlying delegate directly, using the techniques described in Chapter 11 (anonymous methods and lambda expressions, etc). Another limitation is that a dynamic point of data cannot understand any extension methods (see Chapter 12). Unfortunately, this would also include any of the extension methods which come from the LINQ APIs. Therefore, a variable declared with the dynamic keyword has very limited use within LINQ to Objects and other LINQ technologies:
dynamic a = GetDynamicObject();
// Error! Dynamic data can’t find the Select() extension method!
var data = from d in a select d;
Thanks in advance.

Tomas's conjectures are pretty good. His reasoning on extension methods is spot on. Basically, to make extension methods work we need the call site to at runtime somehow know what using directives were in force at compile time. We simply did not have enough time or budget to develop a system whereby this information could be persisted into the call site.
For lambdas, the situation is actually more complex than the simple problem of determining whether the lambda is going to expression tree or delegate. Consider the following:
d.M(123)
where d is an expression of type dynamic. What object should get passed at runtime as the argument to the call site "M"? Clearly we box 123 and pass that. Then the overload resolution algorithm in the runtime binder looks at the runtime type of d and the compile-time type of the int 123 and works with that.
Now what if it was
d.M(x=>x.Foo())
Now what object should we pass as the argument? We have no way to represent "lambda method of one variable that calls an unknown function called Foo on whatever the type of x turns out to be".
Suppose we wanted to implement this feature: what would we have to implement? First, we'd need a way to represent an unbound lambda. Expression trees are by design only for representing lambdas where all types and methods are known. We'd need to invent a new kind of "untyped" expression tree. And then we'd need to implement all of the rules for lambda binding in the runtime binder.
Consider that last point. Lambdas can contain statements. Implementing this feature requires that the runtime binder contain the entire semantic analyzer for every possible statement in C#.
That was orders of magnitude out of our budget. We'd still be working on C# 4 today if we'd wanted to implement that feature.
Unfortunately this means that LINQ doesn't work very well with dynamic, because LINQ of course uses untyped lambdas all over the place. Hopefully in some hypothetical future version of C# we will have a more fully-featured runtime binder and the ability to do homoiconic representations of unbound lambdas. But I wouldn't hold my breath waiting if I were you.
UPDATE: A comment asks for clarification on the point about the semantic analyzer.
Consider the following overloads:
class C {
    public void M(Func<IDisposable, int> f) { ... }
    public void M(Func<int, int> f) { ... }
    ...
}
and a call
d.M(x=> { using(x) { return 123; } });
Suppose d is of compile time type dynamic and runtime type C. What must the runtime binder do?
The runtime binder must determine at runtime whether the expression x=>{...} is convertible to each of the delegate types in each of the overloads of M.
In order to do that, the runtime binder must be able to determine that the second overload is not applicable. If it were applicable then you could have an int as the argument to a using statement, but the argument to a using statement must be disposable. That means that the runtime binder must know all the rules for the using statement and be able to correctly report whether any possible use of the using statement is legal or illegal.
Clearly that is not restricted to the using statement. The runtime binder must know all the rules for all of C# in order to determine whether a given statement lambda is convertible to a given delegate type.
We did not have time to write a runtime binder that was essentially an entire new C# compiler that generates DLR trees rather than IL. By not allowing lambdas we only have to write a runtime binder that knows how to bind method calls, arithmetic expressions and a few other simple kinds of call sites. Allowing lambdas makes the problem of runtime binding on the order of dozens or hundreds of times more expensive to implement, test and maintain.

Lambdas: I think that one reason for not supporting lambdas as parameters to dynamic objects is that the compiler wouldn't know whether to compile the lambda as a delegate or as an expression tree.
When you use a lambda, the compiler decides based on the type of the target parameter or variable. When it is Func<...> (or other delegate) it compiles the lambda as an executable delegate. When the target is Expression<...> it compiles lambda into an expression tree.
Now, when you have a dynamic type, you don't know whether the parameter is delegate or expression, so the compiler cannot decide what to do!
Extension methods: I think that the reason here is that finding extension methods at runtime would be quite difficult (and perhaps also inefficient). First of all, the runtime would need to know what namespaces were referenced using using. Then it would need to search all classes in all loaded assemblies, filter those that are accessible (by namespace) and then search those for extension methods...

Eric (and Tomas) says it well, but here is how I think of it.
This C# statement
a.Method(arg => Console.WriteLine(arg));
has no meaning without a lot of context. Lambda expressions themselves have no types, rather they are convertible to delegate (or Expression) types. So the only way to gather the meaning is to provide some context which forces the lambda to be converted to a specific delegate type. That context is typically (as in this example) overload resolution; given the type of a, and the available overloads Method on that type (including extension members), we can possibly place some context that gives the lambda meaning.
Without that context to produce the meaning, you end up having to bundle up all kinds of information about the lambda in the hopes of somehow binding the unknowns at runtime. (What IL could you possibly generate?)
In vast contrast, once you put a specific delegate type there,
a.Method(new Action<int>(arg => Console.WriteLine(arg)));
Kazam! Things just got easy. No matter what code is inside the lambda, we now know exactly what type it has, which means we can compile IL just as we would any method body (we now know, for example, which of the many overloads of Console.WriteLine we're calling). And that code has one specific type (Action<int>), which means it is easy for the runtime binder to see if a has a Method that takes that type of argument.
In C#, a naked lambda is almost meaningless. C# lambdas need static context to give them meaning and rule out ambiguities that arise from many possible coercions and overloads. A typical program provides this context with ease, but the dynamic case lacks this important context.
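To make the contrast concrete, here is a small runnable sketch of the explicit-delegate form. The Receiver class and DynamicLambdaDemo are hypothetical names standing in for whatever object sits behind the dynamic reference; the commented-out line is the form the compiler is expected to reject (error CS1977).

```csharp
using System;

// Hypothetical target type with a method that takes a delegate.
public class Receiver
{
    public void Method(Action<string> callback)
    {
        callback("hello");
    }
}

public static class DynamicLambdaDemo
{
    public static string Run()
    {
        dynamic a = new Receiver();
        string seen = null;

        // a.Method(arg => seen = arg);   // rejected at compile time: the lambda has no type yet

        // Giving the lambda a concrete delegate type lets the compiler bind its body
        // statically and lets the runtime binder match Method(Action<string>).
        a.Method(new Action<string>(arg => seen = arg));

        return seen;
    }
}
```

A cast such as (Action<string>)(arg => seen = arg) should serve the same purpose, since it also fixes the delegate type before the dynamic call.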

Is it possible to call value type operators via reflection?

C# operators such as +, += and == can be overloaded, which makes me think they are methods of a sort. So I wonder whether there is a way to call them using reflection, on Int32 for instance.

What about this, it's simple, small and works :)
public T Add<T>(object x, object y)
{
    return (T)Convert.ChangeType((dynamic)x + (dynamic)y, typeof(T));
}

Yes, the custom operators are invokable using reflection (they have special names, such as op_Addition), but System.Int32 doesn't define them, as fundamental, built-in, types are handled directly by IL opcodes like add, rather than method calls.
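As a rough illustration, the lookup and invocation might look like the sketch below. OperatorReflection and its methods are hypothetical helper names; the sketch assumes both operands have the same runtime type and that the type exposes a public static op_Addition, as System.Decimal does. For System.Int32 the lookup is expected to come back empty, in line with the point above.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for calling a user-defined + operator through reflection.
public static class OperatorReflection
{
    // Finds a public static op_Addition(type, type), or returns null if there is none.
    public static MethodInfo FindAddition(Type type)
    {
        return type.GetMethod(
            "op_Addition",
            BindingFlags.Public | BindingFlags.Static,
            null,
            new[] { type, type },
            null);
    }

    // Invokes the operator found on x's runtime type with (x, y).
    public static object Add(object x, object y)
    {
        MethodInfo op = FindAddition(x.GetType());
        if (op == null)
            throw new InvalidOperationException("No op_Addition method on " + x.GetType());

        // Operators are static, so the target instance is null.
        return op.Invoke(null, new[] { x, y });
    }
}
```

For example, OperatorReflection.Add(1.5m, 2.25m) goes through decimal's op_Addition, while a primitive such as int would need a different route (an expression tree or dynamic, as in the other answers).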

What exactly is it you want to do? Dealing with the various meanings of operators (primitive (mapped to specific IL instructions), custom (mapped to static methods), and lifted (provided as a pattern by the compiler)) makes this painful. If you just want to use the operators, then it is possible to write code that provides operator support via generics. I have some code for this that is freely available in MiscUtil (description and examples).
As an untyped example (and note that this isn't hugely efficient, but works):
object x = 123, y = 345; // now forget that we know that these are ints...
object result = Expression.Lambda<Func<object>>(
    Expression.Convert(Expression.Add(
        Expression.Constant(x), Expression.Constant(y)),
    typeof(object))).Compile()();

It would be very inefficient if adding or comparing two integers required a method call so these simple operations on the fundamental types are code-generated as explained in another answer and cannot be invoked using reflection. One interesting built-in value type is decimal (or System.Decimal). This struct has support for literal values in C#, but behaves a lot like a real value type and exposes a lot of operators that can be invoked via reflection.
If you are curious you can use Reflector to browse all members exposed by System.Int32.

There are good answers here, so just to add something not mentioned yet:
A struct might have an operator overloaded. This means more options to check if you try to create some programmatic approach.
One nice thing to try out is, to try the expression tree approach. Here's a small sample. Of course, performance isn't too nice, but we know what we're getting into with Reflection, anyway.
