There are two important steps to compiling a LINQ query in C#. The first is transforming LINQ query syntax into a chain of method calls, as described in section 7.16 of the C# language specification. This transformation process is specified in enough detail that a language developer can use it to implement similar query syntax on a new CLR language.
The second step is turning lambda expressions into expression trees, which takes place when calling a query method that returns IQueryable, but not when calling a method that returns IEnumerable. Is it ever specified how this transformation takes place, comparable to the explanation of the query syntax transformation process?
Construction of expression trees is explicitly not defined, actually. Compiler developers are free to do use whatever approach they wish, provided of course that executing the expression produces the same result as invoking the lambda.
This is a quote from the C# Language Specification:
6.5.2 Evaluation of anonymous function conversions to expression tree types
Conversion of an anonymous function to an expression tree type produces an expression tree (§4.6). More precisely, evaluation of the anonymous function conversion leads to the construction of an object structure that represents the structure of the anonymous function itself. The precise structure of the expression tree, as well as the exact process for creating it, are implementation defined.
I added the boldface at the end.
I suspect that this was deliberately left unspecified to give compiler developers the freedom to implement whatever optimizations they find helpful. A rigid specification of the expression tree would prevent that.
Is it ever specified how this transformation takes place, comparable
to the explanation of the query syntax transformation process?
As NSFW has stated, no.
On a practical, these expression trees can change from framework to framework. A real life example would be this:
We were using expression lambdas to get the property info through expression trees.
Such as void DoSomething<P, K>(P model, Expression<Func<P, K> propertySelector), and the usage DoSomething(model, m => m.Property)
The actual property interrogation was done inside DoSomething, through reflection. This is very classical, and variants of such code exists over the intenet.
Now, this is cool, it worked nicely in .NET 4.0. However, as soon as I tried 4.5, it blew up completely, as the underlying expression tree has changed.
You can be sure that Roslyn will introduce a lot new "bugs", in the sense that some client code relies on the representation how lambdas are translated to expression trees(If you really insist doing that - using Visitor class minimizes the chances of breaking).
Ensuring that the expression trees stay the same would be major task, and it would be also limiting(speed-wise for example)
Related
I am currently reading through the MSDN, Walkthrough: Creating an IQueryable LInQ Provider and there is a lot of use of the ExpressionVisitor.
It makes me wonder, is it an expensive operation to use this?
Is it as expensive as Reflection?
No, it should be quite cheap to traverse an expression tree with an ExpressionVisitor.
There is no run-time cost at all needed to parse the expression tree. The compiler does all the work of turning an Expression into an object tree at compile time. There isn't even much run-time reflection going on when the objects in question are created in memory. When you see a method call like:
SomeMethod(Foo x => x.Property);
and SomeMethod's argument is Expression typed, then the compiler converts the code into IL which acts like you had written something like this:
SomeMethod(new MemberExpression {
Expression = new ParameterExpression("x", typeof(Foo)),
Member = typeof(Foo).GetProperty("Property")
});
You can look at the generated IL for the full details, or see the worked example in Microsoft's documentation.
There is some reflection involved (for example MemberExpressions hold a PropertyInfo reference), but it is all quite fast.
If you have an app you are worried about, you should profile it (e.g. recent versions of Visual Studio have a built in performance profiler) and see which specific parts are slow.
To be sure that I understand could someone confirm that lambda expressions use reflection?
They can, but they are not required to. If they are compiled to a delegate, then no reflection is necessary: however, if they are compiled to ab expression tree, then an expression tree is absolutely reflection-based. Some parts of expression trees can be assembled directly from metadata tokens (ldtoken) - in particular methods (including operators and getters/setters, and types) - but some other parts cannot . This includes properties (PropertyInfo cannot be loaded by token) - so the IL for a compiled lambda can explicitly include GetProperty etc.
But however it is loaded (token or reflection), an expression tree is expressed in terms of reflection (MemberInfo, etc). This might later be compiled, or might be analysed by a provider.
To help performance, the expression compiler may cache some or all of the expression tree, and re-use it.
Lambda expressions are turned by the compiler into either anonymous functions or expression trees. Since reflection can only be performed at runtime it doesn't come into the picture at all when considering what the compiler does with them in either case.
At runtime, lambda expressions might result in reflection being used under very specific circumstances:
For anonymous functions: if you write an anonymous function that explicitly reflects on something then the lambda will perform that reflection when invoked. This is of course the same as if you were reflecting from within a "proper" method.
For expression trees (i.e. values of type Expression<TDelegate> for some TDelegate): using them at runtime when working with an IQueryable might result in reflection being used by the query provider. For example, assume that you do:
var user = new User { Id = 42 };
var posts = Queryable.Posts.Where(p => p.UserId == user.Id);
When posts is about to be materialized the query provider sees that it must find those posts with a UserId equal to the Id of the variable user. Since that id has a specific known value at runtime, the query provider needs to fish it out of user. One way it might decide to do that is through reflection.
A lambda expressions is just compiler syntax sugar for creating delegates. No reflection used here.
This is a follow up to a question I asked earlier seen here:
Confused about passing Expression vs. Func arguments
The accepted answerer there suggests refactoring an Expression referencing local objects into something that Linq to Entities can actually execute against the backing store (in my case SQL Server)
I've spent a long time trying to come up with something that will work for what I'm doing. My original
Func<Thing, bool> whereClause
was referencing a local Dictionary object which Linq to Entities or SQL could not understand at runtime. I tried refactoring into multiple lists which faked a dictionary, and Arrays after that. Each time, I got runtime errors complaining about the context doesn't recognize things like the methods on a List, or array indexers.
Eventually I gave up and just provided an additional method which takes a Func argument for when I cannot come up with the proper Expression.
I'm not trying to find a solution to my specific problem, I'm just wondering in general if it is always possible to convert, say a
Func<Thing, bool>
to an equivalent
Expression<Func<Thing, bool>>
which can run against Linq to Entities.
Or if there are many examples of querys where you simply must pull the data into memory first.
You don't convert a Func to an expression tree - the compiler converts a lambda expression to an expression tree... and no, that's not always possible. For example, you can't convert a statement lambda to an expression tree:
Expression<Func<string, int>> valid = text => text.Length;
Expression<Func<string, int>> invalid = text => { return text.Length; };
There are various other restrictions, too.
Even when you can create an expression tree (and if you do it manually you can build ones which the C# compiler wouldn't, particularly in .NET 4) that's not the same thing as the expression tree representing something that LINQ to SQL (etc) can translate appropriately.
Jon is of course correct; you turn a lambda into an expression tree.
To expand a bit on his "various other restrictions" handwave: a lambda converted to an expression tree may not contain:
statements
expressions useful primarily for their state mutations: assignment, compound assignment, increment and decrement operators
dynamic operations of any kind
multi-dimensional array initializers
removed partial methods
base access
pointer operations of any kind
sizeof(T) except for where T is a built-in type
COM-style indexed property invocations
COM-style "optional ref" invocations
C-style variadic method invocations
optional-argument and named-argument invocations
method groups (except of course when in an invocation)
That's not an exhaustive list; there are some other weird corner cases. But that should cover the majority of them.
Could you give me some reasons for limitations of the dynamic type in C#? I read about them in "Pro C# 2010 and the .NET 4 platform". Here is an excerpt (if quoting books is illegal here, tell me and I will remove the excerpt):
While a great many things can be
defined using the dynamic keyword,
there are some limitations regarding
its usage. While they are not show
stoppers, do know that a dynamic data
item cannot make use of lambda
expressions or C# anonymous methods
when calling a method. For example,
the following code will always result
in errors, even if the target method
does indeed take a delegate parameter
which takes a string value and returns
void.
dynamic a = GetDynamicObject();
// Error! Methods on dynamic data can’t use lambdas!
a.Method(arg => Console.WriteLine(arg));
To circumvent this restriction, you
will need to work with the underlying
delegate directly, using the
techniques described in Chapter 11
(anonymous methods and lambda
expressions, etc). Another limitation
is that a dynamic point of data cannot
understand any extension methods (see
Chapter 12). Unfortunately, this would
also include any of the extension
methods which come from the LINQ APIs.
Therefore, a variable declared with
the dynamic keyword has very limited
use within LINQ to Objects and other
LINQ technologies:
dynamic a = GetDynamicObject();
// Error! Dynamic data can’t find the Select() extension method!
var data = from d in a select d;
Thanks in advance.
Tomas's conjectures are pretty good. His reasoning on extension methods is spot on. Basically, to make extension methods work we need the call site to at runtime somehow know what using directives were in force at compile time. We simply did not have enough time or budget to develop a system whereby this information could be persisted into the call site.
For lambdas, the situation is actually more complex than the simple problem of determining whether the lambda is going to expression tree or delegate. Consider the following:
d.M(123)
where d is an expression of type dynamic. *What object should get passed at runtime as the argument to the call site "M"? Clearly we box 123 and pass that. Then the overload resolution algorithm in the runtime binder looks at the runtime type of d and the compile-time type of the int 123 and works with that.
Now what if it was
d.M(x=>x.Foo())
Now what object should we pass as the argument? We have no way to represent "lambda method of one variable that calls an unknown function called Foo on whatever the type of x turns out to be".
Suppose we wanted to implement this feature: what would we have to implement? First, we'd need a way to represent an unbound lambda. Expression trees are by design only for representing lambdas where all types and methods are known. We'd need to invent a new kind of "untyped" expression tree. And then we'd need to implement all of the rules for lambda binding in the runtime binder.
Consider that last point. Lambdas can contain statements. Implementing this feature requires that the runtime binder contain the entire semantic analyzer for every possible statement in C#.
That was orders of magnitude out of our budget. We'd still be working on C# 4 today if we'd wanted to implement that feature.
Unfortunately this means that LINQ doesn't work very well with dynamic, because LINQ of course uses untyped lambdas all over the place. Hopefully in some hypothetical future version of C# we will have a more fully-featured runtime binder and the ability to do homoiconic representations of unbound lambdas. But I wouldn't hold my breath waiting if I were you.
UPDATE: A comment asks for clarification on the point about the semantic analyzer.
Consider the following overloads:
class C {
public void M(Func<IDisposable, int> f) { ... }
public void M(Func<int, int> f) { ... }
...
}
and a call
d.M(x=> { using(x) { return 123; } });
Suppose d is of compile time type dynamic and runtime type C. What must the runtime binder do?
The runtime binder must determine at runtime whether the expression x=>{...} is convertible to each of the delegate types in each of the overloads of M.
In order to do that, the runtime binder must be able to determine that the second overload is not applicable. If it were applicable then you could have an int as the argument to a using statement, but the argument to a using statement must be disposable. That means that the runtime binder must know all the rules for the using statement and be able to correctly report whether any possible use of the using statement is legal or illegal.
Clearly that is not restricted to the using statement. The runtime binder must know all the rules for all of C# in order to determine whether a given statement lambda is convertible to a given delegate type.
We did not have time to write a runtime binder that was essentially an entire new C# compiler that generates DLR trees rather than IL. By not allowing lambdas we only have to write a runtime binder that knows how to bind method calls, arithmetic expressions and a few other simple kinds of call sites. Allowing lambdas makes the problem of runtime binding on the order of dozens or hundreds of times more expensive to implement, test and maintain.
Lambdas: I think that one reason for not supporting lambdas as parameters to dynamic objects is that the compiler wouldn't know whether to compile the lambda as a delegate or as an expression tree.
When you use a lambda, the compiler decides based on the type of the target parameter or variable. When it is Func<...> (or other delegate) it compiles the lambda as an executable delegate. When the target is Expression<...> it compiles lambda into an expression tree.
Now, when you have a dynamic type, you don't know whether the parameter is delegate or expression, so the compiler cannot decide what to do!
Extension methods: I think that the reason here is that finding extension methods at runtime would be quite difficult (and perhaps also inefficient). First of all, the runtime would need to know what namespaces were referenced using using. Then it would need to search all classes in all loaded assemblies, filter those that are accessible (by namespace) and then search those for extension methods...
Eric (and Tomas) says it well, but here is how I think of it.
This C# statement
a.Method(arg => Console.WriteLine(arg));
has no meaning without a lot of context. Lambda expressions themselves have no types, rather they are convertible to delegate (or Expression) types. So the only way to gather the meaning is to provide some context which forces the lambda to be converted to a specific delegate type. That context is typically (as in this example) overload resolution; given the type of a, and the available overloads Method on that type (including extension members), we can possibly place some context that gives the lambda meaning.
Without that context to produce the meaning, you end up having to bundle up all kinds of information about the lambda in the hopes of somehow binding the unknowns at runtime. (What IL could you possibly generate?)
In vast contrast, one you put a specific delegate type there,
a.Method(new Action<int>(arg => Console.WriteLine(arg)));
Kazam! Things just got easy. No matter what code is inside the lambda, we now know exactly what type it has, which means we can compile IL just as we would any method body (we now know, for example, which of the many overloads of Console.WriteLine we're calling). And that code has one specific type (Action<int>), which means it is easy for the runtime binder to see if a has a Method that takes that type of argument.
In C#, a naked lambda is almost meaningless. C# lambdas need static context to give them meaning and rule out ambiguities that arise from many possible coercisons and overloads. A typical program provides this context with ease, but the dynamic case lacks this important context.
I completely understand the concept of expression trees, but I am having a hard time trying to find situations in which they are useful. Is there a specific instance in which expression trees can be applied? Or is it only useful as a transport mechanism for code? I feel like I am missing something here. Thanks!
Some unit test mocking frameworks make use of expression trees in order to set up strongly typed expectations/verifications. Ie:
myMock.Verify(m => m.SomeMethod(someObject)); // tells moq to verify that the method
// SomeMethod was called with
// someObject as the argument
Here, the expression is never actually executed, but the expression itself holds the interesting information. The alternative without expression trees would be
myMock.Verify("SomeMethod", someObject) // we've lost the strong typing
Or is it only useful as a transport mechanism for code?
It's useful as an execution mechanism for code. Using the interpreter pattern, expression trees can directly be interpreted. This is useful because it's very easy and fast to implement. Such interpreters are ubiquitous and used even in cases that don't seem to “interpret” anything, e.g. for printing nested structures.
Expression trees are useful when you need to access function logic in order to alter or reapply it in some way.
Linq to SQL is a good example:
//a linq to sql statement
var recs (
from rec in LinqDataContext.Table
where rec.IntField > 5
select rec );
If we didn't have expression trees this statement would have to return all the records, and then apply the C# where logic to each.
With expression trees that where rec.IntField > 5 can be parsed into SQL:
--SQL statment executed
select *
from [table]
where [table].[IntField] > 5