To be sure that I understand could someone confirm that lambda expressions use reflection?
They can, but they are not required to. If they are compiled to a delegate, then no reflection is necessary: however, if they are compiled to ab expression tree, then an expression tree is absolutely reflection-based. Some parts of expression trees can be assembled directly from metadata tokens (ldtoken) - in particular methods (including operators and getters/setters, and types) - but some other parts cannot . This includes properties (PropertyInfo cannot be loaded by token) - so the IL for a compiled lambda can explicitly include GetProperty etc.
But however it is loaded (token or reflection), an expression tree is expressed in terms of reflection (MemberInfo, etc). This might later be compiled, or might be analysed by a provider.
To help performance, the expression compiler may cache some or all of the expression tree, and re-use it.
Lambda expressions are turned by the compiler into either anonymous functions or expression trees. Since reflection can only be performed at runtime it doesn't come into the picture at all when considering what the compiler does with them in either case.
At runtime, lambda expressions might result in reflection being used under very specific circumstances:
For anonymous functions: if you write an anonymous function that explicitly reflects on something then the lambda will perform that reflection when invoked. This is of course the same as if you were reflecting from within a "proper" method.
For expression trees (i.e. values of type Expression<TDelegate> for some TDelegate): using them at runtime when working with an IQueryable might result in reflection being used by the query provider. For example, assume that you do:
var user = new User { Id = 42 };
var posts = Queryable.Posts.Where(p => p.UserId == user.Id);
When posts is about to be materialized the query provider sees that it must find those posts with a UserId equal to the Id of the variable user. Since that id has a specific known value at runtime, the query provider needs to fish it out of user. One way it might decide to do that is through reflection.
A lambda expressions is just compiler syntax sugar for creating delegates. No reflection used here.
Related
There are two important steps to compiling a LINQ query in C#. The first is transforming LINQ query syntax into a chain of method calls, as described in section 7.16 of the C# language specification. This transformation process is specified in enough detail that a language developer can use it to implement similar query syntax on a new CLR language.
The second step is turning lambda expressions into expression trees, which takes place when calling a query method that returns IQueryable, but not when calling a method that returns IEnumerable. Is it ever specified how this transformation takes place, comparable to the explanation of the query syntax transformation process?
Construction of expression trees is explicitly not defined, actually. Compiler developers are free to do use whatever approach they wish, provided of course that executing the expression produces the same result as invoking the lambda.
This is a quote from the C# Language Specification:
6.5.2 Evaluation of anonymous function conversions to expression tree types
Conversion of an anonymous function to an expression tree type produces an expression tree (§4.6). More precisely, evaluation of the anonymous function conversion leads to the construction of an object structure that represents the structure of the anonymous function itself. The precise structure of the expression tree, as well as the exact process for creating it, are implementation defined.
I added the boldface at the end.
I suspect that this was deliberately left unspecified to give compiler developers the freedom to implement whatever optimizations they find helpful. A rigid specification of the expression tree would prevent that.
Is it ever specified how this transformation takes place, comparable
to the explanation of the query syntax transformation process?
As NSFW has stated, no.
On a practical, these expression trees can change from framework to framework. A real life example would be this:
We were using expression lambdas to get the property info through expression trees.
Such as void DoSomething<P, K>(P model, Expression<Func<P, K> propertySelector), and the usage DoSomething(model, m => m.Property)
The actual property interrogation was done inside DoSomething, through reflection. This is very classical, and variants of such code exists over the intenet.
Now, this is cool, it worked nicely in .NET 4.0. However, as soon as I tried 4.5, it blew up completely, as the underlying expression tree has changed.
You can be sure that Roslyn will introduce a lot new "bugs", in the sense that some client code relies on the representation how lambdas are translated to expression trees(If you really insist doing that - using Visitor class minimizes the chances of breaking).
Ensuring that the expression trees stay the same would be major task, and it would be also limiting(speed-wise for example)
This is all about the Compile method of Expression Type. Sorry being naïve since I am a late comer. I have been reading about building expression in order to enables dynamic modification of executable code. And It make sense for me when it comes to emitting lambda expression from a given expression tree as long as for the varying inputs/environment (say for different values for any given constant/parameter/member expression). I presume, it would be ideal if I could cache (re-use) lambda those are generated/compiled from an expression tree provided there is no change in the environment.
Question: Does CLR always emit lambda expression even if I have there is no change in the environment? If so, what is the best was to avoid compilation of expression from lambda if there is no change in the environment?
CLR doesn't cache lambda expressions, Compile() each time returns a new delegate.
But it should be easy to cache, via something like this:
public Func<T> Get<T>(Expression<Func<T>> expression)
{
string key = expression.Body.ToString();
Func<T> result;
if (!_cache.TryGetValue(key, out result)) {
result = expression.Compile();
_cache.Add(key, result);
}
return result;
}
Lambda expressions is just a way to represent a piece of code: call this, call that, compare these arguments, return something, etc. Almost the same way you do it from code editor, or JIT does from IL.
Compile emits a delegate from particular lambda expression. Once you have compiled lambda into delegate, the delegate remains unchanged (the lambda remains unchanged too, because it is immutable).
This doesn't mean, that delegate can't accept any arguments or call any method of any object. This just means, that delegate's IL doesn't change. And yes, you can cache compiled delegate instance.
Calling Compile() will return a new delegate each time, and each time new MSIL code is emitted. This is slow and effectively creates a memory leak, since MSIL code is not subject to garbage collection. I created a library that offers cached compilation, which in fact compares the structure of the expressions correctly and enables reusing the cached delegates. All constants and closures are automatically replaced by parameters and get re-inserted into the delegate in an outer closure. That avoids the leaking memory and is much faster. Check it out here: https://github.com/Miaplaza/expression-utils
Assume that there are two queries running on a memory list;
First query (employing extension methods):
var temp = listX.Where(q => q.SomeProperty == someValue);
Second query:
var temp = from o in listX
where o.SomeProperty == someValue
select o;
Is there a difference between two queries in terms of performance; and if there is, why?
No, there is no difference at all. The compiler internally transforms the second version to the first one.
The C# specification (§7.6.12) states:
The C# language does not specify the execution semantics of query
expressions. Rather, query expressions are translated into invocations
of methods that adhere to the query expression pattern (§7.16.3).
Specifically, query expressions are translated into invocations of
methods named Where, Select, SelectMany, Join, GroupJoin, OrderBy,
OrderByDescending, ThenBy, ThenByDescending, GroupBy, and Cast.These
methods are expected to have particular signatures and result types,
as described in §7.16.3. These methods can be instance methods of the
object being queried or extension methods that are external to the
object, and they implement the actual execution of the query.
The translation from query expressions to method invocations is a
syntactic mapping that occurs before any type binding or overload
resolution has been performed. The translation is guaranteed to be
syntactically correct, but it is not guaranteed to produce
semantically correct C# code. Following translation of query
expressions, the resulting method invocations are processed as regular
method invocations, and this may in turn uncover errors, for example
if the methods do not exist, if arguments have wrong types, or if the
methods are generic and type inference fails.
There aren't differences. It will produce the same result in the same time. It's basically the same code with different syntax.
Short question, short answer:
There is no difference. Both are the same, just written in different syntax.
See also the MSDN documentation for Query Syntax and Method Syntax.
This is a follow up to a question I asked earlier seen here:
Confused about passing Expression vs. Func arguments
The accepted answerer there suggests refactoring an Expression referencing local objects into something that Linq to Entities can actually execute against the backing store (in my case SQL Server)
I've spent a long time trying to come up with something that will work for what I'm doing. My original
Func<Thing, bool> whereClause
was referencing a local Dictionary object which Linq to Entities or SQL could not understand at runtime. I tried refactoring into multiple lists which faked a dictionary, and Arrays after that. Each time, I got runtime errors complaining about the context doesn't recognize things like the methods on a List, or array indexers.
Eventually I gave up and just provided an additional method which takes a Func argument for when I cannot come up with the proper Expression.
I'm not trying to find a solution to my specific problem, I'm just wondering in general if it is always possible to convert, say a
Func<Thing, bool>
to an equivalent
Expression<Func<Thing, bool>>
which can run against Linq to Entities.
Or if there are many examples of querys where you simply must pull the data into memory first.
You don't convert a Func to an expression tree - the compiler converts a lambda expression to an expression tree... and no, that's not always possible. For example, you can't convert a statement lambda to an expression tree:
Expression<Func<string, int>> valid = text => text.Length;
Expression<Func<string, int>> invalid = text => { return text.Length; };
There are various other restrictions, too.
Even when you can create an expression tree (and if you do it manually you can build ones which the C# compiler wouldn't, particularly in .NET 4) that's not the same thing as the expression tree representing something that LINQ to SQL (etc) can translate appropriately.
Jon is of course correct; you turn a lambda into an expression tree.
To expand a bit on his "various other restrictions" handwave: a lambda converted to an expression tree may not contain:
statements
expressions useful primarily for their state mutations: assignment, compound assignment, increment and decrement operators
dynamic operations of any kind
multi-dimensional array initializers
removed partial methods
base access
pointer operations of any kind
sizeof(T) except for where T is a built-in type
COM-style indexed property invocations
COM-style "optional ref" invocations
C-style variadic method invocations
optional-argument and named-argument invocations
method groups (except of course when in an invocation)
That's not an exhaustive list; there are some other weird corner cases. But that should cover the majority of them.
Could you give me some reasons for limitations of the dynamic type in C#? I read about them in "Pro C# 2010 and the .NET 4 platform". Here is an excerpt (if quoting books is illegal here, tell me and I will remove the excerpt):
While a great many things can be
defined using the dynamic keyword,
there are some limitations regarding
its usage. While they are not show
stoppers, do know that a dynamic data
item cannot make use of lambda
expressions or C# anonymous methods
when calling a method. For example,
the following code will always result
in errors, even if the target method
does indeed take a delegate parameter
which takes a string value and returns
void.
dynamic a = GetDynamicObject();
// Error! Methods on dynamic data can’t use lambdas!
a.Method(arg => Console.WriteLine(arg));
To circumvent this restriction, you
will need to work with the underlying
delegate directly, using the
techniques described in Chapter 11
(anonymous methods and lambda
expressions, etc). Another limitation
is that a dynamic point of data cannot
understand any extension methods (see
Chapter 12). Unfortunately, this would
also include any of the extension
methods which come from the LINQ APIs.
Therefore, a variable declared with
the dynamic keyword has very limited
use within LINQ to Objects and other
LINQ technologies:
dynamic a = GetDynamicObject();
// Error! Dynamic data can’t find the Select() extension method!
var data = from d in a select d;
Thanks in advance.
Tomas's conjectures are pretty good. His reasoning on extension methods is spot on. Basically, to make extension methods work we need the call site to at runtime somehow know what using directives were in force at compile time. We simply did not have enough time or budget to develop a system whereby this information could be persisted into the call site.
For lambdas, the situation is actually more complex than the simple problem of determining whether the lambda is going to expression tree or delegate. Consider the following:
d.M(123)
where d is an expression of type dynamic. *What object should get passed at runtime as the argument to the call site "M"? Clearly we box 123 and pass that. Then the overload resolution algorithm in the runtime binder looks at the runtime type of d and the compile-time type of the int 123 and works with that.
Now what if it was
d.M(x=>x.Foo())
Now what object should we pass as the argument? We have no way to represent "lambda method of one variable that calls an unknown function called Foo on whatever the type of x turns out to be".
Suppose we wanted to implement this feature: what would we have to implement? First, we'd need a way to represent an unbound lambda. Expression trees are by design only for representing lambdas where all types and methods are known. We'd need to invent a new kind of "untyped" expression tree. And then we'd need to implement all of the rules for lambda binding in the runtime binder.
Consider that last point. Lambdas can contain statements. Implementing this feature requires that the runtime binder contain the entire semantic analyzer for every possible statement in C#.
That was orders of magnitude out of our budget. We'd still be working on C# 4 today if we'd wanted to implement that feature.
Unfortunately this means that LINQ doesn't work very well with dynamic, because LINQ of course uses untyped lambdas all over the place. Hopefully in some hypothetical future version of C# we will have a more fully-featured runtime binder and the ability to do homoiconic representations of unbound lambdas. But I wouldn't hold my breath waiting if I were you.
UPDATE: A comment asks for clarification on the point about the semantic analyzer.
Consider the following overloads:
class C {
public void M(Func<IDisposable, int> f) { ... }
public void M(Func<int, int> f) { ... }
...
}
and a call
d.M(x=> { using(x) { return 123; } });
Suppose d is of compile time type dynamic and runtime type C. What must the runtime binder do?
The runtime binder must determine at runtime whether the expression x=>{...} is convertible to each of the delegate types in each of the overloads of M.
In order to do that, the runtime binder must be able to determine that the second overload is not applicable. If it were applicable then you could have an int as the argument to a using statement, but the argument to a using statement must be disposable. That means that the runtime binder must know all the rules for the using statement and be able to correctly report whether any possible use of the using statement is legal or illegal.
Clearly that is not restricted to the using statement. The runtime binder must know all the rules for all of C# in order to determine whether a given statement lambda is convertible to a given delegate type.
We did not have time to write a runtime binder that was essentially an entire new C# compiler that generates DLR trees rather than IL. By not allowing lambdas we only have to write a runtime binder that knows how to bind method calls, arithmetic expressions and a few other simple kinds of call sites. Allowing lambdas makes the problem of runtime binding on the order of dozens or hundreds of times more expensive to implement, test and maintain.
Lambdas: I think that one reason for not supporting lambdas as parameters to dynamic objects is that the compiler wouldn't know whether to compile the lambda as a delegate or as an expression tree.
When you use a lambda, the compiler decides based on the type of the target parameter or variable. When it is Func<...> (or other delegate) it compiles the lambda as an executable delegate. When the target is Expression<...> it compiles lambda into an expression tree.
Now, when you have a dynamic type, you don't know whether the parameter is delegate or expression, so the compiler cannot decide what to do!
Extension methods: I think that the reason here is that finding extension methods at runtime would be quite difficult (and perhaps also inefficient). First of all, the runtime would need to know what namespaces were referenced using using. Then it would need to search all classes in all loaded assemblies, filter those that are accessible (by namespace) and then search those for extension methods...
Eric (and Tomas) says it well, but here is how I think of it.
This C# statement
a.Method(arg => Console.WriteLine(arg));
has no meaning without a lot of context. Lambda expressions themselves have no types, rather they are convertible to delegate (or Expression) types. So the only way to gather the meaning is to provide some context which forces the lambda to be converted to a specific delegate type. That context is typically (as in this example) overload resolution; given the type of a, and the available overloads Method on that type (including extension members), we can possibly place some context that gives the lambda meaning.
Without that context to produce the meaning, you end up having to bundle up all kinds of information about the lambda in the hopes of somehow binding the unknowns at runtime. (What IL could you possibly generate?)
In vast contrast, one you put a specific delegate type there,
a.Method(new Action<int>(arg => Console.WriteLine(arg)));
Kazam! Things just got easy. No matter what code is inside the lambda, we now know exactly what type it has, which means we can compile IL just as we would any method body (we now know, for example, which of the many overloads of Console.WriteLine we're calling). And that code has one specific type (Action<int>), which means it is easy for the runtime binder to see if a has a Method that takes that type of argument.
In C#, a naked lambda is almost meaningless. C# lambdas need static context to give them meaning and rule out ambiguities that arise from many possible coercisons and overloads. A typical program provides this context with ease, but the dynamic case lacks this important context.