Assume that there are two queries running on an in-memory list:
First query (employing extension methods):
var temp = listX.Where(q => q.SomeProperty == someValue);
Second query:
var temp = from o in listX
where o.SomeProperty == someValue
select o;
Is there a difference between the two queries in terms of performance, and if there is, why?
No, there is no difference at all. The compiler internally transforms the second version to the first one.
The C# specification (§7.16.2) states:
The C# language does not specify the execution semantics of query
expressions. Rather, query expressions are translated into invocations
of methods that adhere to the query expression pattern (§7.16.3).
Specifically, query expressions are translated into invocations of
methods named Where, Select, SelectMany, Join, GroupJoin, OrderBy,
OrderByDescending, ThenBy, ThenByDescending, GroupBy, and Cast. These
methods are expected to have particular signatures and result types,
as described in §7.16.3. These methods can be instance methods of the
object being queried or extension methods that are external to the
object, and they implement the actual execution of the query.
The translation from query expressions to method invocations is a
syntactic mapping that occurs before any type binding or overload
resolution has been performed. The translation is guaranteed to be
syntactically correct, but it is not guaranteed to produce
semantically correct C# code. Following translation of query
expressions, the resulting method invocations are processed as regular
method invocations, and this may in turn uncover errors, for example
if the methods do not exist, if arguments have wrong types, or if the
methods are generic and type inference fails.
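To make the translation concrete, here is a minimal sketch that runs both forms against the same in-memory list. The Item type and the sample values are assumptions made up for this illustration; only the two query shapes come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxDemo
{
    // Hypothetical element type standing in for the question's list items.
    public sealed class Item
    {
        public int SomeProperty { get; set; }
    }

    public static List<Item> MethodSyntax(List<Item> listX, int someValue)
    {
        return listX.Where(q => q.SomeProperty == someValue).ToList();
    }

    public static List<Item> QueryExpression(List<Item> listX, int someValue)
    {
        // The compiler rewrites this into listX.Where(o => o.SomeProperty == someValue).
        // The trailing "select o" is a degenerate projection, so no Select call is emitted.
        var temp = from o in listX
                   where o.SomeProperty == someValue
                   select o;
        return temp.ToList();
    }

    public static void Main()
    {
        var listX = new List<Item>
        {
            new Item { SomeProperty = 1 },
            new Item { SomeProperty = 2 },
            new Item { SomeProperty = 2 },
        };

        List<Item> a = MethodSyntax(listX, 2);
        List<Item> b = QueryExpression(listX, 2);

        // Both forms yield the same object references in the same order.
        Console.WriteLine(a.SequenceEqual(b)); // True
    }
}
```

Comparing the compiled IL of the two methods (with a decompiler, for instance) is another way to check that they end up as the same Where call.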
There aren't differences. It will produce the same result in the same time. It's basically the same code with different syntax.
Short question, short answer:
There is no difference. Both are the same, just written in different syntax.
See also the MSDN documentation for Query Syntax and Method Syntax.
There are two important steps to compiling a LINQ query in C#. The first is transforming LINQ query syntax into a chain of method calls, as described in section 7.16 of the C# language specification. This transformation process is specified in enough detail that a language developer can use it to implement similar query syntax on a new CLR language.
The second step is turning lambda expressions into expression trees, which takes place when calling a query method that returns IQueryable, but not when calling a method that returns IEnumerable. Is it ever specified how this transformation takes place, comparable to the explanation of the query syntax transformation process?
Construction of expression trees is explicitly not defined, actually. Compiler developers are free to use whatever approach they wish, provided of course that executing the expression produces the same result as invoking the lambda.
This is a quote from the C# Language Specification:
6.5.2 Evaluation of anonymous function conversions to expression tree types
Conversion of an anonymous function to an expression tree type produces an expression tree (§4.6). More precisely, evaluation of the anonymous function conversion leads to the construction of an object structure that represents the structure of the anonymous function itself. The precise structure of the expression tree, as well as the exact process for creating it, are implementation defined.
I added the boldface at the end.
I suspect that this was deliberately left unspecified to give compiler developers the freedom to implement whatever optimizations they find helpful. A rigid specification of the expression tree would prevent that.
Is it ever specified how this transformation takes place, comparable
to the explanation of the query syntax transformation process?
As NSFW has stated, no.
On a practical level, these expression trees can change from framework to framework. A real-life example would be this:
We were using expression lambdas to get the property info through expression trees.
Such as void DoSomething<P, K>(P model, Expression<Func<P, K>> propertySelector), and the usage DoSomething(model, m => m.Property)
The actual property interrogation was done inside DoSomething, through reflection. This is very classical, and variants of such code exist all over the internet.
Now, this is cool, and it worked nicely in .NET 4.0. However, as soon as I tried 4.5, it blew up completely, as the underlying expression tree had changed.
You can be sure that Roslyn will introduce a lot of new "bugs", in the sense that some client code relies on how lambdas are translated to expression trees. (If you really insist on doing that, using an expression visitor class minimizes the chances of breaking.)
Ensuring that the expression trees stay the same would be a major task, and it would also be limiting (speed-wise, for example).
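As an illustration of the visitor approach, here is a sketch that finds the selected property by walking the tree instead of assuming the lambda body has one exact shape. It is not the code from the incident described above; the Model type and the GetProperty helper are hypothetical names, and ExpressionVisitor is the public class from System.Linq.Expressions (.NET 4.0 and later).

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class PropertySelector
{
    // Hypothetical model type used only for this illustration.
    public sealed class Model
    {
        public int Count { get; set; }
        public string Name { get; set; }
    }

    // Walks the tree top-down and remembers the first property access it meets,
    // so wrapper nodes such as Convert do not matter.
    private sealed class PropertyFinder : ExpressionVisitor
    {
        public PropertyInfo Found { get; private set; }

        protected override Expression VisitMember(MemberExpression node)
        {
            if (Found == null)
            {
                Found = node.Member as PropertyInfo;
            }
            return base.VisitMember(node);
        }
    }

    public static PropertyInfo GetProperty<P, K>(Expression<Func<P, K>> propertySelector)
    {
        var finder = new PropertyFinder();
        finder.Visit(propertySelector.Body);
        if (finder.Found == null)
        {
            throw new ArgumentException("Selector must access a property.", "propertySelector");
        }
        return finder.Found;
    }

    public static void Main()
    {
        Console.WriteLine(GetProperty<Model, string>(m => m.Name).Name);   // Name
        // Selecting an int as object wraps the member access in a Convert node.
        Console.WriteLine(GetProperty<Model, object>(m => m.Count).Name);  // Count
    }
}
```

Code that instead casts propertySelector.Body straight to MemberExpression fails on the second call, which is the kind of shape dependence the answer warns about.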
To be sure that I understand, could someone confirm that lambda expressions use reflection?
They can, but they are not required to. If they are compiled to a delegate, then no reflection is necessary; however, if they are compiled to an expression tree, then an expression tree is absolutely reflection-based. Some parts of expression trees can be assembled directly from metadata tokens (ldtoken), in particular methods (including operators and getters/setters) and types, but some other parts cannot. This includes properties (PropertyInfo cannot be loaded by token), so the IL for a compiled lambda can explicitly include GetProperty etc.
But however it is loaded (token or reflection), an expression tree is expressed in terms of reflection (MemberInfo, etc). This might later be compiled, or might be analysed by a provider.
To help performance, the expression compiler may cache some or all of the expression tree, and re-use it.
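A small sketch of the two forms side by side may help here. The same lambda text is assigned once to a delegate type and once to an expression tree type; the names asDelegate, asTree and DescribeMember are made up for this example.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class LambdaFormsDemo
{
    // Returns the reflection object that the tree's body refers to.
    public static MemberInfo DescribeMember(Expression<Func<string, int>> tree)
    {
        return ((MemberExpression)tree.Body).Member;
    }

    public static void Main()
    {
        // Compiled straight to IL: calling it involves no reflection.
        Func<string, int> asDelegate = text => text.Length;

        // Compiled to code that builds an object graph describing the lambda.
        Expression<Func<string, int>> asTree = text => text.Length;

        MemberInfo member = DescribeMember(asTree);
        Console.WriteLine(member is PropertyInfo);   // True
        Console.WriteLine(member.Name);              // Length

        Console.WriteLine(asDelegate("abc"));        // 3
        // The tree can later be compiled into a delegate as well.
        Console.WriteLine(asTree.Compile()("abc"));  // 3
    }
}
```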
Lambda expressions are turned by the compiler into either anonymous functions or expression trees. Since reflection can only be performed at runtime, it doesn't come into the picture at all when considering what the compiler does with them in either case.
At runtime, lambda expressions might result in reflection being used under very specific circumstances:
For anonymous functions: if you write an anonymous function that explicitly reflects on something then the lambda will perform that reflection when invoked. This is of course the same as if you were reflecting from within a "proper" method.
For expression trees (i.e. values of type Expression<TDelegate> for some TDelegate): using them at runtime when working with an IQueryable might result in reflection being used by the query provider. For example, assume that you do:
var user = new User { Id = 42 };
var posts = Queryable.Posts.Where(p => p.UserId == user.Id);
When posts is about to be materialized the query provider sees that it must find those posts with a UserId equal to the Id of the variable user. Since that id has a specific known value at runtime, the query provider needs to fish it out of user. One way it might decide to do that is through reflection.
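The following sketch shows one way a provider could fish that value out. It is an assumption for illustration, not how any particular provider is implemented: it wraps the right-hand side of the comparison in a parameterless lambda and compiles it, rather than reflecting over the closure object by hand. User and Post are hypothetical types.

```csharp
using System;
using System.Linq.Expressions;

public static class CapturedValueDemo
{
    // Hypothetical entity types for this illustration.
    public sealed class User { public int Id { get; set; } }
    public sealed class Post { public int UserId { get; set; } }

    // Returns the runtime value of the right-hand side of a "left == right" predicate.
    public static object EvaluateRightOperand(Expression<Func<Post, bool>> predicate)
    {
        var comparison = (BinaryExpression)predicate.Body;

        // The right operand does not depend on the lambda parameter, so it can be
        // wrapped in a parameterless lambda, compiled, and invoked on its own.
        LambdaExpression thunk = Expression.Lambda(comparison.Right);
        return thunk.Compile().DynamicInvoke();
    }

    public static void Main()
    {
        var user = new User { Id = 42 };
        Expression<Func<Post, bool>> predicate = p => p.UserId == user.Id;

        Console.WriteLine(EvaluateRightOperand(predicate)); // 42
    }
}
```

A provider would then typically send the resulting value to the data source as a query parameter.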
A lambda expression is just compiler syntactic sugar for creating delegates. No reflection is used here.
This is a follow up to a question I asked earlier seen here:
Confused about passing Expression vs. Func arguments
The accepted answer there suggests refactoring an Expression referencing local objects into something that Linq to Entities can actually execute against the backing store (in my case SQL Server).
I've spent a long time trying to come up with something that will work for what I'm doing. My original
Func<Thing, bool> whereClause
was referencing a local Dictionary object which Linq to Entities or SQL could not understand at runtime. I tried refactoring into multiple lists which faked a dictionary, and arrays after that. Each time, I got runtime errors complaining that the context doesn't recognize things like the methods on a List, or array indexers.
Eventually I gave up and just provided an additional method which takes a Func argument for when I cannot come up with the proper Expression.
I'm not trying to find a solution to my specific problem, I'm just wondering in general if it is always possible to convert, say a
Func<Thing, bool>
to an equivalent
Expression<Func<Thing, bool>>
which can run against Linq to Entities.
Or are there many examples of queries where you simply must pull the data into memory first?
You don't convert a Func to an expression tree - the compiler converts a lambda expression to an expression tree... and no, that's not always possible. For example, you can't convert a statement lambda to an expression tree:
Expression<Func<string, int>> valid = text => text.Length; // expression lambda: converts fine
Expression<Func<string, int>> invalid = text => { return text.Length; }; // statement lambda: compile-time error
There are various other restrictions, too.
Even when you can create an expression tree (and if you do it manually you can build ones which the C# compiler wouldn't, particularly in .NET 4) that's not the same thing as the expression tree representing something that LINQ to SQL (etc) can translate appropriately.
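As a sketch of building a tree by hand that the C# compiler would refuse to produce from a lambda, the following uses the Expression.Block and Expression.Assign factory methods (available from .NET 4) to create a body with an assignment statement. The method name is made up for the example.

```csharp
using System;
using System.Linq.Expressions;

public static class ManualTreeDemo
{
    public static Func<int, int> BuildIncrementAndDouble()
    {
        // Hand-built equivalent of: x => { x = x + 1; return x * 2; }
        ParameterExpression x = Expression.Parameter(typeof(int), "x");

        BlockExpression body = Expression.Block(
            Expression.Assign(x, Expression.Add(x, Expression.Constant(1))),
            // The last expression in a block is the block's value.
            Expression.Multiply(x, Expression.Constant(2)));

        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }

    public static void Main()
    {
        Console.WriteLine(BuildIncrementAndDouble()(3)); // 8
    }
}
```

A tree like this compiles and runs in memory, but a query provider is under no obligation to know how to translate block or assignment nodes.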
Jon is of course correct; you turn a lambda into an expression tree.
To expand a bit on his "various other restrictions" handwave: a lambda converted to an expression tree may not contain:
statements
expressions useful primarily for their state mutations: assignment, compound assignment, increment and decrement operators
dynamic operations of any kind
multi-dimensional array initializers
removed partial methods
base access
pointer operations of any kind
sizeof(T) except for where T is a built-in type
COM-style indexed property invocations
COM-style "optional ref" invocations
C-style variadic method invocations
optional-argument and named-argument invocations
method groups (except of course when in an invocation)
That's not an exhaustive list; there are some other weird corner cases. But that should cover the majority of them.
I am writing some code that takes a LINQ to SQL IQueryable<T> and adds further dynamically generated Where clauses. For example here is the skeleton of one of the methods:
IQueryable<T> ApplyContains(IQueryable<T> source, string field, string value)
{
    Expression<Func<T, bool>> lambda;
    ... dynamically generate lambda for p => p.<field>.Contains(value) ...
    return source.Where(lambda);
}
I might chain several of these methods together and finish off with a Skip/Take page.
Am I correct in thinking that when the IQueryable is finally evaluated if there is anything in the lambda expressions that can't be translated to SQL an exception will be thrown? In particular I'm concerned I might accidentally do something that would cause the IQueryable to evaluate early and then continue the evaluation in memory (thereby pulling in thousands of records).
From some things I've read I suspect IQueryable will not evaluate early like this. Can anyone confirm this please?
Yes, you are correct in thinking that your IQueryable can throw an error at runtime if part of the expression can't be translated into SQL. Because of this, I think it's a good idea to have your queries in a Business Layer class (like a data service or repository) and then make sure that each query is covered by an automated test.
Regarding your Linq expression evaluating at an unexpected time, the basic rule to keep in mind is that your expression will evaluate whenever you call a foreach on it. This also includes methods that call a foreach behind the scenes like ToList() and FirstOrDefault().
BTW an easy way to tell if a method is going to call a foreach and force your lambda to evaluate is to check whether the return value on that method is an IQueryable. If the return value is another IQueryable then the method is probably just adding to the expression but not forcing it to evaluate. If the return value is a List<T>, an anonymous type, or anything that looks like data instead of an IQueryable then the method had to force your expression to evaluate to get that data.
Your thinking is correct.
As long as you pass the IQueryable an Expression in your Where clauses it will not evaluate unexpectedly.
Also, the extension methods beginning with "To" will cause evaluation (i.e. ToList(), ToArray()).
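For readers wondering what such a dynamically generated clause might look like, here is one possible sketch. It is not the asker's code: the Person type is hypothetical, the method is made generic so that it is self-contained, and an in-memory list wrapped with AsQueryable() stands in for a real LINQ to SQL table. Whether a given provider can translate the resulting call is up to that provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class DynamicFilter
{
    // Hypothetical entity type for this illustration.
    public sealed class Person
    {
        public string Name { get; set; }
    }

    public static IQueryable<T> ApplyContains<T>(IQueryable<T> source, string field, string value)
    {
        // Builds p => p.<field>.Contains(value) node by node.
        ParameterExpression p = Expression.Parameter(typeof(T), "p");
        MemberExpression member = Expression.Property(p, field);
        MethodInfo contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });
        MethodCallExpression call =
            Expression.Call(member, contains, Expression.Constant(value, typeof(string)));
        Expression<Func<T, bool>> lambda = Expression.Lambda<Func<T, bool>>(call, p);

        // Where only appends to the expression tree; nothing is executed here.
        return source.Where(lambda);
    }

    public static void Main()
    {
        IQueryable<Person> people = new List<Person>
        {
            new Person { Name = "Alice" },
            new Person { Name = "Bob" },
        }.AsQueryable();

        IQueryable<Person> query = ApplyContains(people, "Name", "li");

        // Enumeration is what finally runs the query.
        foreach (Person person in query)
        {
            Console.WriteLine(person.Name); // Alice
        }
    }
}
```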
What is the quickest way to find out which .NET Framework LINQ methods (e.g. the IEnumerable LINQ methods) are implemented using deferred execution and which are not?
While coding, I often wonder whether a given method will be executed right away. The only way to find out is to go to the MSDN documentation to make sure. Would there be any quicker way, any directory, any list somewhere on the web, any cheat sheet, any other trick up your sleeve that you can share? If yes, please do so. This will help many LINQ noobs (like me) to make fewer mistakes. The only other option is to check the documentation until one has used them enough to remember (which is hard for me, I tend not to remember "anything" which is documented somewhere and can be looked up :D).
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
There are also two types of deferred execution. For example the Select method will only get one item at a time when it's asked to produce an item, while the OrderBy method will have to consume the entire source when asked to return the first item. So, if you chain an OrderBy after a Select, the execution will be deferred until you get the first item, but then the OrderBy will ask the Select for all the items.
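The difference between the two kinds can be observed with a small sketch that logs each call to the selector. The Trace helper and its log format are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<string> Trace(bool withOrderBy)
    {
        var log = new List<string>();

        IEnumerable<int> projected = new[] { 3, 1, 2 }.Select(n =>
        {
            log.Add("select " + n);
            return n;
        });
        IEnumerable<int> query = withOrderBy ? projected.OrderBy(n => n) : projected;

        // Nothing has executed yet: both operators are deferred.
        log.Add("query built");

        foreach (int n in query)
        {
            log.Add("got " + n);
        }
        return log;
    }

    public static void Main()
    {
        // Select alone streams one item at a time:
        // query built, select 3, got 3, select 1, got 1, select 2, got 2
        Console.WriteLine(string.Join(", ", Trace(false)));

        // OrderBy pulls everything from Select before yielding its first item:
        // query built, select 3, select 1, select 2, got 1, got 2, got 3
        Console.WriteLine(string.Join(", ", Trace(true)));
    }
}
```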
The guidelines I use:
Always assume any API that returns IEnumerable<T> or IQueryable<T> can and probably will use deferred execution. If you're consuming such an API, and need to iterate through the results more than once (e.g. to get a Count), then convert to a collection before doing so (usually by calling the .ToList() extension method).
If you're exposing an enumeration, always expose it as a collection (ICollection<T> or IList<T>) if that is what your clients will normally use. For example, a data access layer will often return a collection of domain objects. Only expose IEnumerable<T> if deferred execution is a reasonable option for the API you're exposing.
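A short sketch of why the first guideline matters: iterating a deferred query twice runs its work twice, while iterating a materialized list does not. The CountSelectorCalls helper is a made-up name for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Returns how many times the selector ran after iterating the results twice.
    public static int CountSelectorCalls(bool materialize)
    {
        int calls = 0;
        IEnumerable<int> query = new[] { 1, 2, 3 }.Select(n =>
        {
            calls++;
            return n * n;
        });

        // ToList() runs the query once and stores the results.
        IEnumerable<int> results = materialize ? query.ToList() : query;

        int total = 0;
        foreach (int n in results) { total += n; } // first pass
        foreach (int n in results) { total += n; } // second pass

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(false)); // 6: the query ran on each pass
        Console.WriteLine(CountSelectorCalls(true));  // 3: the query ran once, inside ToList()
    }
}
```

With an IQueryable backed by a database, each extra pass would mean another round trip rather than just repeated selector calls.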
Actually, there's more; in addition you need to consider buffered vs non-buffered. OrderBy can be deferred, but when iterated must consume the entire stream.
In general, anything in LINQ that returns IEnumerable tends to be deferred, while Min etc. (which return values) are not deferred. The buffering (vs not) can usually be reasoned out, but frankly Reflector is a pretty quick way of finding out for sure. But note that often this is an implementation detail anyway.
For actual "deferred execution", you want methods that work on an IQueryable. Method chains based on an IQueryable build an expression tree representing your query. Only when the query is enumerated, or when you call a method that produces a concrete result (ToList() and similar, First(), Count(), etc.), is the tree evaluated by the LINQ provider and the actual result returned. (LINQ to Objects is built into the Framework, as are LINQ to SQL and now the Entity Framework; other ORMs and persistence-layer frameworks also offer LINQ providers.) Note that AsEnumerable() does not itself run the query; it only switches the rest of the chain over to the in-memory IEnumerable operators. Any IEnumerable class in the framework can be wrapped as an IQueryable using the AsQueryable() extension method, and LINQ providers that will translate the expression tree, like ORMs, will provide an AsQueryable() as a jump-off point for a LINQ query against their data.
Even against an IEnumerable, some of the LINQ methods are "lazy". Because the beauty of an IEnumerable is that you don't have to know about all of it, only the current element and whether there's another, LINQ methods that act on an IEnumerable often return an iterator class that spits out an object from its source whenever methods later in the chain ask for one. Any operation that doesn't require knowledge of the entire set can be lazily evaluated (Select and Where are two big ones; there are others). Ones that do require the entire collection (sorting via OrderBy, grouping with GroupBy) will slurp their entire source enumerable into an internal buffer and work on it, forcing evaluation of all elements through all higher nodes; aggregates like Min and Max likewise have to read every element before they can return. Generally, you want these to come late in a method chain if you can help it.
Here's a summary of different ways you can know if your query will be deferred or not:
If you're using query expression syntax instead of query method syntax, then it will be deferred.
If you're using query method syntax, it MIGHT be deferred depending on what it returns.
Hover over the var keyword (if that's what you're using as the type for the variable used to store the query). If it says IEnumerable<T> then it'll be deferred.
Try to iterate over the query using a foreach. If you get an error saying it cannot iterate over your variable because it does not support GetEnumerator(), you know the query is not deferred.
Source: Essential Linq
If you convert the collection to an IQueryable using .AsQueryable(), your LINQ calls will use deferred execution.
See here: Using IQueryable with Linq