Is it expensive to parse an ExpressionTree? - c#

I am currently reading through the MSDN article "Walkthrough: Creating an IQueryable LINQ Provider", and there is a lot of use of the ExpressionVisitor.
It makes me wonder, is it an expensive operation to use this?
Is it as expensive as Reflection?

No, it should be quite cheap to traverse an expression tree with an ExpressionVisitor.
There is no run-time parsing involved. The compiler does the work of turning the lambda into code that builds the object tree, so nothing is parsed from text at run time. There isn't much run-time reflection going on when the objects in question are created in memory, either. When you see a method call like:
SomeMethod((Foo x) => x.Property);
and SomeMethod's argument is Expression typed, then the compiler converts the code into IL which acts roughly as if you had written something like this:
var x = Expression.Parameter(typeof(Foo), "x");
SomeMethod(Expression.Lambda(
    Expression.Property(x, typeof(Foo).GetProperty("Property")),
    x));
You can look at the generated IL for the full details, or see the worked example in Microsoft's documentation.
There is some reflection involved (for example MemberExpressions hold a PropertyInfo reference), but it is all quite fast.
If you have an app you are worried about, you should profile it (e.g. recent versions of Visual Studio have a built in performance profiler) and see which specific parts are slow.
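To make the traversal cost concrete, here is a minimal sketch of a visitor that simply counts member accesses while walking a tree. The names MemberCountingVisitor and VisitorDemo are hypothetical and not taken from the MSDN walkthrough; the only framework pieces assumed are System.Linq.Expressions.ExpressionVisitor and its VisitMember override.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical visitor: counts member accesses while walking a tree.
public sealed class MemberCountingVisitor : ExpressionVisitor
{
    public int MemberCount { get; private set; }

    protected override Expression VisitMember(MemberExpression node)
    {
        MemberCount++;
        // Keep walking into the expression the member is read from.
        return base.VisitMember(node);
    }
}

public static class VisitorDemo
{
    public static int CountMembers()
    {
        // The tree is built by compiler-generated code; no text is parsed here.
        Expression<Func<string, int>> expr = s => s.Trim().Length + s.Length;
        var visitor = new MemberCountingVisitor();
        visitor.Visit(expr);
        return visitor.MemberCount;
    }
}
```

The walk is an ordinary recursive pass over in-memory objects, which is why it tends to be cheap compared with compiling the tree.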

Related

C# - Fastest way to do this type of null check?

I have the following code that performs a null or empty check on any type of object:
public static void IfNullOrEmpty(Expression<Func<string>> parameter)
{
    Throw.IfNull(parameter);
    if (parameter.GetValue().ToString().Length == 0)
    {
        throw new ArgumentException("Cannot be empty", parameter.GetName());
    }
}
It calls the GetValue extension method below:
public static T GetValue<T>(this Expression<Func<T>> parameter)
{
    MemberExpression member;
    Expression expression;
    member = (MemberExpression)parameter.Body;
    expression = member.Expression;
    return (T)parameter.Compile()();
}
I am passing in an expression containing a string in this method for testing. This method takes on average 2 ms on my machine (even slower on another machine I'm testing on) which adds up if it is called several times throughout the application. It seems like this method is too slow. What is the fastest way to do this type of null check?
Compiling an expression naturally requires quite some work. What I normally do if this code will run often is that I only compile the expressions once and save the compiled delegate for further usage.
It's possible to keep a "normal" cache but for a cache to be efficient you need a good hash function and I don't see how you could make that here. You need to restructure your code a bit so that every place where you use GetValue has a proper access to a compiled delegate instead. Without seeing more code I can't give you any hints about that one.
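The "compile once and keep the delegate" idea can be sketched as follows. This is only an illustration under assumptions: CompiledOnce and the s => s.Length expression are hypothetical stand-ins, since the real call sites are not shown in the question.

```csharp
using System;
using System.Linq.Expressions;

public static class CompiledOnce
{
    // Hypothetical example expression; any fixed expression would do.
    private static readonly Expression<Func<string, int>> LengthExpr = s => s.Length;

    // Compile() runs once, when the class is first used; later calls reuse the delegate.
    private static readonly Func<string, int> LengthFunc = LengthExpr.Compile();

    public static int LengthOf(string value) => LengthFunc(value);
}
```

Calling CompiledOnce.LengthOf("abc") repeatedly pays the compilation cost only on first use; every later call is a plain delegate invocation.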
There can be many reasons why subsequent calls appear faster. Because these expressions are so hard to hash, I don't expect caching to be the reason. More likely you are seeing the work of a modern CPU, which does a lot of guessing to run code fast. If you just ran the same expression, it's possible that the CPU is able to guess more about the next call and can run it faster. There is always the GC to consider too.
One way to test the guessing idea could be to create a large array with a few different expressions. Do one test where it is ordered by expression and one where it is random. If my suspicion holds true the first one should be faster.
If I'm reading your code right, the only reason why you need an Expression is so if the check fails, you'll be able to extract the name of the parameter and pass it into the exception that you'll throw, right? If so, that's an awfully steep price to pay for a slightly more convenient error message that (hopefully) only occurs in a very tiny percentage of cases.
The 2ms overhead is slightly higher than I'd expect, but substantial overhead is hard or impossible to avoid here. You're essentially forcing the runtime to traverse an expression tree, translate it into MSIL and then translate and optimize that MSIL again into executable code through the JIT, just to do a != null check which will always succeed unless a developer has made a mistake somewhere.
You could probably come up with some sort of caching mechanism; the Entity Framework caches expressions by traversing the expression tree and building a hash of it and uses that as the key of a dictionary where the compiled expression is stored as a delegate. It'd be substantially cheaper than sending it through the JIT on every call, but it'd still be orders of magnitude more expensive when compared to a simpler != null check which takes nanoseconds or less (especially when you consider modern branch-predicting CPUs).
So in my opinion, this approach is a nice idea, but simply not worth it when you consider the cost and when the alternative is pretty painless (especially with the new nameof operator). It's also fairly brittle, because what if I'm a developer who thinks he can do this:
Throw.IfNullOrEmpty(() => clientId + "something")
Your cast to MemberExpression will fail there.
Likewise, it would be reasonable for someone to think that because you pass in an expression and an expression is only code as data that it would be safe to do this:
Throw.IfNullOrEmpty(() => parent.Child.MightBeNull.MightAlsoBeNull.ClientID)
It's perfectly possible to safely evaluate that expression if you partially traversed the expression tree, but in your example the whole expression is compiled and executed at once and will likely fail with a NullReferenceException there.
I guess it comes down to the fact that an argument of type Expression<Func<T>> is just not strict enough for a null check. You could do all sorts of weird stuff in it and the compiler would be perfectly happy, but you'd get unexpected results at runtime.
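For comparison, the plain alternative mentioned above could look like the sketch below. Guard and NotNullOrEmpty are hypothetical names; the caller supplies the parameter name itself, typically via nameof (C# 6 or later).

```csharp
using System;

public static class Guard
{
    // Hypothetical helper: takes the value and its name directly, no expression tree involved.
    public static void NotNullOrEmpty(string value, string paramName)
    {
        if (value == null)
            throw new ArgumentNullException(paramName);
        if (value.Length == 0)
            throw new ArgumentException("Cannot be empty", paramName);
    }
}
```

A call site would then read Guard.NotNullOrEmpty(clientId, nameof(clientId)); which costs a method call and two comparisons, and cannot be handed an arbitrary expression.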

Is the lambda->expression tree transformation process specified anywhere?

There are two important steps to compiling a LINQ query in C#. The first is transforming LINQ query syntax into a chain of method calls, as described in section 7.16 of the C# language specification. This transformation process is specified in enough detail that a language developer can use it to implement similar query syntax on a new CLR language.
The second step is turning lambda expressions into expression trees, which takes place when calling a query method that returns IQueryable, but not when calling a method that returns IEnumerable. Is it ever specified how this transformation takes place, comparable to the explanation of the query syntax transformation process?
Construction of expression trees is explicitly not defined, actually. Compiler developers are free to use whatever approach they wish, provided of course that executing the expression produces the same result as invoking the lambda.
This is a quote from the C# Language Specification:
6.5.2 Evaluation of anonymous function conversions to expression tree types
Conversion of an anonymous function to an expression tree type produces an expression tree (§4.6). More precisely, evaluation of the anonymous function conversion leads to the construction of an object structure that represents the structure of the anonymous function itself. The precise structure of the expression tree, as well as the exact process for creating it, are implementation defined.
I added the boldface at the end.
I suspect that this was deliberately left unspecified to give compiler developers the freedom to implement whatever optimizations they find helpful. A rigid specification of the expression tree would prevent that.
Is it ever specified how this transformation takes place, comparable
to the explanation of the query syntax transformation process?
As NSFW has stated, no.
On a practical level, these expression trees can change from framework to framework. A real-life example would be this:
We were using expression lambdas to get the property info through expression trees.
Such as void DoSomething<P, K>(P model, Expression<Func<P, K>> propertySelector), and the usage DoSomething(model, m => m.Property)
The actual property interrogation was done inside DoSomething, through reflection. This is very classical, and variants of such code exist all over the internet.
Now, this is cool, and it worked nicely in .NET 4.0. However, as soon as I tried 4.5, it blew up completely, as the underlying expression tree had changed.
You can be sure that Roslyn will introduce a lot of new "bugs", in the sense that some client code relies on how lambdas are translated to expression trees. (If you really insist on doing that, using the Visitor class minimizes the chances of breaking.)
Ensuring that the expression trees stay the same would be a major task, and it would also be limiting (speed-wise, for example).
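One common way to make this kind of property lookup less dependent on the exact tree shape is to tolerate wrapper nodes instead of casting the body straight to MemberExpression. The sketch below is an assumption-laden illustration (PropertyNames.Of is a hypothetical helper) and does not reproduce the specific 4.0 to 4.5 difference described above; it only handles the well-known case where a value-typed property is boxed to object and therefore wrapped in a Convert node.

```csharp
using System;
using System.Linq.Expressions;

public static class PropertyNames
{
    // Hypothetical helper: returns the member name selected by a lambda such as m => m.Property.
    public static string Of<T>(Expression<Func<T, object>> selector)
    {
        Expression body = selector.Body;

        // Boxing a value-typed member to object appears as a Convert node around the member access.
        if (body is UnaryExpression unary &&
            (unary.NodeType == ExpressionType.Convert ||
             unary.NodeType == ExpressionType.ConvertChecked))
        {
            body = unary.Operand;
        }

        if (body is MemberExpression member)
            return member.Member.Name;

        throw new ArgumentException(
            "Expected a simple member access such as m => m.Property.", nameof(selector));
    }
}
```

Anything the helper does not recognize fails with a clear exception rather than an InvalidCastException deep inside the library.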

Should type information be encoded in parse tree?

I am working on a project including a small DSL. Lexing and parsing a string in this language results in a parse tree, implemented as an abstract class called Expr, which then has many of the usual derived classes such as AssignmentExpr, InvokeExpr, AdditionExpr, et cetera, corresponding to parse tree nodes which are assignments, function invocations, additions and so forth. The project is implemented in C#.
I am currently considering the implementation of type inference for this DSL. This means that I would like to be able to take an instance of the Expr class and return something encoding information about the types of the different nodes in the tree. This type information depends on a symbol table (types of variables) and a function table (function signatures). Thus, I would like to do something like:
TypedExpr typedExpr = inferTypes(expr, symbolTable, functionTable)
Here, TypedExpr would ideally be like Expr, except with a Type property giving the type of the expression. This, however, presents the following design problems:
It would make sense for TypedExpr to inherit from Expr and simply implement an additional property, Type. However, this would create two parallel inheritance hierarchies, one for TypedExpr (TypedAssignmentExpr, TypedInvokeExpr et cetera) and one for Expr (AssignmentExpr, InvokeExpr, et cetera). This is inconvenient to maintain, and the problem expands if further extensions of parse trees are required. I am not sure how this can be mitigated. One possibility would be the bridge design pattern, but I don't think this is capable of entirely solving the problem.
Alternatively, Expr could simply implement a Type property, which is then null at the time of construction from the parser, and later filled out by the type inference algorithm. However, passing around objects with null fields invites NullReferenceExceptions. The TypedExpr idea would have mitigated this. Furthermore, given that the idea of the Expr class is to express a parse tree, type information is not really a part of the tree: typing is context-sensitive, and requires particular symbol and function tables.
Third, the type inference method could also simply return a Dictionary<Expr, Type> which encodes type information about all nodes. This would mean that Expr remains representative of just the parse tree. The drawback of this is that the dictionary object constructed does not have any obvious properties showing that it is linked specifically to the Expr object passed to the type inference method.
I am not entirely satisfied with any of the three solutions given above.
My question is: What are the benefits and drawbacks of various approaches to this problem? Should type information be encoded directly in the parse tree, or should a parallel tree class be used? Or is the Dictionary solution the best? Is there an accepted "best practice" solution?
Go ahead with option two. This is what can be considered a “best practice”.
The reason is that a compiler usually works in many passes (stages, phases). Parsing is the first one, type resolution another. You can later add an optimization pass, a code generation pass etc. Usually, a single data structure, an abstract syntax tree (AST; or parse tree), is maintained along these passes.
The idea that “passing around objects with null fields invites NullReferenceExceptions” is an unfounded worry. You have to handle invalid cases and introduce counter-measures to validate inputs / outputs anyway. Compilers, including simple expression processors, are pretty complex things driven by complicated rules, which involve high degrees of data structure complexity and application logic you can't simply avoid.
It is very normal for an AST to have uninitialized data. Each compilation pass, besides the initial construction of the AST by the parser, then manipulates the AST and computes more information (like your type resolution phase). The AST may even change substantially, e.g. due to an optimization pass.
Side note: modern compilers, such as the latest C# compiler, employ a non-mutability policy over ASTs and other internal data structures. In that case each pass builds its own new data structure. You could then design a new set of data structures for each pass, but that may turn into an overly complex code to maintain. Someone from the C# compiler team could elaborate more on this topic.
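A minimal sketch of option two, assuming a tiny hypothetical node set (NumberExpr, VariableExpr, AdditionExpr) and a symbol table only; the function table from the question is left out to keep the example short. The InferredType property stays null after parsing and is filled in by a separate pass.

```csharp
using System;
using System.Collections.Generic;

public abstract class Expr
{
    // Null until the type inference pass has run.
    public Type InferredType { get; set; }
}

public sealed class NumberExpr : Expr { public double Value; }
public sealed class VariableExpr : Expr { public string Name; }
public sealed class AdditionExpr : Expr { public Expr Left; public Expr Right; }

public static class TypeInference
{
    // Walks the tree bottom-up and annotates every node in place.
    public static void InferTypes(Expr expr, IReadOnlyDictionary<string, Type> symbolTable)
    {
        switch (expr)
        {
            case NumberExpr n:
                n.InferredType = typeof(double);
                break;
            case VariableExpr v:
                if (!symbolTable.TryGetValue(v.Name, out var declared))
                    throw new InvalidOperationException("Unknown variable: " + v.Name);
                v.InferredType = declared;
                break;
            case AdditionExpr a:
                InferTypes(a.Left, symbolTable);
                InferTypes(a.Right, symbolTable);
                if (a.Left.InferredType != a.Right.InferredType)
                    throw new InvalidOperationException("Operand types differ.");
                a.InferredType = a.Left.InferredType;
                break;
            default:
                throw new NotSupportedException(expr.GetType().Name);
        }
    }
}
```

Later passes can treat a null InferredType as "inference has not run yet" and validate it at their entry point, which is the kind of counter-measure the answer refers to.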

when to use or not Lambda Expressions

I see lambda expressions have become a very useful tool at some points in the language. I've been using them a lot and most of the time they fit really nice and make the code shorter and perhaps clearer.
Now, I've seen some use of them that I would call excessive. Some people like them so much that they try to use them everywhere they can. Sometimes the C# code looks like a functional language.
Other factors against them are the cost of the reflection used by lambdas and that they are not friendly to debugging.
I would like to hear opinions on how good and how clear code is when it uses lambda expressions more or less heavily.
(this is not the better example, but let's say it was the trigger)
I was writing the following code. The use of the delegate { return null; } helps me avoid having to ask if the event is null or not every time I have to use it.
public delegate ContactCellInfo.Guest AddGuest();
public event AddGuest GuestRequest = delegate { return null; };
I'm using ReSharper, and the wise ReSharper (even if it sometimes literally eats the memory) made the following suggestion:
public delegate ContactCellInfo.Guest AddGuest();
public event AddGuest GuestRequest = () => null;
From my point of view the code using the delegate looks clearer. I am not against lambda expressions; I would just like to hear some advice on how and when to use them.
There are really two questions here.
First, as for your example, using a lambda vs. using the anonymous delegate syntax. The generated code by the compiler will be identical, so it does not come down to a performance difference, but rather a readability difference.
Personally, I find the lambda syntax easy to follow. I find that the lambda syntax is almost always cleaner, more concise, and more understandable than the anonymous delegate syntax, so I prefer it nearly always.
As for using lambda expressions throughout the code - Personally, I am a fairly heavy user of them. I find that they often make life much easier than having lots of methods defined. If a piece of code is not going to be reused by any other methods (it will only be called and exist in one place), I will use a lambda to express it.
If a piece of code is going to be used more than once, it should be pulled out into a (non-anonymous) method. Also, if a piece of code is something that could and should be tested, I tend to make a method for it, since that eases testability.

When not to use lambda expressions [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
A lot of questions are being answered on Stack Overflow, with members specifying how to solve these real world/time problems using lambda expressions.
Are we overusing it, and are we considering the performance impact of using lambda expressions?
I found a few articles that explore the performance impact of lambdas vs anonymous delegates vs for/foreach loops, with different results:
Anonymous Delegates vs Lambda Expressions vs Function Calls Performance
Performance of foreach vs. List.ForEach
.NET/C# Loop Performance Test (FOR, FOREACH, LINQ, & Lambda).
DataTable.Select is faster than LINQ
What should be the evaluation criteria when choosing the appropriate solution? Except for the obvious reason that it's more concise code and readable when using lambda.
Even though I will focus on point one, I begin by giving my 2 cents on the whole issue of performance. Unless differences are big or usage is intensive, I usually don't bother about microseconds that, when added up, don't amount to any visible difference to the user. I emphasize that I only don't care for methods that are not called intensively. Where I do have special performance considerations is in the way I design the application itself. I care about caching, about the use of threads, about clever ways to call methods (whether to make several calls or to try to make only one call), whether to pool connections or not, etc., etc. In fact I usually don't focus on raw performance, but on scalability. I don't care if it runs better by a tiny slice of a nanosecond for a single user, but I care a lot about being able to load the system with large numbers of simultaneous users without noticing the impact.
Having said that, here goes my opinion about point 1. I love anonymous methods. They give me great flexibility and code elegance. The other great feature of anonymous methods is that they allow me to directly use local variables from the containing method (from a C# perspective, not from an IL perspective, of course). They often spare me loads of code. When do I use anonymous methods? Every single time the piece of code I need isn't needed elsewhere. If it is used in two different places, I don't like copy-paste as a reuse technique, so I'll use a plain ol' delegate. So, just like shoosh answered, it isn't good to have code duplication. In theory there are no performance differences, as anonymous methods are C# tricks, not IL stuff.
Most of what I think about anonymous methods applies to lambda expressions, as the latter can be used as a compact syntax to represent anonymous methods. Let's assume the following method:
public static void DoSomethingMethod(string[] names, Func<string, bool> myExpression)
{
    Console.WriteLine("Lambda used to represent an anonymous method");
    foreach (var item in names)
    {
        if (myExpression(item))
            Console.WriteLine("Found {0}", item);
    }
}
It receives an array of strings and for each one of them, it will call the method passed to it. If that method returns true, it will say "Found...". You can call this method the following way:
string[] names = {"Alice", "Bob", "Charles"};
DoSomethingMethod(names, delegate(string p) { return p == "Alice"; });
But, you can also call it the following way:
DoSomethingMethod(names, p => p == "Alice");
There is no difference in IL between the two, and the one using the lambda expression is much more readable. Once again, there is no performance impact, as these are all C# compiler tricks (not JIT compiler tricks). Just as I didn't feel we are overusing anonymous methods, I don't feel we are overusing lambda expressions to represent anonymous methods. Of course, the same logic applies to repeated code: don't do lambdas, use regular delegates. There are other restrictions leading you back to anonymous methods or plain delegates, like out or ref argument passing.
The other nice thing about lambda expressions is that the exact same syntax doesn't have to represent an anonymous method. Lambda expressions can also represent... you guessed it, expressions. Take the following example:
public static void DoSomethingExpression(string[] names, System.Linq.Expressions.Expression<Func<string, bool>> myExpression)
{
    Console.WriteLine("Lambda used to represent an expression");
    BinaryExpression bExpr = myExpression.Body as BinaryExpression;
    if (bExpr == null)
        return;
    Console.WriteLine("It is a binary expression");
    Console.WriteLine("The node type is {0}", bExpr.NodeType.ToString());
    Console.WriteLine("The left side is {0}", bExpr.Left.NodeType.ToString());
    Console.WriteLine("The right side is {0}", bExpr.Right.NodeType.ToString());
    if (bExpr.Right.NodeType == ExpressionType.Constant)
    {
        ConstantExpression right = (ConstantExpression)bExpr.Right;
        Console.WriteLine("The value of the right side is {0}", right.Value.ToString());
    }
}
Notice the slightly different signature. The second parameter receives an expression and not a delegate. The way to call this method would be:
DoSomethingExpression(names, p => p == "Alice");
Which is exactly the same as the call we made when creating an anonymous method with a lambda. The difference here is that we are not creating an anonymous method, but creating an expression tree. It is due to these expression trees that we can then translate lambda expressions to SQL, which is what Linq 2 SQL does, for instance, instead of executing stuff in the engine for each clause like the Where, the Select, etc. The nice thing is that the calling syntax is the same whether you're creating an anonymous method or sending an expression.
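To see the two targets side by side, here is a small sketch in which the same lambda text is assigned once to a delegate and once to an expression tree. LambdaForms and Describe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    public static string Describe()
    {
        // Same lambda text, two different targets.
        Func<string, bool> asDelegate = p => p == "Alice";
        Expression<Func<string, bool>> asTree = p => p == "Alice";

        // The delegate is already executable code.
        bool direct = asDelegate("Alice");

        // The tree is data; it only becomes executable after Compile().
        bool viaTree = asTree.Compile()("Alice");

        return $"{asTree.Body.NodeType}: {direct}, {viaTree}";
    }
}
```

The tree form can be inspected (here through Body.NodeType) or translated into something else, which is what a provider such as LINQ to SQL relies on; the delegate form can only be invoked.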
My answer will not be popular.
I believe lambdas are the better choice 99% of the time, for three reasons.
First, there is ABSOLUTELY nothing wrong with assuming your developers are smart. Other answers have an underlying premise that every developer but you is stupid. Not so.
Second, lambdas (et al) are a modern syntax - and tomorrow they will be more commonplace than they already are today. Your project's code should flow from current and emerging conventions.
Third, writing code "the old fashioned way" might seem easier to you, but it's not easier to the compiler. This is important, legacy approaches have little opportunity to be improved as the compiler is rev'ed. Lambdas (et al) which rely on the compiler to expand them can benefit as the compiler deals with them better over time.
To sum up:
Developers can handle it
Everyone is doing it
There's future potential
Again, I know this will not be a popular answer. And believe me "Simple is Best" is my mantra, too. Maintenance is an important aspect to any source. I get it. But I think we are overshadowing reality with some cliché rules of thumb.
// Jerry
Code duplication.
If you find yourself writing the same anonymous function more than once, it shouldn't be one.
Well, when we are talking about delegate usage, there shouldn't be any difference between lambda and anonymous methods -- they are the same, just with different syntax. And named methods (used as delegates) are also identical from the runtime's viewpoint. The difference, then, is between using delegates, vs. inline code - i.e.
list.ForEach(s=>s.Foo());
// vs.
foreach(var s in list) { s.Foo(); }
(where I would expect the latter to be quicker)
And equally, if you are talking about anything other than in-memory objects, lambdas are one of your most powerful tools in terms of maintaining type checking (rather than parsing strings all the time).
Certainly, there are cases when a simple foreach with code will be faster than the LINQ version, as there will be fewer invokes to do, and invokes cost a small but measurable time. However, in many cases, the code is simply not the bottleneck, and the simpler code (especially for grouping, etc) is worth a lot more than a few nanoseconds.
Note also that in .NET 4.0 there are additional Expression nodes for things like loops, commas, etc. The language doesn't support them, but the runtime does. I mention this only for completeness: I'm certainly not saying you should use manual Expression construction where foreach would do!
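As an illustration of those additional nodes, the following sketch builds a factorial function by hand using Expression.Loop, Expression.Block and a break label, all part of System.Linq.Expressions since .NET 4.0. LoopTree and BuildFactorial are hypothetical names, and this is shown for completeness rather than as a recommendation.

```csharp
using System;
using System.Linq.Expressions;

public static class LoopTree
{
    public static Func<int, int> BuildFactorial()
    {
        ParameterExpression n = Expression.Parameter(typeof(int), "n");
        ParameterExpression result = Expression.Variable(typeof(int), "result");
        LabelTarget done = Expression.Label(typeof(int), "done");

        BlockExpression body = Expression.Block(
            new[] { result },
            // result = 1;
            Expression.Assign(result, Expression.Constant(1)),
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.GreaterThan(n, Expression.Constant(1)),
                    // result *= n; n--;
                    Expression.Block(
                        Expression.MultiplyAssign(result, n),
                        Expression.PostDecrementAssign(n)),
                    // Leave the loop, yielding result as the loop's value.
                    Expression.Break(done, result)),
                done));

        return Expression.Lambda<Func<int, int>>(body, n).Compile();
    }
}
```

None of this can be written as a C# lambda converted to Expression<T>, since statement bodies are not supported there; the tree has to be assembled through the factory methods.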
I'd say that the performance differences are usually so small (and in the case of loops, obviously so, if you look at the results of the 2nd article; btw, Jon Skeet has a similar article here) that you should almost never choose a solution for performance reasons alone, unless you are writing a piece of software where performance is absolutely the number one non-functional requirement and you really have to do micro-optimizations.
When to choose what? I guess it depends on the situation but also on the person. Just as an example, some people prefer List.ForEach over a normal foreach loop. I personally prefer the latter, as it is usually more readable, but who am I to argue against this?
Rules of thumb:
Write your code to be natural and readable.
Avoid code duplications (lambda expressions might require a little extra diligence).
Optimize only when there's a problem, and only with data to back up what that problem actually is.
Any time the lambda simply passes its arguments directly to another function. Don't create a lambda for function application.
Example:
var coll = new ObservableCollection<int>();
myInts.ForEach(x => coll.Add(x));
Is nicer as:
var coll = new ObservableCollection<int>();
myInts.ForEach(coll.Add);
The main exception is where C#'s type inference fails for whatever reason (and there are plenty of times that's true).
If you need recursion, don't use lambdas, or you'll end up getting very distracted!
Lambda expressions are cool. Over the older delegate syntax they have a few advantages: they can be converted to either anonymous functions or expression trees, parameter types are inferred from the declaration, they are cleaner and more concise, etc. I see no real value in avoiding lambda expressions when you need an anonymous function. One not-so-big advantage of the earlier style is that you can omit the parameter declaration entirely if the parameters are not used. Like
Action<int> a = delegate { }; //takes one argument, but no argument specified
This is useful when you have to declare an empty delegate that does nothing, but it is not a strong reason enough to not use lambdas.
Lambdas let you write quick anonymous methods. That makes lambdas pointless wherever anonymous methods are pointless, i.e. where named methods make more sense. Compared with named methods, anonymous methods can be disadvantageous (not a lambda expression thing per se, but since these days lambdas widely represent anonymous methods it is relevant):
because they tend to lead to logic duplication (reuse is difficult)
when it is unnecessary to write one, like:
//this is unnecessary
Func<string, int> f = x => int.Parse(x);
//this is enough
Func<string, int> f = int.Parse;
since writing anonymous iterator block is impossible.
Func<IEnumerable<int>> f = () => { yield return 0; }; //impossible
since recursive lambdas require one more line of quirkiness, like
Func<int, int> f = null;
f = x => (x <= 1) ? 1 : x * f(x - 1);
well, since reflection is kinda messier, but that is moot isn't it?
Apart from point 3, the rest are not strong reasons not to use lambdas.
Also see this thread about what is disadvantageous about Func/Action delegates, since often they are used along with lambda expressions.
