Why is lambda faster than IL injected dynamic method? - c#

I just built dynamic method - see below (thanks to the fellow SO users). It appears that the Func created as a dynamic method with IL injection 2x slower than the lambda.
Anyone knows why exactly?
(EDIT : this was built as Release x64 in VS2010. Please run it from console not from inside Visual Studio F5.)
class Program
{
static void Main(string[] args)
{
var mul1 = IL_EmbedConst(5);
var res = mul1(4);
Console.WriteLine(res);
var mul2 = EmbedConstFunc(5);
res = mul2(4);
Console.WriteLine(res);
double d, acc = 0;
Stopwatch sw = new Stopwatch();
for (int k = 0; k < 10; k++)
{
long time1;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul2(i);
acc += d;
}
sw.Stop();
time1 = sw.ElapsedMilliseconds;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul1(i);
acc += d;
}
sw.Stop();
Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
}
Console.WriteLine("\n{0}...\n", acc);
Console.ReadLine();
}
static Func<int, int> IL_EmbedConst(int b)
{
var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) } );
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, b);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Ret);
return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
}
static Func<int, int> EmbedConstFunc(int b)
{
return a => a * b;
}
}
Here is the output (for i7 920)
20
20
25 51
25 51
24 51
24 51
24 51
25 51
25 51
25 51
24 51
24 51
4.9999995E+15...
============================================================================
EDIT EDIT EDIT EDIT
Here is the proof of that dhtorpe was right - more complex lambda will lose its advantage.
Code to prove it (this demonstrate that Lambda has exactly the same performance with IL injection):
class Program
{
static void Main(string[] args)
{
var mul1 = IL_EmbedConst(5);
double res = mul1(4,6);
Console.WriteLine(res);
var mul2 = EmbedConstFunc(5);
res = mul2(4,6);
Console.WriteLine(res);
double d, acc = 0;
Stopwatch sw = new Stopwatch();
for (int k = 0; k < 10; k++)
{
long time1;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul2(i, i+1);
acc += d;
}
sw.Stop();
time1 = sw.ElapsedMilliseconds;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul1(i, i + 1);
acc += d;
}
sw.Stop();
Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
}
Console.WriteLine("\n{0}...\n", acc);
Console.ReadLine();
}
static Func<int, int, double> IL_EmbedConst(int b)
{
var method = new DynamicMethod("EmbedConstIL", typeof(double), new[] { typeof(int), typeof(int) });
var log = typeof(Math).GetMethod("Log", new Type[] { typeof(double) });
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, b);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Conv_R8);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Ldc_I4, b);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Conv_R8);
il.Emit(OpCodes.Call, log);
il.Emit(OpCodes.Sub);
il.Emit(OpCodes.Ret);
return (Func<int, int, double>)method.CreateDelegate(typeof(Func<int, int, double>));
}
static Func<int, int, double> EmbedConstFunc(int b)
{
return (a, z) => a * b - Math.Log(z * b);
}
}

The constant 5 was the cause. Why on earth could that be? Reason: When the JIT knows the constant is 5 it does not emit an imul instruction but a lea [rax, rax * 4]. This is a well-known assembly-level optimization. But for some reason, this code executed slower. The optimization was a pessimization.
And the C# compiler emitting a closure prevented the JIT from optimizing the code in that particular way.
Proof: Change the constant to 56878567 and the performance changes. When inspecting the JITed code you can see that an imul is used now.
I managed to catch this by hardcoding the constant 5 into the lambda like this:
static Func<int, int> EmbedConstFunc2(int b)
{
return a => a * 5;
}
This allowed me to inspect the JITed x86.
Sidenote: The .NET JIT does not inline delegate calls in any way. Just mentioning this because it was falsely speculated this was the case in the comments.
Sidenode 2: In order to receive the full JIT optimization level you need to compile in Release mode and start without debugger attached. The debugger prevents optimizations from being performed, even in Release mode.
Sidenote 3: Although EmbedConstFunc contains a closure and normally would be slower than the dynamically generated method the effect of this "lea"-optimization does more damage and eventually is slower.

lambda is not faster than DynamicMethod. It is based on. However, static method is faster than instance method but delegate create for static method is slower than delegate create for instance method. Lambda expression build a static method but use it like instance method by adding as first paameter a "Closure". Delegate to static method "pop" stack to get rid of non needed "this" instance before "mov" to real "IL body". in case of delegate for instance method "IL body" is directly hit. This is why a delegate to an hypotetic static method build by lambda expression is a faster (maybe a side effect of delegate pattern code sharing beetween instance/static method)
The performance issue can be avoid by adding an unused first argument (Closure type for example) to DynamicMethod and call CreateDelegate with explicit target instance (null can be used).
var myDelegate = DynamicMethod.CreateDelegate(MyDelegateType, null) as MyDelegateType;
http://msdn.microsoft.com/fr-fr/library/z43fsh67(v=vs.110).aspx
Tony THONG

Given that the performance difference exists only when running in release mode without a debugger attached, the only explanation I can think of is that the JIT compiler is able to make native code optimizations for the lambda expression that it is not able to perform for the emitted IL dynamic function.
Compiling for release mode (optimizations on) and running without the debugger attached, the lambda is consistently 2x faster than the generated IL dynamic method.
Running the same release-mode optimized build with a debugger attached to the process drops the lambda performance to comparable or worse than the generated IL dynamic method.
The only difference between these two runs is in the behavior of the JIT. When a process is being debugged, the JIT compiler suppresses a number of native code gen optimizations to preserve native instruction to IL instruction to source code line number mappings and other correlations that would be trashed by aggressive native instruction optimizations.
A compiler can only apply special case optimizations when the input expression graph (in this case, IL code) matches certain very specific patterns and conditions. The JIT compiler clearly has special knowledge of the lambda expression IL code pattern and is emitting different code for lambdas than for "normal" IL code.
It is quite possible that your IL instructions do not exactly match the pattern that causes the JIT compiler to optimize the lambda expression. For example, your IL instructions encode the B value as an inline constant, whereas the analogous lambda expression loads a field from an internal captured variable object instance. Even if your generated IL were to mimic the captured field pattern of the C# compiler generated lambda expression IL, it still might not be "close enough" to receive the same JIT treatment as the lambda expression.
As mentioned in the comments, this may well be due to inlining of the lambda to eliminate the call/return overhead. If this is the case, I would expect to see this difference in performance disappear in more complex lambda expressions, since inlining is usually reserved for only the simplest of expressions.

Related

lifetime of local variable inside an Action in c# [duplicate]

What is a closure? Do we have them in .NET?
If they do exist in .NET, could you please provide a code snippet (preferably in C#) explaining it?
I have an article on this very topic. (It has lots of examples.)
In essence, a closure is a block of code which can be executed at a later time, but which maintains the environment in which it was first created - i.e. it can still use the local variables etc of the method which created it, even after that method has finished executing.
The general feature of closures is implemented in C# by anonymous methods and lambda expressions.
Here's an example using an anonymous method:
using System;
class Test
{
static void Main()
{
Action action = CreateAction();
action();
action();
}
static Action CreateAction()
{
int counter = 0;
return delegate
{
// Yes, it could be done in one statement;
// but it is clearer like this.
counter++;
Console.WriteLine("counter={0}", counter);
};
}
}
Output:
counter=1
counter=2
Here we can see that the action returned by CreateAction still has access to the counter variable, and can indeed increment it, even though CreateAction itself has finished.
If you are interested in seeing how C# implements Closure read "I know the answer (its 42) blog"
The compiler generates a class in the background to encapsulate the anoymous method and the variable j
[CompilerGenerated]
private sealed class <>c__DisplayClass2
{
public <>c__DisplayClass2();
public void <fillFunc>b__0()
{
Console.Write("{0} ", this.j);
}
public int j;
}
for the function:
static void fillFunc(int count) {
for (int i = 0; i < count; i++)
{
int j = i;
funcArr[i] = delegate()
{
Console.Write("{0} ", j);
};
}
}
Turning it into:
private static void fillFunc(int count)
{
for (int i = 0; i < count; i++)
{
Program.<>c__DisplayClass1 class1 = new Program.<>c__DisplayClass1();
class1.j = i;
Program.funcArr[i] = new Func(class1.<fillFunc>b__0);
}
}
Closures are functional values that hold onto variable values from their original scope. C# can use them in the form of anonymous delegates.
For a very simple example, take this C# code:
delegate int testDel();
static void Main(string[] args)
{
int foo = 4;
testDel myClosure = delegate()
{
return foo;
};
int bar = myClosure();
}
At the end of it, bar will be set to 4, and the myClosure delegate can be passed around to be used elsewhere in the program.
Closures can be used for a lot of useful things, like delayed execution or to simplify interfaces - LINQ is mainly built using closures. The most immediate way it comes in handy for most developers is adding event handlers to dynamically created controls - you can use closures to add behavior when the control is instantiated, rather than storing data elsewhere.
Func<int, int> GetMultiplier(int a)
{
return delegate(int b) { return a * b; } ;
}
//...
var fn2 = GetMultiplier(2);
var fn3 = GetMultiplier(3);
Console.WriteLine(fn2(2)); //outputs 4
Console.WriteLine(fn2(3)); //outputs 6
Console.WriteLine(fn3(2)); //outputs 6
Console.WriteLine(fn3(3)); //outputs 9
A closure is an anonymous function passed outside of the function in which it is created.
It maintains any variables from the function in which it is created that it uses.
A closure is when a function is defined inside another function (or method) and it uses the variables from the parent method. This use of variables which are located in a method and wrapped in a function defined within it, is called a closure.
Mark Seemann has some interesting examples of closures in his blog post where he does a parallel between oop and functional programming.
And to make it more detailed
var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);//when this variable
Func<int, string> read = id =>
{
var path = Path.Combine(workingDirectory.FullName, id + ".txt");//is used inside this function
return File.ReadAllText(path);
};//the entire process is called a closure.
Here is a contrived example for C# which I created from similar code in JavaScript:
public delegate T Iterator<T>() where T : class;
public Iterator<T> CreateIterator<T>(IList<T> x) where T : class
{
var i = 0;
return delegate { return (i < x.Count) ? x[i++] : null; };
}
So, here is some code that shows how to use the above code...
var iterator = CreateIterator(new string[3] { "Foo", "Bar", "Baz"});
// So, although CreateIterator() has been called and returned, the variable
// "i" within CreateIterator() will live on because of a closure created
// within that method, so that every time the anonymous delegate returned
// from it is called (by calling iterator()) it's value will increment.
string currentString;
currentString = iterator(); // currentString is now "Foo"
currentString = iterator(); // currentString is now "Bar"
currentString = iterator(); // currentString is now "Baz"
currentString = iterator(); // currentString is now null
Hope that is somewhat helpful.
Closures are chunks of code that reference a variable outside themselves, (from below them on the stack), that might be called or executed later, (like when an event or delegate is defined, and could get called at some indefinite future point in time)... Because the outside variable that the chunk of code references may gone out of scope (and would otherwise have been lost), the fact that it is referenced by the chunk of code (called a closure) tells the runtime to "hold" that variable in scope until it is no longer needed by the closure chunk of code...
Basically closure is a block of code that you can pass as an argument to a function. C# supports closures in form of anonymous delegates.
Here is a simple example:
List.Find method can accept and execute piece of code (closure) to find list's item.
// Passing a block of code as a function argument
List<int> ints = new List<int> {1, 2, 3};
ints.Find(delegate(int value) { return value == 1; });
Using C#3.0 syntax we can write this as:
ints.Find(value => value == 1);
If you write an inline anonymous method (C#2) or (preferably) a Lambda expression (C#3+), an actual method is still being created. If that code is using an outer-scope local variable - you still need to pass that variable to the method somehow.
e.g. take this Linq Where clause (which is a simple extension method which passes a lambda expression):
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
// this is a predicate, i.e. a Func<T, bool> written as a lambda expression
// which is still a method actually being created for you in compile time
{
i++;
return true;
});
if you want to use i in that lambda expression, you have to pass it to that created method.
So the first question that arises is: should it be passed by value or reference?
Pass by reference is (I guess) more preferable as you get read/write access to that variable (and this is what C# does; I guess the team in Microsoft weighed the pros and cons and went with by-reference; According to Jon Skeet's article, Java went with by-value).
But then another question arises: Where to allocate that i?
Should it actually/naturally be allocated on the stack?
Well, if you allocate it on the stack and pass it by reference, there can be situations where it outlives it's own stack frame. Take this example:
static void Main(string[] args)
{
Outlive();
var list = whereItems.ToList();
Console.ReadLine();
}
static IEnumerable<string> whereItems;
static void Outlive()
{
var i = 0;
var items = new List<string>
{
"Hello","World"
};
whereItems = items.Where(x =>
{
i++;
Console.WriteLine(i);
return true;
});
}
The lambda expression (in the Where clause) again creates a method which refers to an i. If i is allocated on the stack of Outlive, then by the time you enumerate the whereItems, the i used in the generated method will point to the i of Outlive, i.e. to a place in the stack that is no longer accessible.
Ok, so we need it on the heap then.
So what the C# compiler does to support this inline anonymous/lambda, is use what is called "Closures": It creates a class on the Heap called (rather poorly) DisplayClass which has a field containing the i, and the Function that actually uses it.
Something that would be equivalent to this (you can see the IL generated using ILSpy or ILDASM):
class <>c_DisplayClass1
{
public int i;
public bool <GetFunc>b__0()
{
this.i++;
Console.WriteLine(i);
return true;
}
}
It instantiates that class in your local scope, and replaces any code relating to i or the lambda expression with that closure instance. So - anytime you are using the i in your "local scope" code where i was defined, you are actually using that DisplayClass instance field.
So if I would change the "local" i in the main method, it will actually change _DisplayClass.i ;
i.e.
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
{
i++;
return true;
});
filtered.ToList(); // will enumerate filtered, i = 2
i = 10; // i will be overwriten with 10
filtered.ToList(); // will enumerate filtered again, i = 12
Console.WriteLine(i); // should print out 12
it will print out 12, as "i = 10" goes to that dispalyclass field and changes it just before the 2nd enumeration.
A good source on the topic is this Bart De Smet Pluralsight module (requires registration) (also ignore his erroneous use of the term "Hoisting" - what (I think) he means is that the local variable (i.e. i) is changed to refer to the the new DisplayClass field).
In other news, there seems to be some misconception that "Closures" are related to loops - as I understand "Closures" are NOT a concept related to loops, but rather to anonymous methods / lambda expressions use of local scoped variables - although some trick questions use loops to demonstrate it.
A closure aims to simplify functional thinking, and it allows the runtime to manage
state, releasing extra complexity for the developer. A closure is a first-class function
with free variables that are bound in the lexical environment. Behind these buzzwords
hides a simple concept: closures are a more convenient way to give functions access
to local state and to pass data into background operations. They are special functions
that carry an implicit binding to all the nonlocal variables (also called free variables or
up-values) referenced. Moreover, a closure allows a function to access one or more nonlocal variables even when invoked outside its immediate lexical scope, and the body
of this special function can transport these free variables as a single entity, defined in
its enclosing scope. More importantly, a closure encapsulates behavior and passes it
around like any other object, granting access to the context in which the closure was
created, reading, and updating these values.
Just out of the blue,a simple and more understanding answer from the book C# 7.0 nutshell.
Pre-requisit you should know :A lambda expression can reference the local variables and parameters of the method
in which it’s defined (outer variables).
static void Main()
{
int factor = 2;
//Here factor is the variable that takes part in lambda expression.
Func<int, int> multiplier = n => n * factor;
Console.WriteLine (multiplier (3)); // 6
}
Real part:Outer variables referenced by a lambda expression are called captured variables. A lambda expression that captures variables is called a closure.
Last Point to be noted:Captured variables are evaluated when the delegate is actually invoked, not when the variables were captured:
int factor = 2;
Func<int, int> multiplier = n => n * factor;
factor = 10;
Console.WriteLine (multiplier (3)); // 30
A closure is a function, defined within a function, that can access the local variables of it as well as its parent.
public string GetByName(string name)
{
List<things> theThings = new List<things>();
return theThings.Find<things>(t => t.Name == name)[0];
}
so the function inside the find method.
t => t.Name == name
can access the variables inside its scope, t, and the variable name which is in its parents scope. Even though it is executed by the find method as a delegate, from another scope all together.

Field vs Property. Optimisation of performance

Please note this question related to performance only. Lets skip design guidelines, philosophy, compatibility, portability and anything what is not related to pure performance. Thank you.
Now to the question. I always assumed that because C# getters/setters are really methods in disguise then reading public field must be faster than calling a getter.
So to make sure I did a test (the code below). However this test only produces expected results (ie fields are faster than getters at 34%) if you run it from inside Visual Studio.
Once you run it from command line it shows pretty much the same timing...
The only explanation could be is that the CLR does additional optimisation (correct me if I am wrong here).
I do not believe that in real application where those properties being used in much more sophisticated way they will be optimised in the same way.
Please help me to prove or disprove the idea that in real life properties are slower than fields.
The question is - how should I modify the test classes to make the CLR change behaviour so the public field outperfroms the getters. OR show me that any property without internal logic will perform the same as a field (at least on the getter)
EDIT: I am only talking about Release x64 build.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Runtime.InteropServices;
namespace PropertyVsField
{
class Program
{
static int LEN = 20000000;
static void Main(string[] args)
{
List<A> a = new List<A>(LEN);
List<B> b = new List<B>(LEN);
Random r = new Random(DateTime.Now.Millisecond);
for (int i = 0; i < LEN; i++)
{
double p = r.NextDouble();
a.Add(new A() { P = p });
b.Add(new B() { P = p });
}
Stopwatch sw = new Stopwatch();
double d = 0.0;
sw.Restart();
for (int i = 0; i < LEN; i++)
{
d += a[i].P;
}
sw.Stop();
Console.WriteLine("auto getter. {0}. {1}.", sw.ElapsedTicks, d);
sw.Restart();
for (int i = 0; i < LEN; i++)
{
d += b[i].P;
}
sw.Stop();
Console.WriteLine(" field. {0}. {1}.", sw.ElapsedTicks, d);
Console.ReadLine();
}
}
class A
{
public double P { get; set; }
}
class B
{
public double P;
}
}
As others have already mentioned, the getters are inlined.
If you want to avoid inlining, you have to
replace the automatic properties with manual ones:
class A
{
private double p;
public double P
{
get { return p; }
set { p = value; }
}
}
and tell the compiler not to inline the getter (or both, if you feel like it):
[MethodImpl(MethodImplOptions.NoInlining)]
get { return p; }
Note that the first change does not make a difference in performance, whereas the second change shows a clear method call overhead:
Manual properties:
auto getter. 519005. 10000971,0237547.
field. 514235. 20001942,0475098.
No inlining of the getter:
auto getter. 785997. 10000476,0385552.
field. 531552. 20000952,077111.
Have a look at the Properties vs Fields – Why Does it Matter? (Jonathan Aneja) blog article from one of the VB team members on MSDN. He outlines the property versus fields argument and also explains trivial properties as follows:
One argument I’ve heard for using fields over properties is that
“fields are faster”, but for trivial properties that’s actually not
true, as the CLR’s Just-In-Time (JIT) compiler will inline the
property access and generate code that’s as efficient as accessing a
field directly.
The JIT will inline any method (not just a getter) that its internal metrics determine will be faster inlined. Given that a standard property is return _Property; it will be inlined in every case.
The reason you are seeing different behavior is that in Debug mode with a debugger attached, the JIT is significantly handicapped, to ensure that any stack locations match what you would expect from the code.
You are also forgetting the number one rule of performance, testing beats thinking. For instance even though quick sort is asymptotically faster than insertion sort, insertion sort is actually faster for extremely small inputs.
The only explanation could be is that the CLR does additional optimisation (correrct me if I am wrong here).
Yes, it is called inlining. It is done in the compiler (machine code level - i.e. JIT). As the getter/setter are trivial (i.e. very simple code) the method calls are destroyed and the getter/setter written in the surrounding code.
This does not happen in debug mode in order to support debugging (i.e. the ability to set a breakpoint in a getter or setter).
In visual studio there is no way to do that in the debugger. Compile release, run without attached debugger and you will get the full optimization.
I do not believe that in real application where those properties being used in much more sophisticated way they will be optimised in the same way.
The world is full of illusions that are wrong. They will be optimized as they are still trivial (i.e. simple code, so they are inlined).
It should be noted that it's possible to see the "real" performance in Visual Studio.
Compile in Release mode with Optimisations enabled.
Go to Debug -> Options and Settings, and uncheck "Suppress JIT optimization on module load (Managed only)".
Optionally, uncheck "Enable Just My Code" otherwise you may not be able to step in the code.
Now the jitted assembly will be the same even with the debugger attached, allowing you to step in the optimised dissassembly if you so please. This is essential to understand how the CLR optimises code.
After read all your articles, I decide to make a benchmark with these code:
[TestMethod]
public void TestFieldVsProperty()
{
const int COUNT = 0x7fffffff;
A a1 = new A();
A a2 = new A();
B b1 = new B();
B b2 = new B();
C c1 = new C();
C c2 = new C();
D d1 = new D();
D d2 = new D();
Stopwatch sw = new Stopwatch();
long t1, t2, t3, t4;
sw.Restart();
for (int i = COUNT - 1; i >= 0; i--)
{
a1.P = a2.P;
}
sw.Stop();
t1 = sw.ElapsedTicks;
sw.Restart();
for (int i = COUNT - 1; i >= 0; i--)
{
b1.P = b2.P;
}
sw.Stop();
t2 = sw.ElapsedTicks;
sw.Restart();
for (int i = COUNT - 1; i >= 0; i--)
{
c1.P = c2.P;
}
sw.Stop();
t3 = sw.ElapsedTicks;
sw.Restart();
for (int i = COUNT - 1; i >= 0; i--)
{
d1.P = d2.P;
}
sw.Stop();
t4 = sw.ElapsedTicks;
long max = Math.Max(Math.Max(t1, t2), Math.Max(t3, t4));
Console.WriteLine($"auto: {t1}, {max * 100d / t1:00.00}%.");
Console.WriteLine($"field: {t2}, {max * 100d / t2:00.00}%.");
Console.WriteLine($"manual: {t3}, {max * 100d / t3:00.00}%.");
Console.WriteLine($"no inlining: {t4}, {max * 100d / t4:00.00}%.");
}
class A
{
public double P { get; set; }
}
class B
{
public double P;
}
class C
{
private double p;
public double P
{
get => p;
set => p = value;
}
}
class D
{
public double P
{
[MethodImpl(MethodImplOptions.NoInlining)]
get;
[MethodImpl(MethodImplOptions.NoInlining)]
set;
}
}
When test in debug mode, I got this result:
auto: 35142496, 100.78%.
field: 10451823, 338.87%.
manual: 35183121, 100.67%.
no inlining: 35417844, 100.00%.
but when switch to release mode, the result is different than before.
auto: 2161291, 873.91%.
field: 2886444, 654.36%.
manual: 2252287, 838.60%.
no inlining: 18887768, 100.00%.
seems auto property is a better way.

Compiled C# lambda expression performance with imbrication

Considering this class:
/// <summary>
/// Dummy implementation of a parser for the purpose of the test
/// </summary>
class Parser
{
public List<T> ReadList<T>(Func<T> readFunctor)
{
return Enumerable.Range(0, 10).Select(i => readFunctor()).ToList();
}
public int ReadInt32()
{
return 12;
}
public string ReadString()
{
return "string";
}
}
I try to generate the following call with a compiled lambda expression tree:
Parser parser = new Parser();
List<int> list = parser.ReadList(parser.ReadInt32);
However, the peformance is not quite the same...
class Program
{
private const int MAX = 1000000;
static void Main(string[] args)
{
DirectCall();
LambdaCall();
CompiledLambdaCall();
}
static void DirectCall()
{
Parser parser = new Parser();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < MAX; i++)
{
List<int> list = parser.ReadList(parser.ReadInt32);
}
sw.Stop();
Console.WriteLine("Direct call: {0} ms", sw.ElapsedMilliseconds);
}
static void LambdaCall()
{
Parser parser = new Parser();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < MAX; i++)
{
List<int> list = parser.ReadList(() => parser.ReadInt32());
}
sw.Stop();
Console.WriteLine("Lambda call: {0} ms", sw.ElapsedMilliseconds);
}
static void CompiledLambdaCall()
{
var parserParameter = Expression.Parameter(typeof(Parser), "parser");
var lambda = Expression.Lambda<Func<Parser, List<int>>>(
Expression.Call(
parserParameter,
typeof(Parser).GetMethod("ReadList").MakeGenericMethod(typeof(int)),
Expression.Lambda(
typeof(Func<int>),
Expression.Call(
parserParameter,
typeof(Parser).GetMethod("ReadInt32")))),
parserParameter);
Func<Parser, List<int>> func = lambda.Compile();
Parser parser = new Parser();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < MAX; i++)
{
List<int> list = func(parser);
}
sw.Stop();
Console.WriteLine("Compiled lambda call: {0} ms", sw.ElapsedMilliseconds);
}
}
These are the results in milliseconds on my computer :
Direct call: 647 ms
Lambda call: 641 ms
Compiled lambda call: 5861 ms
I don't understand why the compiled lambda call is so slow.
And I forgot to say that my test is run in release mode with the "Optimize Code" option enabled.
Update: Changed benchmarking based on DateTime.Now to Stopwatch.
Does anyone know how to tweak the lambda expression to obtain a better performance in the compiled lambda call ?
The test is invalid for two reasons:
DateTime.Now isn't accurate enough for micro-benchmarking short tests.
Use the Stopwatch class instead. When I do so, I get the following results (using MAX = 100000), in milliseconds:
Lambda call: 86.3196
Direct call: 74.057
Compiled lambda call: 814.2178
Indeed, the "direct call" is faster than the "lambda call", which makes sense - the "direct call" involves calls to a delegate that refers directly to a method on a Parser object. The "lambda call" requires a call to a delegate that refers to a method on a compiler-generated closure object, which in turn calls the method on the Parser object. This extra indirection introduces a minor speed-bump.
The "Compiled lambda call" isn't the same as the "Lambda call"
The "Lambda" looks like this:
() => parser.ReadInt32()
whereas the "Compiled lambda" looks like this:
parser => parser.ReadList(() => parser.ReadInt32())
There's an extra step in there: To create the embedded delegate for the inner lambda. In a tight loop, this is expensive.
EDIT:
I went ahead and inspected the IL of the "lambda" vs the "compiled lambda" and decompiled them back to "simpler" C# (see: Viewing the IL code generated from a compiled expression).
For the "non compiled" lambda, it looks like this:
for (int i = 0; i < 100000; i++)
{
if (CS$<>9__CachedAnonymousMethodDelegate1 == null)
{
CS$<>9__CachedAnonymousMethodDelegate1 = new Func<int>(CS$<>8__locals3.<LambdaCall>b__0);
}
CS$<>8__locals3.parser.ReadList<int>(CS$<>9__CachedAnonymousMethodDelegate1);
}
Note that a single delegate is created once and cached.
Whereas for the "compiled lambda", it looks like this:
Func<Parser, List<int>> func = lambda.Compile();
Parser parser = new Parser();
for (int i = 0; i < 100000; i++)
{
func(parser);
}
Where the target of the delegate is:
public static List<int> Foo(Parser parser)
{
object[] objArray = new object[] { new StrongBox<Parser>(parser) };
return ((StrongBox<Parser>) objArray[0]).Value.ReadList<int>
(new Func<int>(dyn_type.<ExpressionCompilerImplementationDetails>{1}lambda_method));
}
Note that although the "outer" delegate is created only once and cached, a new "inner" delegate is created on every iteration of the loop. Not to mention other allocations for the object array and the StrongBox<T> instance.
The primary reason the compiled lambda is slower is because the delegate is created over and over again. Anonymous delegates are a special breed: they are only used in one location. So the compiler can do some special optimizations, like caching the value the first time the delegate is called. This is what is happening here.
I was not able to reproduce the large difference between the direct call and the lambda call. In fact, in my measurements the direct call is slightly faster.
When doing benchmarks like this, you may want to use a more accurate timer. The Stopwatch class in System.Diagnostics is ideal. You may also want to increase your number of iterations. The code as is only runs for a few milliseconds.
Also, the first of the three cases will incur a slight overhead from JIT'ing the Parser class. Try running the first case twice and see what happens. Or better still: use the number of iterations as a parameter in each method, and call each method first for 1 iteration, so they all start on a level playing field.

Curious Performance difference between returning a value or returning through an Action<T> parameter

I was curious to see what the performance differences between returning a value from a method, or returning it through an Action parameter.
There is a somewhat related question to this
Performance of calling delegates vs methods
But for the life of me I can't explain why returning a value would be ~30% slower than calling a delegate to return the value. Is the .net Jitter (not compiler..) in-lining my simple delegate (I didn't think it did that)?
class Program
{
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
sw.Start();
A aa = new A();
long l = 0;
for( int i = 0; i < 100000000; i++ )
{
aa.DoSomething( i - 1, i, r => l += r );
}
sw.Stop();
Trace.WriteLine( sw.ElapsedMilliseconds + " : " + l );
sw.Reset();
sw.Start();
l = 0;
for( int i = 0; i < 100000000; i++ )
{
l += aa.DoSomething2( i - 1, i );
}
sw.Stop();
Trace.WriteLine( sw.ElapsedMilliseconds + " : " + l );
}
}
class A
{
private B bb = new B();
public void DoSomething( int a, int b, Action<long> result )
{
bb.Add( a,b, result );
}
public long DoSomething2( int a, int b )
{
return bb.Add2( a,b );
}
}
class B
{
public void Add( int a, int b, Action<long> result )
{
result( a + b );
}
public long Add2( int i, int i1 )
{
return i + i1;
}
}
I made a couple of changes to your code.
Moved new A() before the timed section.
Added warmup code before the timed section to get the methods JIT'ed.
Created an Action<long> reference before the timed section and loop so that it does not have to be created on each iteration. This one seemed to have a big impact on execution time.
Here are my results after making the above changes. The vshost column indicates whether the code was executing inside the vshost.exe process (by running directly from Visual Studio). I was using Visual Studio 2008 and targeted .NET 3.5 SP1.
vshost? Debug Release
-------------------------
YES 6405 3827
11059 3092
NO 4214 1691
4607 811
Notice how you get different results depending on the build configuration and the execution environment. The results are interesting if nothing else. If I get time I might edit my answer to provide a theory.
Strangely, I'm not seeing the behavior you're describing when running a Release build in VS. I am seeing it when running a Debug build. The only thing I can figure is that there's added overhead with the return-based approach when running the Debug build, though I'm not clever enough to see why.
Here's something else that's interesting: this discrepancy disappears when I switch to a x64 build (Release or Debug).
If I were to venture a guess (completely unsubstantiated), it might be that the cost of passing the 64-bit long as a return value in both B.Add2 and A.DoSomething2 outweighs that of passing the Action<long> in a 32-bit environment. In a 64-bit environment, this savings would vanish as the Action<long> would require 64 bits as well. In a Release build in either configuration, the cost of passing the long probably disappears as both B.Add2 and A.DoSomething2 seem like prime candidates for inlining.
Somebody who knows way more about this than I do: feel free to totally refute everything I just said. We're all here to learn, after all ;)
Well for starters your call to new A() is being timed the way you currently have your code set up. You need to make sure you're running in release mode with optimizations on as well. Also you need to take the JIT into account--prime all your code paths so you can guarantee they are compiled before you time them (unless you are concerned about start-up time).
I see an issue when you try to time a large quantity of primitive operations (the simple addition). In this case you can't make any definitive conclusions since any overhead will completely dominate your measurements.
edit:
In release mode targeting .NET 3.5 in VS2008 I get:
1719 : 9999999800000000
1337 : 9999999800000000
Which seems to be consistent with many of the other answers. Using ILDasm gives the following IL for B.Add:
IL_0000: ldarg.3
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: add
IL_0004: conv.i8
IL_0005: callvirt instance void class [mscorlib]System.Action`1<int64>::Invoke(!0)
IL_000a: ret
Where B.Add2 is:
IL_0000: ldarg.1
IL_0001: ldarg.2
IL_0002: add
IL_0003: conv.i8
IL_0004: ret
So it looks as though you're pretty much just timing a load and callvirt.
Why not use reflector to find out?

Predicates and lambda expression

I have recently moved to .net 3.0 (windows forms, C#). I want to know more about predicates and lambda expressions. Where should we use them? Do they improve performance? and how do they work internally. Thanks.
If you search Stack Overflow you'll find about a thousand answers explaining what they're for. In short - a lambda is a way of writing an anonymous method at the point where you want to pass it to another method. Technically the same as the delegate syntax for an anonymous method, although with added powers of type inference so you don't need to state the parameter types. A predicate is a method that accepts some value and returns a bool - an example would be the argument to Where.
A lambda that doesn't refer to any external variables gets turned into a private static method with a made-up name. If it refers to instance members of the enclosing class, it becomes an instance method. If it refers to local variables, those variables get "hoisted" into being fields of a compiler-generated class that is allocated when the enclosing method starts running, and the lambda's body becomes a method in that new class.
As for performance, they don't make that much difference. They involve the creation of temporary objects, but I find that these are collected extremely efficiently by the GC.
If you want to study the different versions of C# and how they different .My suggestion is read the book C.Sharp.in.Depth by jon skeet . This will give you the better understanding of new versions
Do they improve performance? and how
do they work internally. Thanks.
For the most part, you'll never notice the performance hit. However, there are some pathological cases which will kill performance, namely overzealous use of fixed point combinators.
Its a well-known trick that we can use the Y-combinator to write recursive lambda functions, however consider the following code:
using System;
using System.Diagnostics;
namespace YCombinator
{
class Program
{
static Func<T, U> y<T, U>(Func<Func<T, U>, Func<T, U>> f)
{
return f(x => y<T, U>(f)(x));
}
static int fibIter(int n)
{
int fib0 = 0, fib1 = 1;
for (int i = 1; i <= n; i++)
{
int tmp = fib0;
fib0 = fib1;
fib1 = tmp + fib1;
}
return fib0;
}
static Func<int, int> fibCombinator()
{
return y<int, int>(f => n =>
{
switch (n)
{
case 0: return 0;
case 1: return 1;
default: return f(n - 1) + f(n - 2);
}
});
}
static int fibRecursive(int n)
{
switch (n)
{
case 0: return 0;
case 1: return 1;
default: return fibRecursive(n - 1) + fibRecursive(n - 2);
}
}
static void Benchmark(string msg, int iterations, Func<int, int> f)
{
int[] testCases = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 };
Stopwatch watch = Stopwatch.StartNew();
for (int i = 0; i <= iterations; i++)
{
foreach (int n in testCases)
{
f(n);
}
}
watch.Stop();
Console.WriteLine("{0}: {1}", msg, watch.Elapsed.TotalMilliseconds);
}
static void Main(string[] args)
{
int iterations = 10000;
Benchmark("fibIter", iterations, fibIter);
Benchmark("fibCombinator", iterations, fibCombinator());
Benchmark("fibRecursive", iterations, fibRecursive);
Console.ReadKey(true);
}
}
}
This program prints out:
fibIter: 14.8074
fibCombinator: 61775.1485
fibRecursive: 2591.2444
fibCombinator and fibRecursive are functionally equivalent and have the same computational complexity, but fibCombinator is a full 4100x slower due to all of the intermediate object allocations.

Categories