Compiled C# lambda expression performance with imbrication


Considering this class:
/// <summary>
/// Dummy implementation of a parser for the purpose of the test
/// </summary>
class Parser
{
public List<T> ReadList<T>(Func<T> readFunctor)
{
return Enumerable.Range(0, 10).Select(i => readFunctor()).ToList();
}
public int ReadInt32()
{
return 12;
}
public string ReadString()
{
return "string";
}
}
I try to generate the following call with a compiled lambda expression tree:
Parser parser = new Parser();
List<int> list = parser.ReadList(parser.ReadInt32);
However, the performance is not quite the same...
class Program
{
private const int MAX = 1000000;
static void Main(string[] args)
{
DirectCall();
LambdaCall();
CompiledLambdaCall();
}
static void DirectCall()
{
Parser parser = new Parser();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < MAX; i++)
{
List<int> list = parser.ReadList(parser.ReadInt32);
}
sw.Stop();
Console.WriteLine("Direct call: {0} ms", sw.ElapsedMilliseconds);
}
static void LambdaCall()
{
Parser parser = new Parser();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < MAX; i++)
{
List<int> list = parser.ReadList(() => parser.ReadInt32());
}
sw.Stop();
Console.WriteLine("Lambda call: {0} ms", sw.ElapsedMilliseconds);
}
static void CompiledLambdaCall()
{
var parserParameter = Expression.Parameter(typeof(Parser), "parser");
var lambda = Expression.Lambda<Func<Parser, List<int>>>(
Expression.Call(
parserParameter,
typeof(Parser).GetMethod("ReadList").MakeGenericMethod(typeof(int)),
Expression.Lambda(
typeof(Func<int>),
Expression.Call(
parserParameter,
typeof(Parser).GetMethod("ReadInt32")))),
parserParameter);
Func<Parser, List<int>> func = lambda.Compile();
Parser parser = new Parser();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < MAX; i++)
{
List<int> list = func(parser);
}
sw.Stop();
Console.WriteLine("Compiled lambda call: {0} ms", sw.ElapsedMilliseconds);
}
}
These are the results in milliseconds on my computer:
Direct call: 647 ms
Lambda call: 641 ms
Compiled lambda call: 5861 ms
I don't understand why the compiled lambda call is so slow.
And I forgot to say that my test is run in release mode with the "Optimize Code" option enabled.
Update: Changed benchmarking based on DateTime.Now to Stopwatch.
Does anyone know how to tweak the lambda expression to get better performance from the compiled lambda call?

The test is invalid for two reasons:
DateTime.Now isn't accurate enough for micro-benchmarking short tests.
Use the Stopwatch class instead. When I do so, I get the following results (using MAX = 100000), in milliseconds:
Lambda call: 86.3196
Direct call: 74.057
Compiled lambda call: 814.2178
Indeed, the "direct call" is faster than the "lambda call", which makes sense - the "direct call" involves calls to a delegate that refers directly to a method on a Parser object. The "lambda call" requires a call to a delegate that refers to a method on a compiler-generated closure object, which in turn calls the method on the Parser object. This extra indirection introduces a minor speed-bump.
The "Compiled lambda call" isn't the same as the "Lambda call"
The "Lambda" looks like this:
() => parser.ReadInt32()
whereas the "Compiled lambda" looks like this:
parser => parser.ReadList(() => parser.ReadInt32())
There's an extra step in there: To create the embedded delegate for the inner lambda. In a tight loop, this is expensive.
EDIT:
I went ahead and inspected the IL of the "lambda" vs the "compiled lambda" and decompiled them back to "simpler" C# (see: Viewing the IL code generated from a compiled expression).
For the "non compiled" lambda, it looks like this:
for (int i = 0; i < 100000; i++)
{
if (CS$<>9__CachedAnonymousMethodDelegate1 == null)
{
CS$<>9__CachedAnonymousMethodDelegate1 = new Func<int>(CS$<>8__locals3.<LambdaCall>b__0);
}
CS$<>8__locals3.parser.ReadList<int>(CS$<>9__CachedAnonymousMethodDelegate1);
}
Note that a single delegate is created once and cached.
Whereas for the "compiled lambda", it looks like this:
Func<Parser, List<int>> func = lambda.Compile();
Parser parser = new Parser();
for (int i = 0; i < 100000; i++)
{
func(parser);
}
Where the target of the delegate is:
public static List<int> Foo(Parser parser)
{
object[] objArray = new object[] { new StrongBox<Parser>(parser) };
return ((StrongBox<Parser>) objArray[0]).Value.ReadList<int>
(new Func<int>(dyn_type.<ExpressionCompilerImplementationDetails>{1}lambda_method));
}
Note that although the "outer" delegate is created only once and cached, a new "inner" delegate is created on every iteration of the loop. Not to mention other allocations for the object array and the StrongBox<T> instance.
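One possible way to sidestep the per-call inner delegate is to avoid the nested lambda altogether and pass the read functor into the compiled expression as a second parameter. The following is only a sketch under that assumption; the names `parserParam` and `readerParam` are illustrative and do not come from the original post, and it has not been benchmarked here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

class Parser
{
    public List<T> ReadList<T>(Func<T> readFunctor)
    {
        return Enumerable.Range(0, 10).Select(i => readFunctor()).ToList();
    }

    public int ReadInt32()
    {
        return 12;
    }
}

static class Program
{
    static void Main()
    {
        // Two parameters: the parser and an already-created Func<int>.
        // No inner lambda is embedded in the tree, so the compiled code
        // has no closure or inner delegate to allocate on each call.
        var parserParam = Expression.Parameter(typeof(Parser), "parser");
        var readerParam = Expression.Parameter(typeof(Func<int>), "reader");
        var readList = typeof(Parser).GetMethod("ReadList").MakeGenericMethod(typeof(int));

        var lambda = Expression.Lambda<Func<Parser, Func<int>, List<int>>>(
            Expression.Call(parserParam, readList, readerParam),
            parserParam,
            readerParam);
        Func<Parser, Func<int>, List<int>> func = lambda.Compile();

        var parser = new Parser();
        Func<int> reader = parser.ReadInt32; // created once, outside any loop

        List<int> list = func(parser, reader);
        Console.WriteLine(list.Count);
    }
}
```

The trade-off is that the caller now has to supply the delegate, so this only helps when the reader can be created once and reused across calls.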

The primary reason the compiled lambda is slower is because the delegate is created over and over again. Anonymous delegates are a special breed: they are only used in one location. So the compiler can do some special optimizations, like caching the value the first time the delegate is called. This is what is happening here.
I was not able to reproduce the large difference between the direct call and the lambda call. In fact, in my measurements the direct call is slightly faster.
When doing benchmarks like this, you may want to use a more accurate timer. The Stopwatch class in System.Diagnostics is ideal. You may also want to increase your number of iterations. The code as is only runs for a few milliseconds.
Also, the first of the three cases will incur a slight overhead from JIT'ing the Parser class. Try running the first case twice and see what happens. Or better still: use the number of iterations as a parameter in each method, and call each method first for 1 iteration, so they all start on a level playing field.
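The warm-up advice above could be sketched roughly as follows. `Measure` is a hypothetical helper, not part of the original code; it runs the action once untimed so that JIT compilation is paid for before the Stopwatch starts.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Hypothetical helper: one untimed warm-up call, then a timed loop.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up, so JIT cost is not included in the timing

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        int counter = 0;
        TimeSpan elapsed = Measure(() => counter++, 1000000);
        Console.WriteLine("Calls: {0}, elapsed: {1} ms", counter, elapsed.TotalMilliseconds);
    }
}
```

Note that wrapping the code under test in an Action adds one delegate invocation per iteration, so for very cheap operations it is probably better to keep the loop inside the measured method, as the original test does.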

Related

C# LINQ performance when extension method called inside where clause

I have a LINQ query like this
public static bool CheckIdExists(int searchId)
{
return itemCollection.Any(item => item.Id.Equals(searchId.ConvertToString()));
}
item.Id is a string while searchId is an int. .ConvertToString() is an extension method which converts an int to a string.
Code for ConvertToString:
public static string ConvertToString(this object input)
{
return Convert.ToString(input, CultureInfo.InvariantCulture);
}
Now my question is: does searchId.ConvertToString() get executed for each item in itemCollection?
Does computing searchId.ConvertToString() beforehand and calling the method as below improve performance?
public static bool CheckIdExists(int searchId)
{
string sId=searchId.ConvertToString();
return itemCollection.Any(item => item.Id.Equals(sId));
}
How can I debug these two scenarios and observe their performance?

I reproduced the scenarios from your question with the following code and got the output below.
This is also how you can debug it.
static List<string> itemCollection = new List<string>();
static void Main(string[] args)
{
for (int i = 0; i < 10000000; i++)
{
itemCollection.Add(i.ToString());
}
var watch = new Stopwatch();
watch.Start();
Console.WriteLine(CheckIdExists(580748));
watch.Stop();
Console.WriteLine($"Took {watch.ElapsedMilliseconds}");
var watch1 = new Stopwatch();
watch1.Start();
Console.WriteLine(CheckIdExists1(580748));
watch1.Stop();
Console.WriteLine($"Took {watch1.ElapsedMilliseconds}");
Console.ReadLine();
}
public static bool CheckIdExists(int searchId)
{
return itemCollection.Any(item => item.Equals(ConvertToString(searchId)));
}
public static bool CheckIdExists1(int searchId)
{
string sId =ConvertToString(searchId);
return itemCollection.Any(item => item.Equals(sId));
}
public static string ConvertToString(int input)
{
return Convert.ToString(input, CultureInfo.InvariantCulture);
}
OUTPUT:
True
Took 170
True
Took 11

How long it takes is the ultimate guide. You can create a stopwatch to log the performance of any code. Just use the ElapsedMilliseconds to see how long has been taken. For very short operations I suggest using very long loops to get a more accurate length of time.
var watch = new Stopwatch();
watch.Start();
// CODE HERE (IDEALLY IN A LONG LOOP)
Debug.WriteLine($"Took {watch.ElapsedMilliseconds}");

Yes, it should be faster to get the string once. I suspect the compiler optimizes this for you, though I don't have anything to back that up; I just remember that compilers are very good at detecting things that do not change.
And no, it's not necessarily computed for every item, since the LINQ method Any does not necessarily check all items. It returns true at the first matching item. The only scenario in which it checks all items is when the lambda returns true for none of them.
If you want to test the speed difference, make sure to have more data; otherwise the difference may be too small.
Just do:
itemCollection = Enumerable.Range(0, 1000).SelectMany(x => itemCollection).ToList() // or array or whatever the type of collection you have
Then measure the times with Stopwatch, just like @RobSedgwick said.
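Rather than guessing whether the conversion is repeated, it can be counted. The sketch below uses a static counter (`conversions`, a name introduced here purely for illustration) and a small list, so it shows how often the conversion runs rather than how long it takes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class Program
{
    // Illustrative counter: incremented every time the conversion runs.
    static int conversions;

    static string ConvertToString(int input)
    {
        conversions++;
        return Convert.ToString(input, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        List<string> items = Enumerable.Range(0, 10)
            .Select(i => i.ToString(CultureInfo.InvariantCulture))
            .ToList();

        // Conversion inside the lambda: runs once per item examined.
        conversions = 0;
        bool found = items.Any(item => item.Equals(ConvertToString(3)));
        Console.WriteLine("inline:  found={0}, conversions={1}", found, conversions);

        // Conversion hoisted out of the lambda: runs exactly once.
        conversions = 0;
        string id = ConvertToString(3);
        found = items.Any(item => item.Equals(id));
        Console.WriteLine("hoisted: found={0}, conversions={1}", found, conversions);
    }
}
```

With this data the inline version should report 4 conversions (one each for items "0" through "3", after which Any stops) and the hoisted version 1, which is consistent with the timing gap measured in the earlier answer.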

I think you have two options:
1- Write a log and record DateTime.Now in it.
2- Use the Diagnostic Tools tab.
Hopefully this helps you.

Why is there such a large difference in the performance of different ways to pass delegates?

I was attempting to compare three different ways of passing a delegate to a function in C#: by lambda, by delegate, and by direct reference. What really surprised me was that the direct reference method (i.e. ComputeStringFunctionViaFunc(objectArray[i].ToString)) was six times slower than the other methods. Does anyone know why this is?
The complete code is as below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.CompilerServices;
namespace FunctionInvocationTest
{
class Program
{
static void Main(string[] args)
{
object[] objectArray = new object[10000000];
for (int i = 0; i < objectArray.Length; ++i) { objectArray[i] = new object(); }
ComputeStringFunction(objectArray[0]);
ComputeStringFunctionViaFunc(objectArray[0].ToString);
ComputeStringFunctionViaFunc(delegate() { return objectArray[0].ToString(); });
ComputeStringFunctionViaFunc(() => objectArray[0].ToString());
System.Diagnostics.Stopwatch s = new System.Diagnostics.Stopwatch();
s.Start();
for (int i = 0; i < objectArray.Length; ++i)
{
ComputeStringFunction(objectArray[i]);
}
s.Stop();
Console.WriteLine(s.Elapsed.TotalMilliseconds);
s.Reset();
s.Start();
for (int i = 0; i < objectArray.Length; ++i)
{
ComputeStringFunctionViaFunc(delegate() { return objectArray[i].ToString(); });
}
s.Stop();
Console.WriteLine(s.Elapsed.TotalMilliseconds);
s.Reset();
s.Start();
for (int i = 0; i < objectArray.Length; ++i)
{
ComputeStringFunctionViaFunc(objectArray[i].ToString);
}
s.Stop();
Console.WriteLine(s.Elapsed.TotalMilliseconds);
s.Reset();
s.Start();
for (int i = 0; i < objectArray.Length; ++i)
{
ComputeStringFunctionViaFunc(() => objectArray[i].ToString());
}
s.Stop();
Console.WriteLine(s.Elapsed.TotalMilliseconds);
Console.ReadLine();
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static void ComputeStringFunction(object stringFunction)
{
}
public static void ComputeStringFunctionViaFunc(Func<string> stringFunction)
{
}
}
}

After fixing up your code to actually call ToString() / stringFunction(), and measuring using Mono 2.10.9:
ComputeStringFunctionViaFunc(objectArray[i].ToString); is slow because object.ToString is virtual. Each object is checked in case it overrides ToString and the overridden ToString should be called. Your other delegates are created to refer to a non-virtual function (fast), which directly calls a virtual function (also fast). The fact that this is the cause can be seen when modifying the generated IL to change
ldelem.ref
dup
ldvirtftn instance string object::ToString()
to
ldelem.ref
ldftn instance string object::ToString()
which always refers to object.ToString, never an overriding function. The three methods then all take about the same time.
Update: one additional method, to bind directly to objectArray[i] but still call ToString virtually:
for (int i = 0; i < objectArray.Length; ++i)
{
ComputeStringFunctionViaFunc(objectArray[i].ToStringHelper);
}
static class Extensions
{
public static string ToStringHelper(this object obj)
{
return obj.ToString();
}
}
also gives roughly the same timings as the other non-virtual delegates.

Let's examine what you are doing in each case:
This one doesn't "create" a function at all. It looks up an item (in this case an object) in an array, and passes the item as the parameter to a function:
// The cost of doing the array lookup happens right here, before
// ComputeStringFunction is called
ComputeStringFunction(objectArray[i]);
This one creates a parameterless delegate and passes it to a function. The delegate itself is never called:
// Because ComputeStringFunctionViaFunc doesn't do anything, the
// statement objectArray[i] is never evaluated, so the only cost
// is that of creating a delegate
ComputeStringFunctionViaFunc(delegate() { return objectArray[i].ToString(); });
This one does the same as the first, except instead of passing the item immediately after retrieving it from the array, it calls .ToString() on it. Again, no function is created here:
Like the first, this one has the cost of the array lookup up front, but then creates a delegate referencing the .ToString method of the item (thanks @hvd for catching that). Like the others, .ToString is never evaluated. The cost is (again, thanks @hvd) that of looking up the virtual method.
// The cost of doing the array lookup happens right here
ComputeStringFunctionViaFunc(objectArray[i].ToString);
Finally, this one creates a function using a lambda and a closure over an array item, and passes that lambda to a function. Depending on the function's signature, the lambda may be compiled or not:
// Again, we create a delegate but don't call it, so the array
// lookup and .ToString are never evaluated.
ComputeStringFunctionViaFunc(() => objectArray[i].ToString());
The important thing to note here is that evaluation of the array lookup is delayed in the second and fourth, while it is not delayed in the first and third.
These tests are somewhat nonsensical because they all do completely different things. There are almost certainly better ways of timing delegate creation.

Why is lambda faster than IL injected dynamic method?

I just built a dynamic method - see below (thanks to the fellow SO users). It appears that the Func created as a dynamic method with IL injection is 2x slower than the lambda.
Does anyone know exactly why?
(EDIT : this was built as Release x64 in VS2010. Please run it from console not from inside Visual Studio F5.)
class Program
{
static void Main(string[] args)
{
var mul1 = IL_EmbedConst(5);
var res = mul1(4);
Console.WriteLine(res);
var mul2 = EmbedConstFunc(5);
res = mul2(4);
Console.WriteLine(res);
double d, acc = 0;
Stopwatch sw = new Stopwatch();
for (int k = 0; k < 10; k++)
{
long time1;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul2(i);
acc += d;
}
sw.Stop();
time1 = sw.ElapsedMilliseconds;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul1(i);
acc += d;
}
sw.Stop();
Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
}
Console.WriteLine("\n{0}...\n", acc);
Console.ReadLine();
}
static Func<int, int> IL_EmbedConst(int b)
{
var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) } );
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, b);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Ret);
return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
}
static Func<int, int> EmbedConstFunc(int b)
{
return a => a * b;
}
}
Here is the output (for i7 920)
20
20
25 51
25 51
24 51
24 51
24 51
25 51
25 51
25 51
24 51
24 51
4.9999995E+15...
EDIT:
Here is proof that dhtorpe was right: a more complex lambda loses its advantage.
Code to prove it (this demonstrates that the lambda has exactly the same performance as IL injection):
class Program
{
static void Main(string[] args)
{
var mul1 = IL_EmbedConst(5);
double res = mul1(4,6);
Console.WriteLine(res);
var mul2 = EmbedConstFunc(5);
res = mul2(4,6);
Console.WriteLine(res);
double d, acc = 0;
Stopwatch sw = new Stopwatch();
for (int k = 0; k < 10; k++)
{
long time1;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul2(i, i+1);
acc += d;
}
sw.Stop();
time1 = sw.ElapsedMilliseconds;
sw.Restart();
for (int i = 0; i < 10000000; i++)
{
d = mul1(i, i + 1);
acc += d;
}
sw.Stop();
Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
}
Console.WriteLine("\n{0}...\n", acc);
Console.ReadLine();
}
static Func<int, int, double> IL_EmbedConst(int b)
{
var method = new DynamicMethod("EmbedConstIL", typeof(double), new[] { typeof(int), typeof(int) });
var log = typeof(Math).GetMethod("Log", new Type[] { typeof(double) });
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, b);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Conv_R8);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Ldc_I4, b);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Conv_R8);
il.Emit(OpCodes.Call, log);
il.Emit(OpCodes.Sub);
il.Emit(OpCodes.Ret);
return (Func<int, int, double>)method.CreateDelegate(typeof(Func<int, int, double>));
}
static Func<int, int, double> EmbedConstFunc(int b)
{
return (a, z) => a * b - Math.Log(z * b);
}
}

The constant 5 was the cause. Why on earth could that be? Reason: When the JIT knows the constant is 5 it does not emit an imul instruction but a lea [rax, rax * 4]. This is a well-known assembly-level optimization. But for some reason, this code executed slower. The optimization was a pessimization.
And the C# compiler emitting a closure prevented the JIT from optimizing the code in that particular way.
Proof: Change the constant to 56878567 and the performance changes. When inspecting the JITed code you can see that an imul is used now.
I managed to catch this by hardcoding the constant 5 into the lambda like this:
static Func<int, int> EmbedConstFunc2(int b)
{
return a => a * 5;
}
This allowed me to inspect the JITed x86.
Sidenote: The .NET JIT does not inline delegate calls in any way. Just mentioning this because it was falsely speculated this was the case in the comments.
Sidenote 2: In order to receive the full JIT optimization level you need to compile in Release mode and start without debugger attached. The debugger prevents optimizations from being performed, even in Release mode.
Sidenote 3: Although EmbedConstFunc contains a closure and normally would be slower than the dynamically generated method the effect of this "lea"-optimization does more damage and eventually is slower.

The lambda is not inherently faster than a DynamicMethod; it is built on one. However, a static method is faster to call than an instance method, while a delegate to a static method is slower than a delegate to an instance method. A lambda expression builds a static method but uses it like an instance method by adding a "Closure" as the first parameter. A delegate to a static method has to adjust the stack to get rid of the unneeded "this" instance before jumping to the real method body, whereas a delegate to an instance method hits the body directly. This is why a delegate to the nominally static method built by a lambda expression is faster (perhaps a side effect of the delegate code being shared between instance and static methods).
The performance issue can be avoided by adding an unused first argument (of type Closure, for example) to the DynamicMethod and calling CreateDelegate with an explicit target instance (null can be used).
var myDelegate = (MyDelegateType)dynamicMethod.CreateDelegate(typeof(MyDelegateType), null);
http://msdn.microsoft.com/fr-fr/library/z43fsh67(v=vs.110).aspx
Tony THONG
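Applied to the IL_EmbedConst method from the question, the suggestion above might look like the sketch below. The method name `EmbedConstWithTarget` and the use of `object` as the placeholder parameter type are assumptions made for illustration. A dummy object is passed as the target here to stay on the safe side, although the answer notes that null can be used. Whether this closes the gap on a given runtime would need to be measured.

```csharp
using System;
using System.Reflection.Emit;

static class Program
{
    static Func<int, int> EmbedConstWithTarget(int b)
    {
        // The first parameter is an unused placeholder that the delegate's
        // target is bound to; the real int argument is therefore argument 1.
        var method = new DynamicMethod(
            "EmbedConstWithTarget",
            typeof(int),
            new[] { typeof(object), typeof(int) });

        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1);   // load the int argument (index 1, not 0)
        il.Emit(OpCodes.Ldc_I4, b); // load the embedded constant
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        // Bind the delegate to an explicit target so it is closed over the
        // first argument, like a delegate to an instance method.
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>), new object());
    }

    static void Main()
    {
        Func<int, int> mul = EmbedConstWithTarget(5);
        Console.WriteLine(mul(4)); // 4 * 5
    }
}
```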

Given that the performance difference exists only when running in release mode without a debugger attached, the only explanation I can think of is that the JIT compiler is able to make native code optimizations for the lambda expression that it is not able to perform for the emitted IL dynamic function.
Compiling for release mode (optimizations on) and running without the debugger attached, the lambda is consistently 2x faster than the generated IL dynamic method.
Running the same release-mode optimized build with a debugger attached to the process drops the lambda performance to comparable or worse than the generated IL dynamic method.
The only difference between these two runs is in the behavior of the JIT. When a process is being debugged, the JIT compiler suppresses a number of native code gen optimizations to preserve native instruction to IL instruction to source code line number mappings and other correlations that would be trashed by aggressive native instruction optimizations.
A compiler can only apply special case optimizations when the input expression graph (in this case, IL code) matches certain very specific patterns and conditions. The JIT compiler clearly has special knowledge of the lambda expression IL code pattern and is emitting different code for lambdas than for "normal" IL code.
It is quite possible that your IL instructions do not exactly match the pattern that causes the JIT compiler to optimize the lambda expression. For example, your IL instructions encode the B value as an inline constant, whereas the analogous lambda expression loads a field from an internal captured variable object instance. Even if your generated IL were to mimic the captured field pattern of the C# compiler generated lambda expression IL, it still might not be "close enough" to receive the same JIT treatment as the lambda expression.
As mentioned in the comments, this may well be due to inlining of the lambda to eliminate the call/return overhead. If this is the case, I would expect to see this difference in performance disappear in more complex lambda expressions, since inlining is usually reserved for only the simplest of expressions.

Expression evaluator for C#/Python/Ruby

We have semi-complicated expressions in the format:
"25 + [Variable1] > [Variable2]"
We need an expression evaluator to parse the expression and use a callback to ask for the variable values and work out the overall result of the expression. It has to be a callback as there are thousands of variables.
We need the usual math operators but also things like "if" etc. The richer the language the better.
We can use any language we want. Anyone have any suggestions?

Have you considered using Mono.CSharp.Evaluator? It seems like this, in conjunction with an appropriately set InteractiveBaseClass, would do the trick quite nicely, and with minimal effort.
Note that the following uses Mono 2.11.1 alpha.
using System;
using System.Diagnostics;
using Mono.CSharp;
using NUnit.Framework;
public class MonoExpressionEvaluator
{
[Test]
public void ProofOfConcept()
{
Evaluator evaluator = new Evaluator(new CompilerContext(new CompilerSettings(), new ConsoleReportPrinter()));
evaluator.InteractiveBaseClass = typeof (Variables);
Variables.Variable1Callback = () => 5.1;
Variables.Variable2Callback = () => 30;
var result = evaluator.Evaluate("25 + Variable1 > Variable2");
Assert.AreEqual(25 + Variables.Variable1 > Variables.Variable2, result);
Console.WriteLine(result);
}
public class Variables
{
internal static Func<double> Variable1Callback;
public static Double Variable1 { get { return Variable1Callback(); } }
internal static Func<double> Variable2Callback;
public static Double Variable2 { get { return Variable2Callback(); } }
}
}
Real shame it runs a little slow. For instance, on my i7-m620 it takes almost 8 seconds to run this 10,000 times:
[Test]
public void BenchmarkEvaluate()
{
Evaluator evaluator = new Evaluator(new CompilerContext(new CompilerSettings(), new ConsoleReportPrinter()));
evaluator.InteractiveBaseClass = typeof(Variables);
Variables.Variable1Callback = () => 5.1;
Variables.Variable2Callback = () => 30;
var sw = Stopwatch.StartNew();
for (int i = 1; i < 10000; i++)
evaluator.Evaluate("25 + Variable1 > Variable2");
sw.Stop();
Console.WriteLine(sw.Elapsed);
}
00:00:07.6035024
It'd be great if we could parse and compile it to IL so we could execute it at .NET speeds, but that sounds like a bit of a pipe dream...
[Test]
public void BenchmarkCompiledMethod()
{
Evaluator evaluator = new Evaluator(new CompilerContext(new CompilerSettings(), new ConsoleReportPrinter()));
evaluator.InteractiveBaseClass = typeof(Variables);
Variables.Variable1Callback = () => 5.1;
Variables.Variable2Callback = () => 30;
var method = evaluator.Compile("25 + Variable1 > Variable2");
object result = null;
method(ref result);
Assert.AreEqual(25 + Variables.Variable1 > Variables.Variable2, result);
Variables.Variable2Callback = () => 31;
method(ref result);
Assert.AreEqual(25 + Variables.Variable1 > Variables.Variable2, result);
var sw = Stopwatch.StartNew();
for (int i = 1; i < 10000; i++)
method(ref result);
sw.Stop();
Console.WriteLine(sw.Elapsed);
}
00:00:00.0003799
Oh my.
Need excel-like expression constructs like IF? Build your own!
[Test]
public void ProofOfConcept2()
{
Evaluator evaluator = new Evaluator(new CompilerContext(new CompilerSettings(), new ConsoleReportPrinter()));
evaluator.InteractiveBaseClass = typeof(Variables2);
Variables.Variable1Callback = () => 5.1;
Variables.Variable2Callback = () => 30;
var result = evaluator.Evaluate(@"IF(25 + Variable1 > Variable2, ""TRUE"", ""FALSE"")");
Assert.AreEqual("TRUE", result);
Console.WriteLine(result);
}
public class Variables2 : Variables
{
public static T IF<T>(bool expr, T trueValue, T falseValue)
{
return expr ? trueValue : falseValue;
}
}

Check out NCalc. It's .NET and should support your requirements.

Pure expression evaluators are actually pretty easy to write.
See this SO answer which shows expression evaluators in a dozen languages. You should be able to adapt one of these:
Code Golf: Mathematical expression evaluator (that respects PEMDAS)
EDIT: Whoever dinged this obviously didn't go and examine the solutions there. Yes, there are a bunch that are crammed tight to meet the golf-rules (typically "smallest") but most of them are explained pretty clearly with a cleartext version of algorithm.
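To give a sense of scale, here is a minimal recursive-descent sketch in C# that handles the example expression from the question. Everything in it (the `MiniEvaluator` class, its grammar, and the convention of returning 1 or 0 for comparisons) is a hypothetical illustration rather than code from any of the linked solutions. It supports only numbers, `[Name]` variables, `+ - * /`, `<`, `>` and parentheses.

```csharp
using System;
using System.Globalization;

class MiniEvaluator
{
    private readonly string _text;
    private readonly Func<string, double> _lookup; // callback for variable values
    private int _pos;

    private MiniEvaluator(string text, Func<string, double> lookup)
    {
        _text = text;
        _lookup = lookup;
    }

    public static double Evaluate(string text, Func<string, double> lookup)
    {
        var e = new MiniEvaluator(text, lookup);
        double value = e.ParseComparison();
        e.SkipSpaces();
        if (e._pos != text.Length) throw new FormatException("Unexpected input at " + e._pos);
        return value;
    }

    private void SkipSpaces()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }

    // Consumes the character if it is next (after whitespace).
    private bool TryTake(char c)
    {
        SkipSpaces();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    // comparison := sum (('>' | '<') sum)?   yields 1 for true, 0 for false
    private double ParseComparison()
    {
        double left = ParseSum();
        if (TryTake('>')) return left > ParseSum() ? 1 : 0;
        if (TryTake('<')) return left < ParseSum() ? 1 : 0;
        return left;
    }

    // sum := product (('+' | '-') product)*
    private double ParseSum()
    {
        double value = ParseProduct();
        while (true)
        {
            if (TryTake('+')) value += ParseProduct();
            else if (TryTake('-')) value -= ParseProduct();
            else return value;
        }
    }

    // product := atom (('*' | '/') atom)*
    private double ParseProduct()
    {
        double value = ParseAtom();
        while (true)
        {
            if (TryTake('*')) value *= ParseAtom();
            else if (TryTake('/')) value /= ParseAtom();
            else return value;
        }
    }

    // atom := '(' comparison ')' | '[' name ']' | number
    private double ParseAtom()
    {
        if (TryTake('('))
        {
            double inner = ParseComparison();
            if (!TryTake(')')) throw new FormatException("Missing ')'");
            return inner;
        }
        if (TryTake('['))
        {
            int end = _text.IndexOf(']', _pos);
            if (end < 0) throw new FormatException("Missing ']'");
            string name = _text.Substring(_pos, end - _pos);
            _pos = end + 1;
            return _lookup(name); // ask the host for the variable's value
        }
        int start = _pos;
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException("Number expected at " + start);
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }
}

static class Program
{
    static void Main()
    {
        double result = MiniEvaluator.Evaluate(
            "25 + [Variable1] > [Variable2]",
            name => name == "Variable1" ? 5.1 : 30.0);
        Console.WriteLine(result == 1); // 25 + 5.1 > 30
    }
}
```

Because variables are resolved through the callback only when the parser reaches them, the host never has to materialize thousands of values up front. Adding "if", strings or function calls would mean extending the grammar, which is where an existing library starts to look more attractive.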

Well ... you need a language. You have C#, VB.Net, IronPython, IronRuby, and others.
Simply replace the open variables using a regex (maybe you even know them ahead of time and just need a string.Replace), and then compile the script using CodeDOM (for C# or VB.Net) or use the DLR (IronPython, IronRuby). You can add the variables as method parameters in the method wrapper you use to encapsulate your code (for CodeDOM), or just inject the variables into the DLR.
Our team has implemented both variants in production, with little effort and reliable results.
If you really need the callback, add to the solutions above a method that communicates with the host of the scripting language, with a name like ValueOf(string). Then you can write
ValueOf("A") > ValueOf("B") - 10
Have fun.

http://code.google.com/p/bc-expression/
Handles variable lookup via a lambda or block callback.
Understands numeric, string and boolean constants.
Unary operators + - !
Operators || && < <= == != >= > + - * / %
Grouping with ( )
Raises an Expression::SyntaxError if there's a syntax error.

Performance of delegate and method group

I was investigating the performance hit of creating CacheDependency objects, so I wrote a very simple test program as follows:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Web.Caching;
namespace Test
{
internal class Program
{
private static readonly string[] keys = new[] {"Abc"};
private static readonly int MaxIteration = 10000000;
private static void Main(string[] args)
{
Debug.Print("first set");
test2();
test3();
test4();
test5();
test6();
test7();
Debug.Print("second set");
test7();
test6();
test5();
test4();
test3();
test2();
}
private static void test2()
{
DateTime start = DateTime.Now;
var list = new List<CacheDependency>();
for (int i = 0; i < MaxIteration; i++)
{
list.Add(new CacheDependency(null, keys));
}
Debug.Print("test2 Time: " + (DateTime.Now - start));
}
private static void test3()
{
DateTime start = DateTime.Now;
var list = new List<Func<CacheDependency>>();
for (int i = 0; i < MaxIteration; i++)
{
list.Add(() => new CacheDependency(null, keys));
}
Debug.Print("test3 Time: " + (DateTime.Now - start));
}
private static void test4()
{
var p = new Program();
DateTime start = DateTime.Now;
var list = new List<Func<CacheDependency>>();
for (int i = 0; i < MaxIteration; i++)
{
list.Add(p.GetDep);
}
Debug.Print("test4 Time: " + (DateTime.Now - start));
}
private static void test5()
{
var p = new Program();
DateTime start = DateTime.Now;
var list = new List<Func<CacheDependency>>();
for (int i = 0; i < MaxIteration; i++)
{
list.Add(() => { return p.GetDep(); });
}
Debug.Print("test5 Time: " + (DateTime.Now - start));
}
private static void test6()
{
DateTime start = DateTime.Now;
var list = new List<Func<CacheDependency>>();
for (int i = 0; i < MaxIteration; i++)
{
list.Add(GetDepStatic);
}
Debug.Print("test6 Time: " + (DateTime.Now - start));
}
private static void test7()
{
DateTime start = DateTime.Now;
var list = new List<Func<CacheDependency>>();
for (int i = 0; i < MaxIteration; i++)
{
list.Add(() => { return GetDepStatic(); });
}
Debug.Print("test7 Time: " + (DateTime.Now - start));
}
private CacheDependency GetDep()
{
return new CacheDependency(null, keys);
}
private static CacheDependency GetDepStatic()
{
return new CacheDependency(null, keys);
}
}
}
But I can't understand why the results look like this:
first set
test2 Time: 00:00:08.5394884
test3 Time: 00:00:00.1820105
test4 Time: 00:00:03.1401796
test5 Time: 00:00:00.1910109
test6 Time: 00:00:02.2041261
test7 Time: 00:00:00.4840277
second set
test7 Time: 00:00:00.1850106
test6 Time: 00:00:03.2941884
test5 Time: 00:00:00.1750100
test4 Time: 00:00:02.3561347
test3 Time: 00:00:00.1830105
test2 Time: 00:00:07.7324423
In particular:
Why are test4 and test6 much slower than their delegate versions? I also noticed that Resharper specifically has a comment on the delegate versions suggesting changing test5 and test7 to "Convert to method group", which is the same as test4 and test6, but those are actually slower?
I don't see a consistent performance difference when calling test4 and test6; shouldn't static calls always be faster?

In the tests with a method group (4 and 6), the C# compiler doesn't cache the delegate (Func) object; it creates a new one every time. In tests 5 and 7 it caches a Func object that points to a generated method, which in turn calls your method. So creating Funcs from method groups is slow (because of the heap allocation), but calling them is fast because the delegate points directly at your method. Creating Funcs from lambdas is fast because the Func is cached, but it points to a generated method, so there is one extra method call.
Beware that not all lambdas can be cached (closures break this logic)

I haven't looked too far into your code, but the first step would be to switch things over to use the Stopwatch class instead of DateTime.Now etc.
http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx

In C# 11 the language spec was changed to allow the compiler to legally cache the delegate.
https://github.com/dotnet/roslyn/issues/5835
If you're using that version of C# or newer, you won't see allocations when passing a method group where the delegate can be cached.

That is quite interesting. I'm wondering if your million entry lists aren't causing a garbage collection and skewing your results. Try changing the order these functions are called in and see what the results give you.
Another thing is that the JIT might have optimised your code to not create the lambda each time and is just inserting the same value over and over. Might be worth running ildasm over it and see what is actually generated.

Why are test4 and test6 much slower than their delegate versions? I also noticed that Resharper specifically has a comment on the delegate versions suggesting changing test5 and test7 to "Convert to method group", which is the same as test4 and test6, but those are actually slower?
You'll get a big clue by adding
Debug.Print(ReferenceEquals(list[0], list[1]) ? "same" : "different");
to the end of each method.
With the delegate version, the Func gets compiled a bit like it was actually:
var func = Func<CacheDependency> <>_hiddenfieldwithinvalidC#name;
if (func == null)
{
<>_hiddenfieldwithinvalidC#name = func = () => p.GetDep();
}
While with a method group it gets compiled much the same as:
func = new Func<CacheDependency>(p.GetDep);
This memoisation is done a lot with delegates created from lambdas when the compiler can determine it is safe to do so, but not with method-groups being cast to delegates, and the performance differences you see show exactly why.
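The ReferenceEquals check suggested above can be tried in isolation with a sketch like this one. The `Worker` class and the list names are hypothetical stand-ins for the original test code, and a non-capturing lambda is used deliberately, because caching of capturing lambdas depends more heavily on the compiler version.

```csharp
using System;
using System.Collections.Generic;

class Worker
{
    public int GetValue()
    {
        return 42;
    }
}

static class Program
{
    static void Main()
    {
        var worker = new Worker();
        var fromLambda = new List<Func<int>>();
        var fromMethodGroup = new List<Func<int>>();

        for (int i = 0; i < 2; i++)
        {
            // Non-capturing lambda: the compiler can reuse one delegate.
            fromLambda.Add(() => 42);

            // Instance method group: a delegate is created on each conversion.
            fromMethodGroup.Add(worker.GetValue);
        }

        Console.WriteLine("lambda:       " +
            (object.ReferenceEquals(fromLambda[0], fromLambda[1]) ? "same" : "different"));
        Console.WriteLine("method group: " +
            (object.ReferenceEquals(fromMethodGroup[0], fromMethodGroup[1]) ? "same" : "different"));
    }
}
```

The lambda case is expected to print "same" and the instance method group "different", but delegate caching is a compiler implementation detail, so it is worth running against the compiler version actually in use.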
I don't see a consistent performance difference when calling test4 and test6; shouldn't static calls always be faster?
Not necessarily. While a static call has the advantage of one less argument to pass around (as there's no implicit this argument), this difference:
Isn't much to begin with.
Might be jitted away if this isn't used.
Might be optimised away in that the register with the this pointer in before the call is the register with the this pointer in after the call, so there's no need to actually do anything to get it in there.
Eh, something else. I'm not claiming this list is exhaustive.
Really, what performance benefits there are from static come more from the fact that if you do what is naturally static in instance methods, you can end up with excessive passing around of objects that isn't really needed and wastes time. That said, if you are doing what is naturally instance in static methods, you can end up storing/retrieving and/or allocating and/or passing around objects in arguments you wouldn't otherwise need to, which is just as bad.
