Please note this question relates to performance only. Let's skip design guidelines, philosophy, compatibility, portability and anything that is not related to pure performance. Thank you.
Now to the question. I always assumed that because C# getters/setters are really methods in disguise, reading a public field must be faster than calling a getter.
So to make sure, I ran a test (the code below). However, the test only produces the expected result (i.e. fields are about 34% faster than getters) if you run it from inside Visual Studio.
Once you run it from the command line, it shows pretty much the same timings...
The only explanation I can think of is that the CLR does additional optimisation (correct me if I am wrong here).
I do not believe that in a real application, where those properties are used in much more sophisticated ways, they will be optimised in the same way.
Please help me to prove or disprove the idea that in real life properties are slower than fields.
The question is: how should I modify the test classes to make the CLR change its behaviour so the public field outperforms the getter? Or show me that any property without internal logic performs the same as a field (at least for the getter).
EDIT: I am only talking about Release x64 build.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Runtime.InteropServices;

namespace PropertyVsField
{
    class Program
    {
        static int LEN = 20000000;

        static void Main(string[] args)
        {
            List<A> a = new List<A>(LEN);
            List<B> b = new List<B>(LEN);
            Random r = new Random(DateTime.Now.Millisecond);
            for (int i = 0; i < LEN; i++)
            {
                double p = r.NextDouble();
                a.Add(new A() { P = p });
                b.Add(new B() { P = p });
            }

            Stopwatch sw = new Stopwatch();
            double d = 0.0;

            sw.Restart();
            for (int i = 0; i < LEN; i++)
            {
                d += a[i].P;
            }
            sw.Stop();
            Console.WriteLine("auto getter. {0}. {1}.", sw.ElapsedTicks, d);

            sw.Restart();
            for (int i = 0; i < LEN; i++)
            {
                d += b[i].P;
            }
            sw.Stop();
            Console.WriteLine(" field. {0}. {1}.", sw.ElapsedTicks, d);

            Console.ReadLine();
        }
    }

    class A
    {
        public double P { get; set; }
    }

    class B
    {
        public double P;
    }
}
As others have already mentioned, the getters are inlined.
If you want to avoid inlining, you have to replace the automatic properties with manual ones:
class A
{
    private double p;
    public double P
    {
        get { return p; }
        set { p = value; }
    }
}
and tell the compiler not to inline the getter (or both, if you feel like it):
[MethodImpl(MethodImplOptions.NoInlining)]
get { return p; }
Note that the first change does not make a difference in performance, whereas the second change shows a clear method call overhead:
Manual properties:
auto getter. 519005. 10000971,0237547.
field. 514235. 20001942,0475098.
No inlining of the getter:
auto getter. 785997. 10000476,0385552.
field. 531552. 20000952,077111.
Have a look at the Properties vs Fields – Why Does it Matter? (Jonathan Aneja) blog article from one of the VB team members on MSDN. He outlines the property versus fields argument and also explains trivial properties as follows:
One argument I’ve heard for using fields over properties is that “fields are faster”, but for trivial properties that’s actually not true, as the CLR’s Just-In-Time (JIT) compiler will inline the property access and generate code that’s as efficient as accessing a field directly.
The JIT will inline any method (not just a getter) whose internal metrics indicate it will be faster inlined. Given that a standard property getter is simply return _Property;, it will be inlined in every case.
The reason you are seeing different behavior is that in Debug mode with a debugger attached, the JIT is significantly handicapped, to ensure that any stack locations match what you would expect from the code.
You are also forgetting the number one rule of performance: testing beats thinking. For instance, even though quicksort is asymptotically faster than insertion sort, insertion sort is actually faster for extremely small inputs.
The only explanation I can think of is that the CLR does additional optimisation (correct me if I am wrong here).
Yes, it is called inlining. It is done by the JIT compiler at the machine-code level. Because the getter/setter is trivial (i.e. very simple code), the method call is eliminated and the getter/setter body is emitted directly into the surrounding code.
This does not happen in debug mode in order to support debugging (i.e. the ability to set a breakpoint in a getter or setter).
In Visual Studio there is no way to see this while debugging. Compile in Release mode, run without an attached debugger and you will get the full optimisation.
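A quick way to make sure you are actually measuring optimised code is a guard at the top of Main. This is just a minimal sketch (Debugger lives in System.Diagnostics):

if (Debugger.IsAttached)
    Console.WriteLine("Warning: debugger attached - JIT optimisations are suppressed.");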
I do not believe that in a real application, where those properties are used in much more sophisticated ways, they will be optimised in the same way.
The world is full of wrong assumptions. The properties will still be optimised, because they remain trivial (simple code), so they are inlined.
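For contrast, a sketch of a getter that is no longer trivial: a body with an exception-handling region is a classic case the JIT refuses to inline, so the call overhead comes back (the tracing scenario is made up for illustration):

class Traced
{
    private double p;
    public double P
    {
        get
        {
            try { return p; }              // the try/finally region alone
            finally { /* tracing hook */ } // is enough to block inlining
        }
    }
}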
It should be noted that it's possible to see the "real" performance in Visual Studio.
Compile in Release mode with Optimisations enabled.
Go to Debug -> Options and Settings, and uncheck "Suppress JIT optimization on module load (Managed only)".
Optionally, uncheck "Enable Just My Code" otherwise you may not be able to step in the code.
Now the jitted code will be the same even with the debugger attached, allowing you to step through the optimised disassembly if you so please. This is essential to understanding how the CLR optimises code.
After reading all the answers, I decided to run a benchmark with this code:
[TestMethod]
public void TestFieldVsProperty()
{
    const int COUNT = 0x7fffffff;
    A a1 = new A();
    A a2 = new A();
    B b1 = new B();
    B b2 = new B();
    C c1 = new C();
    C c2 = new C();
    D d1 = new D();
    D d2 = new D();
    Stopwatch sw = new Stopwatch();
    long t1, t2, t3, t4;

    sw.Restart();
    for (int i = COUNT - 1; i >= 0; i--)
    {
        a1.P = a2.P;
    }
    sw.Stop();
    t1 = sw.ElapsedTicks;

    sw.Restart();
    for (int i = COUNT - 1; i >= 0; i--)
    {
        b1.P = b2.P;
    }
    sw.Stop();
    t2 = sw.ElapsedTicks;

    sw.Restart();
    for (int i = COUNT - 1; i >= 0; i--)
    {
        c1.P = c2.P;
    }
    sw.Stop();
    t3 = sw.ElapsedTicks;

    sw.Restart();
    for (int i = COUNT - 1; i >= 0; i--)
    {
        d1.P = d2.P;
    }
    sw.Stop();
    t4 = sw.ElapsedTicks;

    long max = Math.Max(Math.Max(t1, t2), Math.Max(t3, t4));
    Console.WriteLine($"auto: {t1}, {max * 100d / t1:00.00}%.");
    Console.WriteLine($"field: {t2}, {max * 100d / t2:00.00}%.");
    Console.WriteLine($"manual: {t3}, {max * 100d / t3:00.00}%.");
    Console.WriteLine($"no inlining: {t4}, {max * 100d / t4:00.00}%.");
}

class A
{
    public double P { get; set; }
}

class B
{
    public double P;
}

class C
{
    private double p;
    public double P
    {
        get => p;
        set => p = value;
    }
}

class D
{
    public double P
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        get;
        [MethodImpl(MethodImplOptions.NoInlining)]
        set;
    }
}
When testing in Debug mode, I got this result:
auto: 35142496, 100.78%.
field: 10451823, 338.87%.
manual: 35183121, 100.67%.
no inlining: 35417844, 100.00%.
but when switching to Release mode, the result is different:
auto: 2161291, 873.91%.
field: 2886444, 654.36%.
manual: 2252287, 838.60%.
no inlining: 18887768, 100.00%.
It seems the auto property is a perfectly good choice.
I have a situation where many objects of different types are constructed at the start of an App and are then invoked together using an abstract method.
Many of these objects have one, sometimes two, parameters that can be determined during construction. They affect part of what the object does when invoked.
So, as I understand, when objects[i].Invoke (except in my case it's not just an array) is called, first the virtual table gets dereferenced. Then there is an additional if check.
Since the virtual table lookup is unavoidable, shouldn't it be faster to have two implementations of the Invoke method to get rid of the if? (Memory usage isn't a concern.)
Now, code duplication isn't nice. But I know that if I have a generic MyType<T>, the JIT will generate one version of MyType (and all its methods) shared by all reference types T, and a separate version for every value type T (even if they have the same size). I see this as a way to force the JIT into generating two different Invoke functions that differ only in the part where the if statement was.
I made a speed test, to make sure I'm not missing anything. And so far it seems I am:
using System.Diagnostics;
using System.Runtime.CompilerServices;

var test_size = 1000000;
var exp_res = new int[test_size];
var got_res = new int[test_size];
var oa = new Base[test_size, 3];
var rng = new Random();
for (int i = 0; i < test_size; i++)
{
    var b = rng.Next(2) == 0;
    exp_res[i] = !b ? Base.CalcRes1() : Base.CalcRes2();
    oa[i, 0] = !b ? new T1_1() : new T1_2();
    oa[i, 1] = new T2(b);
    oa[i, 2] = !b ? new T3<T3_h1>() : new T3<T3_h2>();
}

var sw = new int[3].Select(_ => new Stopwatch()).ToArray();

//[MethodImpl(MethodImplOptions.AggressiveInlining)]
void test_step()
{
    {
        // Shuffle a random row, so that "oa[i,1].Func1()" can't be
        // optimized into "((T2)(oa[i,1])).Func1()",
        // turning a virtual call into a non-virtual one
        var i = rng.Next(test_size);
        var temp = oa[i, 0];
        oa[i, 0] = oa[i, 1];
        oa[i, 1] = oa[i, 2];
        oa[i, 2] = temp;
    }
    for (int sw_i = 0; sw_i < sw.Length; sw_i++)
    {
        got_res.Initialize(); // Reset to zero
        sw[sw_i].Start();
        for (int i = 0; i < test_size; i++)
            got_res[i] = oa[i, sw_i].Func1();
        sw[sw_i].Stop();
        // Test sanity + ensure "got_res[i] =" isn't optimized out
        if (!got_res.SequenceEqual(exp_res)) throw new Exception();
    }
}

// Dry run, to JIT-compile everything
// 10 times just in case
for (int i = 0; i < 10; i++) test_step();
for (int sw_i = 0; sw_i < sw.Length; sw_i++)
    sw[sw_i].Reset();

// Testing 1000 times, to average out the background noise
var test_count = 1000;
for (int i = 1; i <= test_count; i++)
{
    test_step();
    Console.Title = (i / (float)test_count).ToString("P");
    Console.SetCursorPosition(0, 0);
    for (int sw_i = 0; sw_i < sw.Length; sw_i++)
        Console.WriteLine(sw[sw_i].Elapsed);
}

void WriteMHnd(Type t) =>
    Console.WriteLine(t.GetMethod("Func1")?.MethodHandle.Value.ToString("X"));
WriteMHnd(typeof(T3<T3_h1>));
WriteMHnd(typeof(T3<T3_h1>)); // Same
WriteMHnd(typeof(T3<T3_h2>)); // Different

/**
// First 2 are the same
WriteMHnd(typeof(TT<Exception>));
WriteMHnd(typeof(TT<T2>));
// Second 2 are both new methods
WriteMHnd(typeof(TT<byte>));
WriteMHnd(typeof(TT<int>));
class TT<T>
{
    public void Func1() { }
};
/**/

abstract class Base
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CalcRes1() => 5;
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CalcRes2() => 9;
    public abstract int Func1();
}

// Test 1: Manually implement both possibilities
sealed class T1_1 : Base
{
    public sealed override int Func1() => CalcRes1();
}
sealed class T1_2 : Base
{
    public sealed override int Func1() => CalcRes2();
}

// Test 2: Save the choice into a field
sealed class T2 : Base
{
    public bool b; // Isn't even marked as readonly
    public T2(bool b) => this.b = b;
    public sealed override int Func1() =>
        !b ? CalcRes1() : CalcRes2();
}

// Test 3: Use generic magic, causing the JIT to generate
// two versions of the "Func1" function, as in Test 1
interface IT3_h
{
    bool GetVal();
}
struct T3_h1 : IT3_h
{
    bool IT3_h.GetVal() => false;
}
struct T3_h2 : IT3_h
{
    bool IT3_h.GetVal() => true;
}
sealed class T3<TVal> : Base
    where TVal : struct, IT3_h
{
    // Doesn't change anything
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public sealed override int Func1() =>
        !new TVal().GetVal() ? CalcRes1() : CalcRes2();
}
(targeting .NET 6.0)
Immediately something's not right, because T3 consistently takes ~1% less time than T1. I would expect the JIT to generate exactly the same classes and methods as in the case of T1. Instead, it somehow found a way to optimize the generic case more than the direct one.
But at least new TVal().GetVal() does get turned into a constant, otherwise T3 would be much slower.
But, contrary to my belief, T2 takes ~15% less time than T3. How?
My main guess was that in the case of T2 the virtual table is somehow skipped. But the row shuffling at the start of test_step doesn't seem to have an effect.
I also played around with [MethodImpl], but didn't find anything interesting.
It seems that the simple solution of T2 is somehow smarter than T3. But I doubt the JIT can somehow remove the if from T2.Func1, so it should be possible to find a solution better than both T2 and T3 if I can understand what is happening here.
This question is similar to this one, but assuming that we know the member name at compile time.
Assuming that we have a class
public class MyClass
{
    public string TheProperty { get; set; }
}
and in another method we want to set the TheProperty member of an instance of that class. We don't know the type of the instance at compile time; we only know the property name at compile time.
So, as I see it, there are two ways to do that now:
object o = new MyClass(); // For simplicity.
o.GetType().GetProperty("TheProperty").SetValue(o, "bar"); // (1)
((dynamic) o).TheProperty = "bar"; // (2)
I measured this test case using the System.Diagnostics.Stopwatch class to find out that reflection took 475 ticks and the way using dynamic took 0 ticks, therefore being about as fast as a direct call to new MyClass().TheProperty = "bar".
Since I have almost never seen the second way used, I am a little confused and my questions now are:
Is there a flaw in my reasoning?
Should the second way be preferred over the first, or the other way around? I don't see any disadvantages of using the second way; both (1) and (2) would throw exceptions if the property were not found, wouldn't they?
Why does the second way seem to be used so rarely even though it appears to be faster?
(...)reflection took 475 ticks and the way using dynamic took 0 ticks(...)
That is simply false. The problem is that you do not understand how dynamic works. I will assume you are setting up the benchmark correctly:
Running in Release mode with optimizations turned on and without the debugger.
You are jitting the methods before actually measuring times.
And here comes the key part you are probably not doing:
JIT the dynamic test without actually performing the dynamic runtime binding.
And why is step 3 important? Because the runtime will cache the dynamic call site and reuse it! So in a naive benchmark implementation, if you are doing things right, you will incur the cost of the initial dynamic call while jitting the method, and therefore you won't measure it.
Run the following benchmark:
public static void Main(string[] args)
{
    var repetitions = 1;
    var isWarmup = true;
    var foo = new Foo();

    //warmup
    SetPropertyWithDynamic(foo, isWarmup); //JIT the method without caching the dynamic call
    SetPropertyWithReflection(foo); //JIT the method
    var s = ((dynamic)"Hello").Substring(0, 2); //Start up the runtime compiler

    for (var test = 0; test < 10; test++)
    {
        Console.WriteLine($"Test #{test}");
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < repetitions; i++)
        {
            SetPropertyWithDynamic(foo);
        }
        watch.Stop();
        Console.WriteLine($"Dynamic benchmark: {watch.ElapsedTicks}");

        watch = Stopwatch.StartNew();
        for (var i = 0; i < repetitions; i++)
        {
            SetPropertyWithReflection(foo);
        }
        watch.Stop();
        Console.WriteLine($"Reflection benchmark: {watch.ElapsedTicks}");
    }
    Console.WriteLine(foo);
    Console.ReadLine();
}

static void SetPropertyWithDynamic(object o, bool isWarmup = false)
{
    if (isWarmup)
        return;

    ((dynamic)o).TheProperty = 1;
}

static void SetPropertyWithReflection(object o)
{
    o.GetType().GetProperty("TheProperty").SetValue(o, 1);
}

public class Foo
{
    public int TheProperty { get; set; }
    public override string ToString() => $"Foo: {TheProperty}";
}
Spot the difference between the first run and the subsequent ones?
I just built a dynamic method - see below (thanks to the fellow SO users). It appears that the Func created as a dynamic method with injected IL is 2x slower than the lambda.
Does anyone know exactly why?
(EDIT: this was built as Release x64 in VS2010. Please run it from the console, not from inside Visual Studio with F5.)
using System;
using System.Diagnostics;
using System.Reflection.Emit;

class Program
{
    static void Main(string[] args)
    {
        var mul1 = IL_EmbedConst(5);
        var res = mul1(4);
        Console.WriteLine(res);

        var mul2 = EmbedConstFunc(5);
        res = mul2(4);
        Console.WriteLine(res);

        double d, acc = 0;
        Stopwatch sw = new Stopwatch();

        for (int k = 0; k < 10; k++)
        {
            long time1;
            sw.Restart();
            for (int i = 0; i < 10000000; i++)
            {
                d = mul2(i);
                acc += d;
            }
            sw.Stop();
            time1 = sw.ElapsedMilliseconds;

            sw.Restart();
            for (int i = 0; i < 10000000; i++)
            {
                d = mul1(i);
                acc += d;
            }
            sw.Stop();
            Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
        }

        Console.WriteLine("\n{0}...\n", acc);
        Console.ReadLine();
    }

    static Func<int, int> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    static Func<int, int> EmbedConstFunc(int b)
    {
        return a => a * b;
    }
}
Here is the output (for an i7 920):
20
20
25 51
25 51
24 51
24 51
24 51
25 51
25 51
25 51
24 51
24 51
4.9999995E+15...
============================================================================
EDIT EDIT EDIT EDIT
Here is proof that dhtorpe was right - a more complex lambda will lose its advantage.
Code to prove it (this demonstrates that the lambda has exactly the same performance as the IL injection):
using System;
using System.Diagnostics;
using System.Reflection.Emit;

class Program
{
    static void Main(string[] args)
    {
        var mul1 = IL_EmbedConst(5);
        double res = mul1(4, 6);
        Console.WriteLine(res);

        var mul2 = EmbedConstFunc(5);
        res = mul2(4, 6);
        Console.WriteLine(res);

        double d, acc = 0;
        Stopwatch sw = new Stopwatch();

        for (int k = 0; k < 10; k++)
        {
            long time1;
            sw.Restart();
            for (int i = 0; i < 10000000; i++)
            {
                d = mul2(i, i + 1);
                acc += d;
            }
            sw.Stop();
            time1 = sw.ElapsedMilliseconds;

            sw.Restart();
            for (int i = 0; i < 10000000; i++)
            {
                d = mul1(i, i + 1);
                acc += d;
            }
            sw.Stop();
            Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
        }

        Console.WriteLine("\n{0}...\n", acc);
        Console.ReadLine();
    }

    static Func<int, int, double> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConstIL", typeof(double), new[] { typeof(int), typeof(int) });
        var log = typeof(Math).GetMethod("Log", new Type[] { typeof(double) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_R8);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_R8);
        il.Emit(OpCodes.Call, log);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Ret);
        return (Func<int, int, double>)method.CreateDelegate(typeof(Func<int, int, double>));
    }

    static Func<int, int, double> EmbedConstFunc(int b)
    {
        return (a, z) => a * b - Math.Log(z * b);
    }
}
The constant 5 was the cause. Why on earth could that be? Reason: when the JIT knows the constant is 5, it does not emit an imul instruction but a lea rax, [rax + rax*4]. This is a well-known assembly-level optimization. But for some reason, this code executed slower: the optimization was a pessimization.
And the C# compiler emitting a closure prevented the JIT from optimizing the code in that particular way.
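For reference, the closure the compiler emits for a => a * b looks roughly like this (the real generated name is something like <>c__DisplayClass0_0; the names here are illustrative):

class DisplayClass
{
    public int b;                      // the captured variable becomes a field
    public int Lambda(int a) => a * b; // b is a field load, never a JIT-time constant
}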
Proof: Change the constant to 56878567 and the performance changes. When inspecting the JITed code you can see that an imul is used now.
I managed to catch this by hardcoding the constant 5 into the lambda like this:
static Func<int, int> EmbedConstFunc2(int b)
{
    return a => a * 5;
}
This allowed me to inspect the JITed x86.
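A minimal pair to reproduce both code-gen shapes side by side (the exact instructions depend on the JIT version; the ones in the comments are what one would expect on x64, per the observations above):

static int MulSmall(int a) => a * 5;        // candidate for: lea eax, [rax + rax*4]
static int MulLarge(int a) => a * 56878567; // forced to:     imul eax, ecx, 56878567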
Sidenote: The .NET JIT does not inline delegate calls in any way. Just mentioning this because it was falsely speculated this was the case in the comments.
Sidenote 2: To get the full JIT optimization level, you need to compile in Release mode and start without a debugger attached. The debugger prevents optimizations from being performed, even in Release mode.
Sidenote 3: Although EmbedConstFunc contains a closure and would normally be slower than the dynamically generated method, the effect of this "lea" pessimization does more damage, so the dynamically generated method eventually comes out slower.
A lambda is not faster than a DynamicMethod; it is built on top of one. However, while a static method is faster than an instance method, a delegate created for a static method is slower than a delegate created for an instance method. A lambda expression builds a static method but uses it like an instance method by adding a "Closure" object as the first parameter. A delegate to a static method has to pop the stack to get rid of the unneeded "this" instance before moving to the real IL body; in the case of a delegate to an instance method, the IL body is hit directly. This is why a delegate to the closure-style static method built by a lambda expression is faster (maybe a side effect of the delegate code being shared between instance and static methods).
The performance issue can be avoided by adding an unused first argument (of type Closure, for example) to the DynamicMethod and calling CreateDelegate with an explicit target instance (null can be used):
var myDelegate = myDynamicMethod.CreateDelegate(typeof(MyDelegateType), null) as MyDelegateType;
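Expanded into a minimal sketch mirroring IL_EmbedConst from the question (the dummy first parameter and the constant are illustrative; needs using System.Reflection.Emit;):

// Declare an unused first parameter so the delegate is created like a
// closed instance delegate; null is passed as the target.
var method = new DynamicMethod("EmbedConst",
    typeof(int),
    new[] { typeof(object), typeof(int) }); // dummy "closure" slot + real argument
var il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_1);   // skip the dummy argument (arg 0)
il.Emit(OpCodes.Ldc_I4, 5);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Ret);
var mul = (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>), null);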
http://msdn.microsoft.com/fr-fr/library/z43fsh67(v=vs.110).aspx
Tony THONG
Given that the performance difference exists only when running in release mode without a debugger attached, the only explanation I can think of is that the JIT compiler is able to make native code optimizations for the lambda expression that it is not able to perform for the emitted IL dynamic function.
Compiling for release mode (optimizations on) and running without the debugger attached, the lambda is consistently 2x faster than the generated IL dynamic method.
Running the same release-mode optimized build with a debugger attached to the process drops the lambda performance to comparable or worse than the generated IL dynamic method.
The only difference between these two runs is in the behavior of the JIT. When a process is being debugged, the JIT compiler suppresses a number of native code gen optimizations to preserve native instruction to IL instruction to source code line number mappings and other correlations that would be trashed by aggressive native instruction optimizations.
A compiler can only apply special case optimizations when the input expression graph (in this case, IL code) matches certain very specific patterns and conditions. The JIT compiler clearly has special knowledge of the lambda expression IL code pattern and is emitting different code for lambdas than for "normal" IL code.
It is quite possible that your IL instructions do not exactly match the pattern that causes the JIT compiler to optimize the lambda expression. For example, your IL instructions encode the B value as an inline constant, whereas the analogous lambda expression loads a field from an internal captured variable object instance. Even if your generated IL were to mimic the captured field pattern of the C# compiler generated lambda expression IL, it still might not be "close enough" to receive the same JIT treatment as the lambda expression.
As mentioned in the comments, this may well be due to inlining of the lambda to eliminate the call/return overhead. If this is the case, I would expect to see this difference in performance disappear in more complex lambda expressions, since inlining is usually reserved for only the simplest of expressions.
I am currently implementing a runtime (i.e. a collection of functions) for a formula language. Some formulas need a context to be passed to them, and I created a class called EvaluationContext which contains all the properties I need access to at runtime.
Using ThreadLocal<EvaluationContext> seems like a good option to make this context available to the runtime functions. The other option is to pass the context as a parameter to the functions that need it.
I prefer using ThreadLocal but I was wondering if there is any performance penalty as opposed to passing the evaluation context via method parameters.
I created the program below and it is faster to use parameters rather than the ThreadLocal field.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace TestThreadLocal
{
    internal class Program
    {
        public class EvaluationContext
        {
            public int A { get; set; }
            public int B { get; set; }
        }

        public static class FormulasRunTime
        {
            public static ThreadLocal<EvaluationContext> Context = new ThreadLocal<EvaluationContext>();

            public static int SomeFunction()
            {
                EvaluationContext ctx = Context.Value;
                return ctx.A + ctx.B;
            }

            public static int SomeFunction(EvaluationContext context)
            {
                return context.A + context.B;
            }
        }

        private static void Main(string[] args)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();

            int N = 10000;
            Task<int>[] tasks = new Task<int>[N];
            int sum = 0;
            for (int i = 0; i < N; i++)
            {
                int x = i;
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    //Console.WriteLine("Starting {0}, thread {1}", x, Thread.CurrentThread.ManagedThreadId);
                    FormulasRunTime.Context.Value = new EvaluationContext { A = 0, B = x };
                    return FormulasRunTime.SomeFunction();
                });
                sum += i;
            }
            Task.WaitAll(tasks);
            Console.WriteLine("Using ThreadLocal: It took {0} millisecs and the sum is {1}", stopwatch.ElapsedMilliseconds, tasks.Sum(t => t.Result));
            Console.WriteLine(sum);

            stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < N; i++)
            {
                int x = i;
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    return FormulasRunTime.SomeFunction(new EvaluationContext { A = 0, B = x });
                });
            }
            Task.WaitAll(tasks);
            Console.WriteLine("Using parameter: It took {0} millisecs and the sum is {1}", stopwatch.ElapsedMilliseconds, tasks.Sum(t => t.Result));

            Console.ReadKey();
        }
    }
}
Going on from costa's answer:
If you try N as 10000000,
int N = 10000000;
you will see there is not much of a difference (around 107.4 to 103.4 seconds).
If the value gets bigger, the difference becomes smaller.
So, if you do not mind a three-second slowdown, I think it comes down to usability and taste.
PS: In the code, the int return types must be changed to long.
I consider the ThreadLocal design to be dirty, yet creative. It is definitely going to be faster to use parameters but performance should not be your only concern. Parameters will be much clearer to understand. I recommend you go with parameters.
There will not be any performance impact, but you will not be able to do any parallel computations in this case (which can be quite useful, especially in the formula domain).
If you definitely don't want to pass the context explicitly, you can go for ThreadLocal.
Otherwise I would suggest you look at the "state monad" "pattern" that will allow you to seamlessly pass your state (context) through your computations (formulas) without having any explicit parameters.
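In C# that idea can be approximated without a full monad library. A tiny sketch (reusing EvaluationContext from the question): every formula is a function of the context, and composition threads the context through so call sites never mention it:

Func<EvaluationContext, int> a   = ctx => ctx.A;
Func<EvaluationContext, int> b   = ctx => ctx.B;
Func<EvaluationContext, int> sum = ctx => a(ctx) + b(ctx); // composed formula

int result = sum(new EvaluationContext { A = 1, B = 2 });  // 3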
I think you'll find that in a head-to-head comparison, accessing a ThreadLocal<> takes substantially longer than accessing a parameter, but in the end it might not be a significant difference - it all depends what else you're doing.
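To isolate that access cost from the Task-scheduling noise in the original benchmark, a single-threaded sketch (reusing the question's FormulasRunTime and EvaluationContext types) could look like this:

var ctx = new EvaluationContext { A = 1, B = 2 };
FormulasRunTime.Context.Value = ctx;

var sw = Stopwatch.StartNew();
long total = 0;
for (int i = 0; i < 10000000; i++)
    total += FormulasRunTime.SomeFunction();    // ThreadLocal<T>.Value lookup per call
sw.Stop();
Console.WriteLine("ThreadLocal: {0} ms, {1}", sw.ElapsedMilliseconds, total);

sw = Stopwatch.StartNew();
total = 0;
for (int i = 0; i < 10000000; i++)
    total += FormulasRunTime.SomeFunction(ctx); // plain parameter pass
sw.Stop();
Console.WriteLine("Parameter:   {0} ms, {1}", sw.ElapsedMilliseconds, total);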
This is more of an academic question about performance than a realistic 'what should I use' but I'm curious as I don't dabble much in IL at all to see what's constructed and I don't have a large dataset on hand to profile against.
So which is faster:
List<MyObject> objs = SomeHowGetList();
List<string> strings = new List<string>();
foreach (MyObject o in objs)
{
    if (o.Field == "something")
        strings.Add(o.Field);
}
or:
List<MyObject> objs = SomeHowGetList();
List<string> strings = new List<string>();
string s;
foreach (MyObject o in objs)
{
    s = o.Field;
    if (s == "something")
        strings.Add(s);
}
Keep in mind that I don't really want to know the performance impact of the strings.Add(s) call (as whatever operation needs to be done can't really be changed), just the performance difference between setting s each iteration (let's say that s can be any primitive type or string) versus calling the getter on the object each iteration.
Your first option was noticeably faster in my original tests. I'm such a flip-flopper! Seriously though, some comments were made about the code in my original test. Here's the updated code, which shows option 2 being faster.
class Foo
{
    public string Bar { get; set; }

    public static List<Foo> FooMeUp()
    {
        var foos = new List<Foo>();
        for (int i = 0; i < 10000000; i++)
        {
            foos.Add(new Foo() { Bar = (i % 2 == 0) ? "something" : i.ToString() });
        }
        return foos;
    }
}

static void Main(string[] args)
{
    var foos = Foo.FooMeUp();
    var strings = new List<string>();

    Stopwatch sw = Stopwatch.StartNew();
    foreach (Foo o in foos)
    {
        if (o.Bar == "something")
        {
            strings.Add(o.Bar);
        }
    }
    sw.Stop();
    Console.WriteLine("It took {0}", sw.ElapsedMilliseconds);
    strings.Clear();

    sw = Stopwatch.StartNew();
    foreach (Foo o in foos)
    {
        var s = o.Bar;
        if (s == "something")
        {
            strings.Add(s);
        }
    }
    sw.Stop();
    Console.WriteLine("It took {0}", sw.ElapsedMilliseconds);
    Console.ReadLine();
}
Most of the time, your second code snippet should be at least as fast as the first snippet.
These two code snippets are not functionally equivalent. Properties are not guaranteed to return the same result across individual accesses. As a consequence, the JIT optimizer is not able to cache the result (except for trivial cases), and your code will be faster if you cache the result of a long-running property. Look at this example: why foreach is faster than for loop while reading richtextbox lines.
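To see why they are not equivalent, consider a (contrived) property whose value changes between reads; hoisting it into a local visibly changes behaviour, which is exactly why the JIT cannot do that hoisting for you:

class Counter
{
    private int n;
    public int Next => n++; // returns a different value on every read
}

var c = new Counter();
bool same = c.Next == c.Next; // false: two reads, two different values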
However, for some specific cases like:
for (int i = 0; i < myArray.Length; ++i)
where myArray is an array object, the compiler is able to detect the pattern, optimize the code and omit the bounds checks. It might actually be slower if you cache the result of the Length property, like:
int len = myArray.Length;
for (int i = 0; i < len; ++i)
It really depends on the implementation. In most cases, it is assumed (as a matter of common practice / courtesy) that a property is inexpensive. However, it could be that each "get" does a non-cached search over some remote resource. For standard, simple properties, you'll never notice a real difference between the two. For the worst case, fetch once, store and re-use will be much faster.
I'd be tempted to call the getter twice until I know there is a problem... "premature optimisation", etc. But if I were using it in a tight loop, then I might store it in a variable. Except for Length on an array, which has special JIT treatment ;-p
Generally the second one is faster, as the first one recalculates the property on each iteration.
Here is an example of something that could take a significant amount of time:
var d = new DriveInfo("C:");
Console.WriteLine(d.VolumeLabel); // fetches the drive label on every access
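If such a property is read in a loop, fetching it once up front avoids the repeated cost. A small sketch of the cached variant:

var d = new DriveInfo("C:");
string label = d.VolumeLabel;  // expensive fetch happens once
for (int i = 0; i < 1000; i++)
    Console.WriteLine(label);  // cheap local read on every iteration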
Storing the value in a local variable is the faster option. Although a method call doesn't impose a huge overhead, its cost far outweighs storing the value once in a local variable on the stack and then retrieving it.
I for one do it consistently.