I have measured the execution time for two ways of calculating the power of 2:
1) Inline
result = b * b;
2) With a simple function call
result = Power(b);
When running in Debug mode, everything is as expected: calling a function is considerably more expensive than doing the calculation inline (385 ms inline vs. 570 ms for the function call).
In Release mode, I'd expect the compiler to speed up the function call considerably, because it would internally inline the very small Power() function. But I would NOT expect the function call to be FASTER than the manually inlined calculation.
Astonishingly, that is exactly what happens: in the Release build, the first run needs 109 ms and the second run, with the call to Power(), needs only 62 ms.
How can a function call be faster than manual inlining?
Here is the program for your reproduction:
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Starting Test");

        // 1. Calculating inline without function call
        Stopwatch sw = Stopwatch.StartNew();
        for (double d = 0; d < 100000000; d++)
        {
            double res = d * d;
        }
        sw.Stop();
        Console.WriteLine("Checked: " + sw.ElapsedMilliseconds);

        // 2. Calculating power with function call
        Stopwatch sw2 = Stopwatch.StartNew();
        for (int d = 0; d < 100000000; d++)
        {
            double res = Power(d);
        }
        sw2.Stop();
        Console.WriteLine("Function: " + sw2.ElapsedMilliseconds);

        Console.ReadKey();
    }

    static double Power(double d)
    {
        return d * d;
    }
}
Your test is wrong: in the second part you use an int d instead of a double. That probably explains the time difference.
As Xavier correctly spotted, you are using double in one loop and int in the other. Changing both to the same type will make the results the same - I tested it.
Furthermore: What you are really measuring here is the duration of additions and comparisons. You are not measuring the duration of the squaring of d, because it simply is not happening: In a release build, the optimizer completely removes the body of the loop, because the result is not used. You can confirm this by commenting out the body of the loop. The duration will be the same.
Daniel Hilgarth is right: the calculation is not happening at all because the result is not used (which is probably not the case in Debug mode). Try the following example and you'll get correct results:
static void Main(string[] args)
{
    Console.WriteLine("Starting Test");
    var list = new List<int>();

    // 1. Calculating inline without function call
    Stopwatch sw = Stopwatch.StartNew();
    for (int d = 0; d < 100000000; d++)
    {
        int res = d * d;
        list.Add(res);
    }
    sw.Stop();
    Console.WriteLine("Checked: " + sw.ElapsedMilliseconds);

    // 2. Calculating power with function call
    list = new List<int>();
    Stopwatch sw2 = Stopwatch.StartNew();
    for (int d = 0; d < 100000000; d++)
    {
        int res = Power(d);
        list.Add(res);
    }
    sw2.Stop();
    Console.WriteLine("Function: " + sw2.ElapsedMilliseconds);

    Console.ReadKey();
}
For kicks, I wanted to see how the speed of a C# for-loop compares with that of a C++ for-loop. My test simply runs an empty loop of 100,000 iterations, repeats that 100,000 times, and averages the result.
Here is my C# implementation:
static void Main(string[] args) {
    var numberOfMeasurements = 100000;
    var numberOfLoops = 100000;
    var measurements = new List<long>();
    var stopwatch = new Stopwatch();
    for (var i = 0; i < numberOfMeasurements; i++) {
        stopwatch.Start();
        for (int j = 0; j < numberOfLoops; j++) {}
        measurements.Add(stopwatch.ElapsedMilliseconds);
    }
    Console.WriteLine("Average runtime = " + measurements.Average() + " ms.");
    Console.Read();
}
Result: Average runtime = 10301.92929 ms.
Here is my C++ implementation:
void TestA()
{
    auto numberOfMeasurements = 100000;
    auto numberOfLoops = 100000;
    std::vector<long> measurements;
    for (size_t i = 0; i < numberOfMeasurements; i++)
    {
        auto start = clock();
        for (size_t j = 0; j < numberOfLoops; j++) {}
        auto duration = start - clock();
        measurements.push_back(duration);
    }
    long avg = std::accumulate(measurements.begin(), measurements.end(), 0.0) / measurements.size();
    std::cout << "TestA: Time taken in milliseconds: " << avg << std::endl;
}

int main()
{
    TestA();
    return 0;
}
Result: TestA: Time taken in milliseconds: 0
When I had a look at what was in measurements, I noticed that it was filled with zeros... So what is the problem here? Is it clock()? Is there a better/correct way to measure the for-loop?
There is no "problem". Being able to optimize away useless code is one of the key features of C++. As the inner loop does nothing, it should be removed by every sane compiler.
Tip of the day: Only profile meaningful code that does something.
If you want to learn something about micro-benchmarks, you might be interested in this.
As "Baum mit Augen" already said, the compiler will remove code that doesn't do anything. That is a common mistake when "benchmarking" C++ code. The same thing happens if you write some kind of benchmark function that just calculates things that are never used (not returned or otherwise consumed in code): the compiler will simply remove it.
You can avoid this behavior by turning off optimization flags like -O2, -Ofast and so on. But since nobody would ship real code that way, it won't reflect the real performance of C++.
TL;DR Just benchmark real production code.
Array.Sort in C# is really fast when sorting floats. I need some extra data to go along with those floats, so I made a simple class implementing the IComparable<T> interface. Now Array.Sort is suddenly around 3-4 times slower. Why is this, and how can I improve the performance?
Demo code:
using System;
using System.Diagnostics;
using System.Linq;

namespace SortTest
{
    class Program
    {
        static void Main(string[] args)
        {
            int arraySize = 10000;
            int loops = 500;
            double normalFloatTime = 0;
            double floatWithIDTime = 0;
            double structTime = 0;
            double arraySortOverloadTime = 0;
            bool floatWithIDCorrect = true;
            bool structCorrect = true;
            bool arraySortOverloadCorrect = true;

            //just so we know the program is busy
            Console.WriteLine("Sorting random arrays, this will take some time...");

            Random random = new Random();
            Stopwatch sw = new Stopwatch();
            for (int i = 0; i < loops; i++)
            {
                float[] normalFloatArray = new float[arraySize];
                SortTest[] floatWithIDArray = new SortTest[arraySize];
                SortStruct[] structArray = new SortStruct[arraySize];
                SortTest[] arraySortOverloadArray = new SortTest[arraySize];

                //fill the arrays
                for (int j = 0; j < arraySize; j++)
                {
                    normalFloatArray[j] = NextFloat(random);
                    floatWithIDArray[j] = new SortTest(normalFloatArray[j], j);
                    structArray[j] = new SortStruct(normalFloatArray[j], j);
                    arraySortOverloadArray[j] = new SortTest(normalFloatArray[j], j);
                }

                //Reset stopwatch from any previous state
                sw.Reset();
                sw.Start();
                Array.Sort(normalFloatArray);
                sw.Stop();
                normalFloatTime += sw.ElapsedTicks;

                //Reset stopwatch from any previous state
                sw.Reset();
                sw.Start();
                Array.Sort(floatWithIDArray);
                sw.Stop();
                floatWithIDTime += sw.ElapsedTicks;

                //Reset stopwatch from any previous state
                sw.Reset();
                sw.Start();
                Array.Sort(structArray);
                sw.Stop();
                structTime += sw.ElapsedTicks;

                //Reset stopwatch from any previous state
                sw.Reset();
                sw.Start();
                Array.Sort(arraySortOverloadArray.Select(k => k.ID).ToArray(), arraySortOverloadArray);
                sw.Stop();
                arraySortOverloadTime += sw.ElapsedTicks;

                for (int k = 0; k < normalFloatArray.Length; k++)
                {
                    if (normalFloatArray[k] != floatWithIDArray[k].SomeFloat)
                    {
                        floatWithIDCorrect = false;
                    }
                    if (normalFloatArray[k] != structArray[k].SomeFloat)
                    {
                        structCorrect = false;
                    }
                    if (normalFloatArray[k] != arraySortOverloadArray[k].SomeFloat)
                    {
                        arraySortOverloadCorrect = false;
                    }
                }
            }

            //calculate averages
            double normalFloatAverage = normalFloatTime / loops;
            double floatWithIDAverage = floatWithIDTime / loops;
            double structAverage = structTime / loops;
            double arraySortOverloadAverage = arraySortOverloadTime / loops;

            //print averages
            Console.WriteLine("normalFloatAverage: {0} ticks.\nfloatWithIDAverage: {1} ticks.\nstructAverage: {2} ticks.\narraySortOverloadAverage: {3} ticks.", normalFloatAverage, floatWithIDAverage, structAverage, arraySortOverloadAverage);
            Console.WriteLine("floatWithIDArray has " + (floatWithIDCorrect ? "" : "NOT ") + "been sorted correctly atleast once.");
            Console.WriteLine("structArray has " + (structCorrect ? "" : "NOT ") + "been sorted correctly atleast once.");
            Console.WriteLine("arraySortOverloadArray has " + (arraySortOverloadCorrect ? "" : "NOT ") + "been sorted correctly atleast once.");
            Console.WriteLine("Press enter to exit.");

            //pause so we can see the console
            Console.ReadLine();
        }

        static float NextFloat(Random random)
        {
            double mantissa = (random.NextDouble() * 2.0) - 1.0;
            double exponent = Math.Pow(2.0, random.Next(-126, 128));
            return (float)(mantissa * exponent);
        }
    }

    class SortTest : IComparable<SortTest>
    {
        public float SomeFloat;
        public int ID;

        public SortTest(float f, int id)
        {
            SomeFloat = f;
            ID = id;
        }

        public int CompareTo(SortTest other)
        {
            float f = other.SomeFloat;
            if (SomeFloat < f)
                return -1;
            else if (SomeFloat > f)
                return 1;
            else
                return 0;
        }
    }

    struct SortStruct : IComparable<SortStruct>
    {
        public float SomeFloat;
        public int ID;

        public SortStruct(float f, int id)
        {
            SomeFloat = f;
            ID = id;
        }

        public int CompareTo(SortStruct other)
        {
            float f = other.SomeFloat;
            if (SomeFloat < f)
                return -1;
            else if (SomeFloat > f)
                return 1;
            else
                return 0;
        }
    }
}
Demo output:
Sorting random arrays, this will take some time...
normalFloatAverage: 3840.998 ticks.
floatWithIDAverage: 12850.672 ticks.
Press enter to exit.
Edit: I updated the code to also sort using a struct and a delegate, as suggested below; there is no difference.
New demo output:
Sorting random arrays, this will take some time...
normalFloatAverage: 3629.092 ticks.
floatWithIDAverage: 12721.622 ticks.
structAverage: 12870.584 ticks.
Press enter to exit.
Edit 2: I have updated my code with some of the suggestions below. Making it a struct either has no effect on my PC, or I am doing something horribly wrong. I also added some sanity checks.
New demo output (don't let the "atleast once" fool you, it is misphrased):
Sorting random arrays, this will take some time...
normalFloatAverage: 3679.928 ticks.
floatWithIDAverage: 14084.794 ticks.
structAverage: 11725.364 ticks.
arraySortOverloadAverage: 2186.3 ticks.
floatWithIDArray has been sorted correctly atleast once.
structArray has been sorted correctly atleast once.
arraySortOverloadArray has NOT been sorted correctly atleast once.
Press enter to exit.
Edit 3: I have updated the demo code once again with a fix to the overloaded method of Array.Sort. Note that I create and fill test[] outside the Stopwatch because in my case I already have that array available. arraySortOverload is faster in debug mode, and about as fast as the struct method in release mode.
New demo output (RELEASE):
Sorting random arrays, this will take some time...
normalFloatAverage: 2384.578 ticks.
floatWithIDAverage: 6405.866 ticks.
structAverage: 4583.992 ticks.
arraySortOverloadAverage: 4551.104 ticks.
floatWithIDArray has been sorted correctly all the time.
structArray has been sorted correctly all the time.
arraySortOverloadArray has been sorted correctly all the time.
Press enter to exit.
There are two minor ways to speed this up:
Use a struct instead of a class.
Hand-code the CompareTo() instead of using float.CompareTo().
There is also a major way to speed this up for floatWithIDAverage: Use x86 instead of x64. (But this does NOT speed up normalFloatAverage!)
Results before making any changes (for a RELEASE build, not a DEBUG build):
x64:
normalFloatAverage: 2469.86 ticks.
floatWithIDAverage: 6172.564 ticks.
x86:
normalFloatAverage: 3071.544 ticks.
floatWithIDAverage: 6036.546 ticks.
Results after changing SortTest to be a struct:
This change allows the compiler to make a number of optimizations.
x64:
normalFloatAverage: 2307.552 ticks.
floatWithIDAverage: 4214.414 ticks.
x86:
normalFloatAverage: 3054.814 ticks.
floatWithIDAverage: 4541.864 ticks.
Results after changing SortTest.CompareTo() as follows:
public int CompareTo(SortTest other)
{
    float f = other.SomeFloat;
    if (SomeFloat < f)
        return -1;
    else if (SomeFloat > f)
        return 1;
    else
        return 0;
}
This change removes the overhead of calling float.CompareTo().
x64:
normalFloatAverage: 2323.834 ticks.
floatWithIDAverage: 3721.254 ticks.
x86:
normalFloatAverage: 3087.314 ticks.
floatWithIDAverage: 3074.364 ticks.
Finally, in this specific case, floatWithIDAverage is actually faster than normalFloatAverage.
The difference between x86 and x64 is interesting!
x64 is faster than x86 for normalFloatAverage
x86 is faster than x64 for floatWithIDAverage
Conclusion
Although I can't explain why the x86 version is so much faster than the x64 version for floatWithIDAverage, I have shown a way of speeding it up significantly.
I'll complement the other answers by adding another way to optimize this. Using a struct certainly is essential: it removes a lot of pointer chasing and makes the JIT generate specialized generic code just for this struct.
The JIT is unfortunately sometimes not able to completely remove all generics overhead even when you use a struct. For that reason it can be beneficial to fork the .NET Array.Sort code and hard-code your array item type. This is a brutal thing to do, and it's only worth it if your performance requirements are high, but I have done it once and it paid off.
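As a rough sketch of what "hard-coding the item type" might look like (this is a hypothetical illustration, not the actual forked Array.Sort, which also keeps insertion-sort and heapsort fallbacks), the comparisons read the struct's float key directly, so no IComparable<T> dispatch is left:

```csharp
using System;

struct SortStruct
{
    public float SomeFloat;
    public int ID;
    public SortStruct(float f, int id) { SomeFloat = f; ID = id; }
}

class SpecializedSortDemo
{
    // Hypothetical sketch: a pivot-based sort hard-coded to SortStruct's
    // float key. Comparisons are direct field reads, no CompareTo calls.
    public static void SortByFloat(SortStruct[] a, int lo, int hi)
    {
        if (lo >= hi) return;
        float pivot = a[(lo + hi) / 2].SomeFloat;
        int i = lo, j = hi;
        while (i <= j)
        {
            while (a[i].SomeFloat < pivot) i++;
            while (a[j].SomeFloat > pivot) j--;
            if (i <= j)
            {
                SortStruct tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        SortByFloat(a, lo, j);
        SortByFloat(a, i, hi);
    }

    static void Main()
    {
        var data = new[] { new SortStruct(3f, 0), new SortStruct(1f, 1), new SortStruct(2f, 2) };
        SortByFloat(data, 0, data.Length - 1);
        Console.WriteLine(data[0].SomeFloat + " " + data[1].SomeFloat + " " + data[2].SomeFloat);
        // prints: 1 2 3
    }
}
```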
I have implemented B-Tree and now I am trying to find the best size per node. I am using time benchmarking to measure the speed.
The problem is that it crashes on the second tested number in the benchmarking method.
For the example below, the console output is
Benchmarking 10
Benchmarking 11
The crash is in the insert method of the Node class, but that does not seem to matter, because when I test with any single number as SIZE it works fine. Maybe I do not understand how class objects are created, or something like that.
I also tried calling the benchmarking method with various numbers from Main, but with the same result: it crashed during the second call.
Can somebody please look at it and explain to me what I am doing wrong? Thanks in advance!
I put part of the code here, and here is the whole thing: http://pastebin.com/AcihW1Qk
public static void benchmark(String filename, int d, int h)
{
    using (System.IO.StreamWriter file = new System.IO.StreamWriter(filename))
    {
        Stopwatch sw = new Stopwatch();
        for (int i = d; i <= h; i++)
        {
            Console.WriteLine("Benchmarking SIZE = " + i.ToString());
            file.WriteLine("SIZE = " + i.ToString());
            sw.Start();
            // code here
            Node.setSize(i);
            Tree tree = new Tree();
            for (int k = 0; k < 10000000; k++)
                tree.insert(k);
            Random r = new Random(10);
            for (int k = 0; k < 10000; k++)
            {
                int x = r.Next(10000000);
                tree.contains(x);
            }
            file.WriteLine("Depth of tree is " + tree.depth.ToString());
            // end of code
            sw.Stop();
            file.WriteLine("TIME = " + sw.ElapsedMilliseconds.ToString());
            file.WriteLine();
            sw.Reset();
        }
    }
}

static void Main(string[] args)
{
    benchmark("benchmark 10-11.txt", 10, 11);
}
When I use the example from MSDN:
var queryA = from num in numberList.AsParallel()
             select ExpensiveFunction(num); // good for PLINQ

var queryB = from num in numberList.AsParallel()
             where num % 2 > 0
             select num; // not as good for PLINQ
My example program:
static void Main(string[] args)
{
    // ThreadPool.SetMinThreads(100, 100);
    var numberList = new List<int>();
    for (int i = 0; i <= 1000; i++)
    {
        numberList.Add(i);
    }

    Stopwatch sw = new Stopwatch();
    sw.Start();
    var queryA = from num in numberList
                 select ExpensiveFunction(num); //good for PLINQ
    var c = queryA.ToList<int>();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);

    sw.Reset();
    sw.Start();
    var queryB = from num in numberList.AsParallel()
                 select ExpensiveFunction(num); //good for PLINQ
    c = queryB.ToList<int>();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);

    Console.ReadKey();
}

static int ExpensiveFunction(int a)
{
    a = a + 100 - 9 + 0 + 98;
    // Console.WriteLine(a);
    return a;
}
The result is:
7
41
Why is using AsParallel() slower than not using it?
Your ExpensiveFunction really isn't an expensive function for a computer.
Simple maths can be done extremely fast.
Perhaps try Thread.Sleep(500); instead. That makes the thread pause for half a second, which simulates the effect of an actually expensive function.
Edit: I should state that the reason it is slower is that the overhead of parallel processing is more work than the actual calculation. See this answer for a better explanation.
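To see AsParallel() actually win, the sketch below swaps in a body that costs something real. The Thread.Sleep and the numbers chosen here are just illustrative stand-ins for expensive work; the measured ratio depends on your core count:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class PlinqDemo
{
    static int ExpensiveFunction(int a)
    {
        Thread.Sleep(5); // simulate genuinely expensive work
        return a + 189;
    }

    static void Main()
    {
        var numbers = Enumerable.Range(0, 200).ToList();

        var sw = Stopwatch.StartNew();
        var sequential = numbers.Select(ExpensiveFunction).ToList();
        sw.Stop();
        Console.WriteLine("Sequential: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        var parallel = numbers.AsParallel().Select(ExpensiveFunction).ToList();
        sw.Stop();
        Console.WriteLine("Parallel:   " + sw.ElapsedMilliseconds + " ms");
    }
}
```

With a per-item cost of several milliseconds, the parallel version should come out well ahead on any multi-core machine, because the work now dwarfs the partitioning overhead.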
I have a requirement in my project (C#, VS2010, .NET 4.0) that a particular for loop must finish within 200 milliseconds. If it doesn't then it has to terminate after this duration without executing the remaining iterations. The loop generally goes for i = 0 to about 500,000 to 700,000 so the total loop time varies.
I have read following questions which are similar but they didn't help in my case:
What is the best way to exit out of a loop after an elapsed time of 30ms in C++
How to execute the loop for specific time
So far I have tried using a Stopwatch object to track the elapsed time but it's not working for me. Here are 2 different methods I have tried so far:
Method 1. Comparing the elapsed time within for loop:
Stopwatch sw = new Stopwatch();
sw.Start();
for (i = 0; i < nEntries; i++) // nEntries is typically more than 500,000
{
    // Do some stuff
    ...
    ...
    ...
    if (sw.Elapsed > TimeSpan.FromMilliseconds(200))
        break;
}
sw.Stop();
This doesn't work because if (sw.Elapsed > TimeSpan.FromMilliseconds(200)) takes more than 200 milliseconds to complete. Hence useless in my case. I am not sure whether TimeSpan.FromMilliseconds() generally takes this long or it's just in my case for some reason.
Method 2. Creating a separate thread to compare time:
Stopwatch sw = new Stopwatch();
sw.Start();
bool bDoExit = false;
int msLimit = 200;

System.Threading.ThreadPool.QueueUserWorkItem((x) =>
{
    while (bDoExit == false)
    {
        if (sw.Elapsed.Milliseconds > msLimit)
        {
            bDoExit = true;
            sw.Stop();
        }
        System.Threading.Thread.Sleep(10);
    }
});

for (i = 0; i < nEntries; i++) // nEntries is typically more than 500,000
{
    // Do some stuff
    ...
    ...
    ...
    if (bDoExit == true)
        break;
}
sw.Stop();
I have some other code in the for loop that prints some statistics. It tells me that in case of Method 2, the for loop definitely breaks before completing all the iterations but the loop timing is still 280-300 milliseconds.
Any suggestions to break a for loop strictly with-in 200 milliseconds or less?
Thanks.
For a faster comparison, try
if (sw.ElapsedMilliseconds > 200)
    break;
You should do that check at the beginning of your loop and also during the processing (the "// Do some stuff" part of the code), because it is possible, for example, that the processing starts at 190 ms (beginning of the loop), lasts 20 ms, and ends at 210 ms.
You could also measure the average execution time of your processing (this is approximate, because it relies on the average); that way the loop should last 200 milliseconds or less. Here is a demo that you can put in the Main method of a console application and easily modify for your own code:
Stopwatch sw = new Stopwatch();
sw.Start();
string a = String.Empty;
int i;
decimal sum = 0, avg = 0, beginning = 0, end = 0;
for (i = 0; i < 700000; i++) // nEntries is typically more than 500,000
{
    beginning = sw.ElapsedMilliseconds;
    if (sw.ElapsedMilliseconds + avg > 200)
        break;

    // Some processing
    a += "x";
    int s = a.Length * 100;
    Thread.Sleep(19);
    /////////////

    end = sw.ElapsedMilliseconds;
    sum += end - beginning;
    avg = sum / (i + 1);
}
sw.Stop();
Console.WriteLine(
    "avg:{0}, count:{1}, milliseconds elapsed:{2}", avg, i + 1,
    sw.ElapsedMilliseconds);
Console.ReadKey();
Another option would be to use CancellationTokenSource:
CancellationTokenSource source = new CancellationTokenSource(100);
while(!source.IsCancellationRequested)
{
// Do stuff
}
Use the first one: it is simple and has a better chance of being precise than the second one.
Both cases have the same kind of termination condition, so both should behave more or less the same. The second is much more complicated due to its use of threads and Sleep, so I'd use the first one. The second is also much less precise because of the sleeps.
There is absolutely no reason for TimeSpan.FromMilliseconds(200) to take any significant amount of time (nor for calling it in every iteration).
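If you are still worried about any per-iteration cost of TimeSpan.FromMilliseconds, you can compute the threshold once up front. A minimal sketch (nEntries and the loop body are placeholders for the real work):

```csharp
using System;
using System.Diagnostics;

class LoopLimitDemo
{
    static void Main()
    {
        int nEntries = 500000; // illustrative size
        TimeSpan limit = TimeSpan.FromMilliseconds(200); // computed once, not per iteration

        Stopwatch sw = Stopwatch.StartNew();
        int completed = 0;
        for (int i = 0; i < nEntries; i++)
        {
            // Do some stuff (omitted)
            completed++;
            if (sw.Elapsed > limit)
                break;
        }
        sw.Stop();
        Console.WriteLine("Completed " + completed + " iterations in " + sw.ElapsedMilliseconds + " ms");
    }
}
```

The per-iteration check then reduces to a single TimeSpan comparison against a cached value.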
Using cancellation token:
var cancellationToken = new CancellationTokenSource(TimeSpan.FromSeconds(15)).Token;
while (!cancellationToken.IsCancellationRequested)
{
//Do stuff...
}
I don't know if this is exactly what you need, but I think it's worth trying a System.Timers.Timer:
int msLimit = 200;
int nEntries = 500000;
bool cancel = false;

System.Timers.Timer t = new System.Timers.Timer();
t.Interval = msLimit;
t.Elapsed += (s, e) => cancel = true;
t.Start();

for (int i = 0; i < nEntries; i++)
{
    // do sth
    if (cancel)
    {
        break;
    }
}
t.Stop();