Math.Max vs inline if - what are the differences? (C#)

I was working on a project today, and found myself using Math.Max in several places and inline if statements in other places. So, I was wondering if anybody knew which is "better"... or rather, what the real differences are.
For example, in the following, c1 = c2:
Random rand = new Random();
int a = rand.Next(0, 10000);
int b = rand.Next(0, 10000);
int c1 = Math.Max(a, b);
int c2 = a > b ? a : b;
I'm asking specifically about C#, but I suppose the answer could be different in different languages, though I'm not sure which ones have similar concepts.

One of the major differences I notice right away is readability; as far as implementation and performance go, they should be nearly equivalent.
Math.Max(a,b) is very simple to understand, regardless of previous coding knowledge.
a>b ? a : b requires the reader to have at least some knowledge of the ternary operator.
"When in doubt - go for readability"

I thought it would be fun to throw some numbers into this discussion, so I wrote some code to profile it. As expected, they are almost identical for all practical purposes.
The code does a billion loops (yep 1 billion). Subtracting the overhead of the loop you get:
Math.Max() took .0044 seconds to run 1 billion times
The inline if took .0055 seconds to run 1 billion times
I subtracted the overhead which I calculated by running an empty loop 1 billion times, the overhead was 1.2 seconds.
I ran this on a laptop: 64-bit Windows 7, 1.3 GHz Intel Core i5 (U470). The code was compiled in release mode and run without a debugger attached.
Here's the code:
using System;
using System.Diagnostics;

namespace TestMathMax {
    class Program {
        static int Main(string[] args) {
            var num1 = 10;
            var num2 = 100;
            var maxValue = 0;
            var LoopCount = 1000000000;
            double controlTotalSeconds;

            {
                var stopwatch = new Stopwatch();
                stopwatch.Start();
                for (var i = 0; i < LoopCount; i++) {
                    // do nothing
                }
                stopwatch.Stop();
                controlTotalSeconds = stopwatch.Elapsed.TotalSeconds;
                Console.WriteLine("Control - Empty Loop - " + controlTotalSeconds + " seconds");
            }
            Console.WriteLine();

            {
                var stopwatch = new Stopwatch();
                stopwatch.Start();
                for (int i = 0; i < LoopCount; i++) {
                    maxValue = Math.Max(num1, num2);
                }
                stopwatch.Stop();
                Console.WriteLine("Math.Max() - " + stopwatch.Elapsed.TotalSeconds + " seconds");
                Console.WriteLine("Relative: " + (stopwatch.Elapsed.TotalSeconds - controlTotalSeconds) + " seconds");
            }
            Console.WriteLine();

            {
                var stopwatch = new Stopwatch();
                stopwatch.Start();
                for (int i = 0; i < LoopCount; i++) {
                    maxValue = num1 > num2 ? num1 : num2;
                }
                stopwatch.Stop();
                Console.WriteLine("Inline Max: " + stopwatch.Elapsed.TotalSeconds + " seconds");
                Console.WriteLine("Relative: " + (stopwatch.Elapsed.TotalSeconds - controlTotalSeconds) + " seconds");
            }

            Console.ReadLine();
            return maxValue;
        }
    }
}
UPDATED Results 2/7/2015
On Windows 8.1, Surface Pro 3, i7 4650U 2.3 GHz
Ran as a console application in release mode without the debugger attached.
Math.Max() - 0.3194749 seconds
Inline Max: 0.3465041 seconds

if statement considered beneficial
Summary
A statement of the form if (a > max) max = a is the fastest way to determine the maximum of a set of numbers. However, the loop infrastructure itself takes most of the CPU time, so this optimization is questionable in the end.
Details
The answer by luisperezphd is interesting because it provides numbers; however, I believe the method is flawed: the compiler will most likely move the comparison out of the loop, so the answer doesn't measure what it wants to measure. This explains the negligible timing difference between the control loop and the measurement loops.
To avoid this loop optimization, I added an operation that depends on the loop variable to the empty control loop as well as to all measurement loops. I simulated the common use case of finding the maximum in a list of numbers, and used three data sets:
best case: the first number is the maximum, all numbers after it are smaller
worst case: every number is bigger than the previous, so the max changes each iteration
average case: a set of random numbers
See below for the code.
The result was rather surprising to me. On my Core i5 2520M laptop I got the following for 1 billion iterations (the empty control took about 2.6 sec in all cases):
max = Math.Max(max, a): 2.0 sec best case / 1.3 sec worst case / 2.0 sec average case
max = Math.Max(a, max): 1.6 sec best case / 2.0 sec worst case / 1.5 sec average case
max = max > a ? max : a: 1.2 sec best case / 1.2 sec worst case / 1.2 sec average case
if (a > max) max = a: 0.2 sec best case / 0.9 sec worst case / 0.3 sec average case
So despite long CPU pipelines and the resulting penalties for branching, the good old if statement is the clear winner for all simulated data sets; in the best case it is 10 times faster than Math.Max, and in the worst case still more than 30% faster.
Another surprise is that the order of the arguments to Math.Max matters. Presumably this is because of CPU branch prediction logic working differently for the two cases, and mispredicting branches more or less depending on the order of arguments.
However, the majority of the CPU time is spent in the loop infrastructure, so in the end this optimization is questionable at best. It provides a measurable but minor reduction in overall execution time.
UPDATED by luisperezphd
I couldn't fit this as a comment and it made more sense to write it here instead of as part of my answer so that it was in context.
Your theory makes sense, but I was not able to reproduce the results. First, for some reason, with your code my control loop took longer than the loops containing work.
For that reason I made the numbers here relative to the lowest time instead of the control loop. The seconds in the results are how much longer it took than the fastest time. For example in the results immediately below the fastest time was for Math.Max(a, max) best case, so every other result represents how much longer they took than that.
Below are the results I got:
max = Math.Max(max, a): 0.012 sec best case / 0.007 sec worst case / 0.028 sec average case
max = Math.Max(a, max): 0.000 best case / 0.021 worst case / 0.019 sec average case
max = max > a ? max : a: 0.022 sec best case / 0.02 sec worst case / 0.01 sec average case
if (a > max) max = a: 0.015 sec best case / 0.024 sec worst case / 0.019 sec average case
The second time I ran it I got:
max = Math.Max(max, a): 0.024 sec best case / 0.010 sec worst case / 0.009 sec average case
max = Math.Max(a, max): 0.001 sec best case / 0.000 sec worst case / 0.018 sec average case
max = max > a ? max : a: 0.011 sec best case / 0.005 sec worst case / 0.018 sec average case
if (a > max) max = a: 0.000 sec best case / 0.005 sec worst case / 0.039 sec average case
There is enough volume in these tests that any anomalies should have been wiped out. Yet despite that the results are pretty different. Maybe the large memory allocation for the array has something to do with it. Or possibly the difference is so small that anything else happening on the computer at the time is the true cause of the variation.
Note the fastest time, represented in the results above by 0.000, is about 8 seconds. So if you consider that the longest run was then 8.039 seconds, the variation in time is about half a percent (0.5%) - too small to matter.
The computer
The code was run on Windows 8.1, i7 4810MQ 2.8 GHz, and compiled against .NET 4.0.
Code modifications
I modified your code a bit to output the results in the format shown above. I also added additional code to wait 1 second after starting to account for any additional loading time .NET might need when running the assembly.
Also I ran all the tests twice to account for any CPU optimizations. Finally I changed the int for i to a uint so I could run the loop 4 billion times instead of 1 billion to get a longer timespan.
That's probably all overkill, but it's all to make sure as much as possible that the tests are not affected by any of those factors.
You can find the code at: http://pastebin.com/84qi2cbD
Code
using System;
using System.Diagnostics;

namespace ProfileMathMax
{
    class Program
    {
        static double controlTotalSeconds;
        const int InnerLoopCount = 100000;
        const int OuterLoopCount = 1000000000 / InnerLoopCount;
        static int[] values = new int[InnerLoopCount];
        static int total = 0;

        static void ProfileBase()
        {
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            int maxValue;
            for (int j = 0; j < OuterLoopCount; j++)
            {
                maxValue = 0;
                for (int i = 0; i < InnerLoopCount; i++)
                {
                    // baseline: only the loop-dependent work, no max calculation
                    total += values[i];
                }
            }
            stopwatch.Stop();
            controlTotalSeconds = stopwatch.Elapsed.TotalSeconds;
            Console.WriteLine("Control - Empty Loop - " + controlTotalSeconds + " seconds");
        }

        static void ProfileMathMax()
        {
            int maxValue;
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int j = 0; j < OuterLoopCount; j++)
            {
                maxValue = 0;
                for (int i = 0; i < InnerLoopCount; i++)
                {
                    maxValue = Math.Max(values[i], maxValue);
                    total += values[i];
                }
            }
            stopwatch.Stop();
            Console.WriteLine("Math.Max(a, max) - " + stopwatch.Elapsed.TotalSeconds + " seconds");
            Console.WriteLine("Relative: " + (stopwatch.Elapsed.TotalSeconds - controlTotalSeconds) + " seconds");
        }

        static void ProfileMathMaxReverse()
        {
            int maxValue;
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int j = 0; j < OuterLoopCount; j++)
            {
                maxValue = 0;
                for (int i = 0; i < InnerLoopCount; i++)
                {
                    maxValue = Math.Max(maxValue, values[i]);
                    total += values[i];
                }
            }
            stopwatch.Stop();
            Console.WriteLine("Math.Max(max, a) - " + stopwatch.Elapsed.TotalSeconds + " seconds");
            Console.WriteLine("Relative: " + (stopwatch.Elapsed.TotalSeconds - controlTotalSeconds) + " seconds");
        }

        static void ProfileInline()
        {
            int maxValue = 0;
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int j = 0; j < OuterLoopCount; j++)
            {
                maxValue = 0;
                for (int i = 0; i < InnerLoopCount; i++)
                {
                    // fixed: originally read "maxValue > values[i] ? values[i] : maxValue",
                    // which computes the minimum, not the maximum
                    maxValue = maxValue > values[i] ? maxValue : values[i];
                    total += values[i];
                }
            }
            stopwatch.Stop();
            Console.WriteLine("max = max > a ? max : a: " + stopwatch.Elapsed.TotalSeconds + " seconds");
            Console.WriteLine("Relative: " + (stopwatch.Elapsed.TotalSeconds - controlTotalSeconds) + " seconds");
        }

        static void ProfileIf()
        {
            int maxValue = 0;
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int j = 0; j < OuterLoopCount; j++)
            {
                maxValue = 0;
                for (int i = 0; i < InnerLoopCount; i++)
                {
                    if (values[i] > maxValue)
                        maxValue = values[i];
                    total += values[i];
                }
            }
            stopwatch.Stop();
            Console.WriteLine("if (a > max) max = a: " + stopwatch.Elapsed.TotalSeconds + " seconds");
            Console.WriteLine("Relative: " + (stopwatch.Elapsed.TotalSeconds - controlTotalSeconds) + " seconds");
        }

        static void Main(string[] args)
        {
            Random rnd = new Random();
            for (int i = 0; i < InnerLoopCount; i++)
            {
                //values[i] = i; // worst case: every number bigger than the previous
                //values[i] = i == 0 ? 1 : 0; // best case: first number is the maximum
                values[i] = rnd.Next(int.MaxValue); // average case: random numbers
            }

            ProfileBase();
            Console.WriteLine();
            ProfileMathMax();
            Console.WriteLine();
            ProfileMathMaxReverse();
            Console.WriteLine();
            ProfileInline();
            Console.WriteLine();
            ProfileIf();
            Console.ReadLine();
        }
    }
}

I'd say it is quicker to understand what Math.Max is doing, and that should really be the only deciding factor here.
But as an indulgence, it's interesting to consider that Math.Max(a,b) evaluates the arguments once, whilst a > b ? a : b evaluates one of them twice. Not a problem with local variables, but for properties with side effects, the side effect may happen twice.
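A minimal sketch of that pitfall (Sensor is a hypothetical class whose Value getter counts how often it is read):
class Sensor
{
    public int Reads;                 // side effect: counts reads of Value
    private readonly int _value;
    public Sensor(int v) { _value = v; }
    public int Value { get { Reads++; return _value; } }
}

var s = new Sensor(42);
int viaMax = Math.Max(s.Value, 10);           // Value evaluated once
int viaTernary = s.Value > 10 ? s.Value : 10; // Value evaluated twice (42 > 10)
Console.WriteLine(s.Reads);                   // prints 3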

If the JITer chooses to inline the Math.Max function, the executable code will be identical to the if statement. If Math.Max isn't inlined, it will execute as a function call with call and return overhead not present in the if statement. So, the if statement will give identical performance to Math.Max() in the inlining case or the if statement may be a few clock cycles faster in the non-inlined case, but the difference won't be noticeable unless you are running tens of millions of comparisons.
Since the performance difference between the two is small enough to be negligible in most situations, I'd prefer the Math.Max(a,b) because it's easier to read.
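As an aside, if you wrap such logic in your own small helper, you can hint the JIT to inline it. A sketch, assuming .NET 4.5 or later (the attribute is a hint, not a guarantee):
using System.Runtime.CompilerServices;

static class MathHelpers
{
    // Suggests to the JIT that this call should be inlined at call sites.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Max(int a, int b) { return a > b ? a : b; }
}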

Regarding performance: modern CPUs have an internal instruction pipeline, so that each machine instruction executes in several internal steps (e.g. fetch, decode, execute, store).
In most cases the CPU is smart enough to run these steps in parallel for sequential instructions, so the overall throughput is very high.
This is fine until a branch comes along (if, ?: etc.).
The branch may break the sequence and force the CPU to flush the pipeline.
This costs a lot of clock cycles.
In theory, if the compiler is smart enough, Math.Max can be implemented using a built-in CPU instruction and the branching can be avoided.
In that case Math.Max would actually be faster than the if - but it depends on the compiler.
For more complicated maxima - like working on vectors, double[] v; v.Max() - the compiler can utilize highly optimized library code that can be much faster than regular compiled code.
So it's best to go with Math.Max, but it is also recommended to check on your particular target system and compiler if it is important enough.
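For the curious, here is a sketch of the kind of branchless integer max such a compiler could emit (an illustration only, not what the JIT actually generates; it assumes a - b does not overflow):
static int BranchlessMax(int a, int b)
{
    int diff = a - b;
    // diff >> 31 is -1 (all one bits) when diff < 0, otherwise 0,
    // so the masked term is diff when a < b and 0 when a >= b.
    return a - (diff & (diff >> 31));
}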

Math.Max(a,b)
is NOT equivalent to a > b ? a : b in all cases.
Math.Max returns the greater value of the two arguments, that is:
if (a == b) return a; // or b, they're identical
else if (a > b) return a;
else if (b > a) return b;
else return undefined; // neither comparison holds, e.g. when one argument is NaN
The undefined is mapped to double.NaN in case of the double overload of Math.Max for example.
a > b ? a : b
evaluates to a if a is greater than b, which does not necessarily mean that b is less than a.
A simple example that demonstrates that they are not equivalent:
var a = 0.0/0.0; // or double.NaN
var b = 1.0;
a > b ? a : b // evaluates to 1.0
Math.Max(a, b) // returns double.NaN

Take an operation where:
N must be >= 0
General solutions:
A) N = Math.Max(0, N)
B) if (N < 0) { N = 0; }
Sorted by speed: Math.Max (A) is the slower of the two; the if statement (B) is about 3% faster than A.
But my solution is about 4% faster than solution B:
N *= Math.Sign(1 + Math.Sign(N));
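Tracing the expression for a negative and a non-negative input shows why it works:
int n = -5;
n *= Math.Sign(1 + Math.Sign(n)); // Sign(-5) = -1; Sign(1 - 1) = 0; n *= 0 => n == 0

n = 7;
n *= Math.Sign(1 + Math.Sign(n)); // Sign(7) = 1; Sign(1 + 1) = 1; n *= 1 => n == 7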

Related

Parallel.ForEach slower than normal foreach

I'm playing around with Parallel.ForEach in a C# console application, but can't seem to get it right. I'm creating an array with random numbers, and I have a sequential foreach and a Parallel.ForEach that find the largest value in the array. With approximately the same code in C++ I started to see a tradeoff to using several threads at 3M values in the array. But the Parallel.ForEach is twice as slow even at 100M values. What am I doing wrong?
class Program
{
    static void Main(string[] args)
    {
        dostuff();
    }

    static void dostuff()
    {
        Console.WriteLine("How large do you want the array to be?");
        int size = int.Parse(Console.ReadLine());
        int[] arr = new int[size];
        Random rand = new Random();
        for (int i = 0; i < size; i++)
        {
            arr[i] = rand.Next(0, int.MaxValue);
        }

        var watchSeq = System.Diagnostics.Stopwatch.StartNew();
        var largestSeq = FindLargestSequentially(arr);
        watchSeq.Stop();
        var elapsedSeq = watchSeq.ElapsedMilliseconds;
        Console.WriteLine("Finished sequential in: " + elapsedSeq + "ms. Largest = " + largestSeq);

        var watchPar = System.Diagnostics.Stopwatch.StartNew();
        var largestPar = FindLargestParallel(arr);
        watchPar.Stop();
        var elapsedPar = watchPar.ElapsedMilliseconds;
        Console.WriteLine("Finished parallel in: " + elapsedPar + "ms Largest = " + largestPar);

        dostuff();
    }

    static int FindLargestSequentially(int[] arr)
    {
        int largest = arr[0];
        foreach (int i in arr)
        {
            if (largest < i)
            {
                largest = i;
            }
        }
        return largest;
    }

    static int FindLargestParallel(int[] arr)
    {
        int largest = arr[0];
        Parallel.ForEach<int, int>(arr, () => 0, (i, loop, subtotal) =>
        {
            if (i > subtotal)
                subtotal = i;
            return subtotal;
        },
        (finalResult) =>
        {
            Console.WriteLine("Thread finished with result: " + finalResult);
            if (largest < finalResult) largest = finalResult;
        });
        return largest;
    }
}
This is a performance ramification of having a very small delegate body.
We can achieve better performance using partitioning; the body delegate then performs work on a large chunk of data per invocation.
static int FindLargestParallelRange(int[] arr)
{
    object locker = new object();
    int largest = arr[0];
    Parallel.ForEach(Partitioner.Create(0, arr.Length), () => arr[0], (range, loop, subtotal) =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            if (arr[i] > subtotal)
                subtotal = arr[i];
        return subtotal;
    },
    (finalResult) =>
    {
        lock (locker)
            if (largest < finalResult)
                largest = finalResult;
    });
    return largest;
}
Pay attention to synchronizing the localFinally delegate. Also note the need for proper initialization of localInit: () => arr[0] instead of () => 0 (a subtotal seeded with 0 would win incorrectly when every element is negative).
Partitioning with PLINQ:
static int FindLargestPlinqRange(int[] arr)
{
    return Partitioner.Create(0, arr.Length)
        .AsParallel()
        .Select(range =>
        {
            int largest = arr[0];
            for (int i = range.Item1; i < range.Item2; i++)
                if (arr[i] > largest)
                    largest = arr[i];
            return largest;
        })
        .Max();
}
I highly recommend free book Patterns of Parallel Programming by Stephen Toub.
As the other answerers have mentioned, the action you're trying to perform against each item here is so insignificant that there are a variety of other factors which end up carrying more weight than the actual work you're doing. These may include:
JIT optimizations
CPU branch prediction
I/O (outputting thread results while the timer is running)
the cost of invoking delegates
the cost of task management
the system incorrectly guessing what thread strategy will be optimal
memory/cpu caching
memory pressure
environment (debugging)
etc.
Running each approach a single time is not an adequate way to test, because it enables a number of the above factors to weigh more heavily on one iteration than on another. You should start with a more robust benchmarking strategy.
Furthermore, your implementation is actually dangerously incorrect. The documentation specifically says:
The localFinally delegate is invoked once per task to perform a final action on each task’s local state. This delegate might be invoked concurrently on multiple tasks; therefore, you must synchronize access to any shared variables.
You have not synchronized your final delegate, so your function is prone to race conditions that would make it produce incorrect results.
As in most cases, the best approach to this one is to take advantage of work done by people smarter than we are. In my testing, the following approach appears to be the fastest overall:
return arr.AsParallel().Max();
The Parallel Foreach loop should be running slower because the algorithm used is not parallel and a lot more work is being done to run this algorithm.
In a single thread, to find the max value, we can take the first number as our max and compare it to every other number in the array. If one of the numbers is larger than our current max, we replace it and continue. This way we access each number in the array once, for a total of N comparisons.
In the Parallel loop above, the algorithm creates overhead because each operation is wrapped inside a function call with a return value. So in addition to doing the comparisons, it is running overhead of adding and removing these calls onto the call stack. In addition, since each call is dependent on the value of the function call before, it needs to run in sequence.
In the Parallel For Loop below, the array is divided into an explicit number of threads determined by the variable threadNumber. This limits the overhead of function calls to a low number.
Note that for low values, the parallel loop performs slower. However, at 100M there is a decrease in time elapsed.
static int FindLargestParallel(int[] arr)
{
    var answers = new ConcurrentBag<int>();
    int threadNumber = 4;
    int partitionSize = arr.Length / threadNumber;
    Parallel.For(0, /* starting number */
        threadNumber + 1, /* adding 1 to threadNumber in case arr.Length is not evenly divisible by threadNumber */
        i =>
        {
            if (i * partitionSize < arr.Length) /* check in case the element count is divisible by the thread count */
            {
                var max = arr[i * partitionSize];
                for (var x = i * partitionSize;
                     x < (i + 1) * partitionSize && x < arr.Length;
                     ++x)
                {
                    if (arr[x] > max)
                        max = arr[x];
                }
                answers.Add(max);
            }
        });
    /* note the shortcut in finding the max of the bag */
    return answers.Max(i => i);
}
Some thoughts here: in the parallel case, there is thread management logic involved that determines how many threads to use. This thread management logic presumably runs on your main thread. Every time a thread returns with a new maximum value, the management logic kicks in and determines the next work item (the next number to process in your array). I'm pretty sure this requires some kind of locking. In any case, determining the next item may even cost more than performing the comparison itself.
That sounds like an order of magnitude more work (overhead) to me than a single thread that processes one number after the other. In the single-threaded case there are a number of optimizations at play: no boundary checks, the CPU can load data into its first-level cache, etc. I'm not sure which of these optimizations apply in the parallel case.
Keep in mind that on a typical desktop machine there are only 2 to 4 physical CPU cores available so you will never have more than that actually doing work. So if the parallel processing overhead is more than 2-4 times of a single-threaded operation, the parallel version will inevitably be slower, which you are observing.
Have you attempted to run this on a 32 core machine? ;-)
A better solution would be to determine non-overlapping ranges (start + stop index) covering the entire array and let each parallel task process one range. This way, each parallel task can internally do a tight single-threaded loop and only return once the entire range has been processed. You could probably even determine a near-optimal number of ranges based on the number of logical cores of the machine. I haven't tried this but I'm pretty sure you would see an improvement over the single-threaded case.
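A minimal sketch of that idea, sizing the ranges from Environment.ProcessorCount (FindLargestByRanges is a hypothetical name; it assumes arr has at least as many elements as there are cores, plus the usual System, System.Linq, and System.Threading.Tasks imports):
static int FindLargestByRanges(int[] arr)
{
    int cores = Environment.ProcessorCount;
    int[] maxima = new int[cores];
    Parallel.For(0, cores, c =>
    {
        // Non-overlapping [start, end) range for this task.
        int start = (int)((long)arr.Length * c / cores);
        int end = (int)((long)arr.Length * (c + 1) / cores);
        int localMax = arr[start];
        for (int i = start + 1; i < end; i++)
            if (arr[i] > localMax) localMax = arr[i];
        maxima[c] = localMax; // each task writes only its own slot: no lock needed
    });
    return maxima.Max(); // combine the per-range maxima
}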
Try splitting the set into batches and running the batches in parallel, where the number of batches corresponds to your number of CPU cores.
I ran some equations 1K, 10K and 1M times using the following methods:
A "for" loop.
A "Parallel.For" from the System.Threading.Tasks lib, across the entire set.
A "Parallel.For" across 4 batches.
A "Parallel.ForEach" from the System.Threading.Tasks lib, across the entire set.
A "Parallel.ForEach" across 4 batches.
Results: (Measured in seconds)
Conclusion:
Processing batches in parallel using the "Parallel.ForEach" has the best outcome in cases above 10K records. I believe the batching helps because it utilizes all CPU cores (4 in this example), but also minimizes the amount of threading overhead associated with parallelization.
Here is my code:
public void ParallelSpeedTest()
{
    var rnd = new Random(56);
    int range = 1000000;
    int numberOfCores = 4;
    int batchSize = range / numberOfCores;
    int[] rangeIndexes = Enumerable.Range(0, range).ToArray();
    double[] inputs = rangeIndexes.Select(n => rnd.NextDouble()).ToArray();
    double[] weights = rangeIndexes.Select(n => rnd.NextDouble()).ToArray();
    double[] outputs = new double[rangeIndexes.Length];

    /// Series "for"...
    var startTimeSeries = DateTime.Now;
    for (var i = 0; i < range; i++)
    {
        outputs[i] = Math.Sqrt(Math.Pow(inputs[i] * weights[i], 2));
    }
    var durationSeries = DateTime.Now - startTimeSeries;

    /// "Parallel.For"...
    var startTimeParallel = DateTime.Now;
    Parallel.For(0, range, (i) => {
        outputs[i] = Math.Sqrt(Math.Pow(inputs[i] * weights[i], 2));
    });
    var durationParallelFor = DateTime.Now - startTimeParallel;

    /// "Parallel.For" in batches...
    var startTimeParallel2 = DateTime.Now;
    Parallel.For(0, numberOfCores, (c) => {
        var endValue = (c == numberOfCores - 1) ? range : (c + 1) * batchSize;
        var startValue = c * batchSize;
        for (var i = startValue; i < endValue; i++)
        {
            outputs[i] = Math.Sqrt(Math.Pow(inputs[i] * weights[i], 2));
        }
    });
    var durationParallelForBatches = DateTime.Now - startTimeParallel2;

    /// "Parallel.ForEach"...
    var startTimeParallelForEach = DateTime.Now;
    Parallel.ForEach(rangeIndexes, (i) => {
        outputs[i] = Math.Sqrt(Math.Pow(inputs[i] * weights[i], 2));
    });
    var durationParallelForEach = DateTime.Now - startTimeParallelForEach;

    /// "Parallel.ForEach" in batches...
    List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
    for (var i = 0; i < numberOfCores; i++)
    {
        int start = i * batchSize;
        int end = (i == numberOfCores - 1) ? range : (i + 1) * batchSize;
        ranges.Add(new Tuple<int, int>(start, end));
    }
    var startTimeParallelBatches = DateTime.Now;
    Parallel.ForEach(ranges, (r) => { // renamed from "range" to avoid shadowing the local above
        for (var i = r.Item1; i < r.Item2; i++) // fixed: was "i < range.Item1", which never looped
        {
            outputs[i] = Math.Sqrt(Math.Pow(inputs[i] * weights[i], 2));
        }
    });
    var durationParallelForEachBatches = DateTime.Now - startTimeParallelBatches;

    Debug.Print($"=================================================================");
    Debug.Print($"Given: Set-size: {range}, number-of-batches: {numberOfCores}, batch-size: {batchSize}");
    Debug.Print($".................................................................");
    Debug.Print($"Series For:               {durationSeries}");
    Debug.Print($"Parallel For:             {durationParallelFor}");
    Debug.Print($"Parallel For Batches:     {durationParallelForBatches}");
    Debug.Print($"Parallel ForEach:         {durationParallelForEach}");
    Debug.Print($"Parallel ForEach Batches: {durationParallelForEachBatches}");
    Debug.Print($"");
}

What is the more efficient syntax to produce an even number in my console project?

I recently started learning C# and learned that there are two ways to have my code output even numbers when I write this for loop. I learned the syntax of version 2, which is intuitive to me. However, version 1 was an example I found elsewhere online. I want to understand if there is a difference between the two versions.
//Version 1
for (int num = 0; num < 100; num+=2)
{
Console.WriteLine (num);
}
//Version 2
for (int num = 0; num < 100; num++)
{
if (num % 2 == 0)
{
Console.WriteLine (num);
}
}
Of the two possible ways, is there any difference between the two syntaxes? If yes, which is more efficient and why?
For a given value of N (in your sample code, N=100)
Version #1 does N/2 additions
Version #2 does N additions, plus N integer divisions (a relatively expensive operation).
And you forgot Version 3, bit-twiddling. Bitwise operations are a whole lot cheaper than division, and since we know that integers in the C# world are two's-complement binary values, the state of the low-order bit tells us whether an integer is even or not, thus:
bool isEven( int n ) { return 0 == ( n & 1 ) ; }
We can write a test harness, like this:
using System;
using System.Diagnostics;

class Program
{
    public static int Version1(int n)
    {
        int cnt = 0;
        for (int i = 0; i < n; i += 2)
        {
            ++cnt;
        }
        return cnt;
    }

    public static int Version2(int n)
    {
        int cnt = 0;
        for (int i = 0; i < n; ++i)
        {
            if (i % 2 == 0)
            {
                ++cnt;
            }
        }
        return cnt;
    }

    public static int Version3(int n)
    {
        int cnt = 0;
        for (int i = 0; i < n; ++i)
        {
            if (0 == (i & 1))
            {
                ++cnt;
            }
        }
        return cnt;
    }

    private static void Main(string[] args)
    {
        int n = 500000000;
        Stopwatch timer = new Stopwatch();

        timer.Start();
        Version1(n);
        timer.Stop();
        Console.WriteLine("{0:c} -- Version #1 (incrementing by 2)", timer.Elapsed);

        timer.Restart();
        Version2(n);
        timer.Stop();
        Console.WriteLine("{0:c} -- Version #2 (incrementing by 1 with modulo test)", timer.Elapsed);

        timer.Restart();
        Version3(n);
        timer.Stop();
        Console.WriteLine("{0:c} -- Version #3 (incrementing by 1 with bit twiddling)", timer.Elapsed);

        return;
    }
}
And find out which is actually faster. The above program runs the loop 500,000,000 times so we get numbers that are big enough to measure.
Here are the timings I get with VS 2013:
Debug build (non-optimized):
00:00:00.5500486 -- Version #1 (incrementing by 2)
00:00:02.0843776 -- Version #2 (incrementing by 1 with modulo test)
00:00:01.2507272 -- Version #3 (incrementing by 1 with bit twiddling)
Version #2 is 3.789 times slower than version #1
Version #3 is 2.274 times slower than version #1
Release build (optimized)
00:00:00.1680107 -- Version #1 (incrementing by 2)
00:00:00.5109271 -- Version #2 (incrementing by 1 with modulo test)
00:00:00.3367961 -- Version #3 (incrementing by 1 with bit twiddling)
Version #2 is 3.041 times slower than version #1
Version #3 is 2.005 times slower than version #1
You can measure exactly which method is faster, to a precision of one ten-thousandth of a millisecond, also known as a "tick" in the .NET Framework.
This is done using the Stopwatch class in the System.Diagnostics namespace:
var sw1 = new System.Diagnostics.Stopwatch();
var sw2 = new System.Diagnostics.Stopwatch();

//Version 1
sw1.Start();
for (int num = 0; num < 100; num += 2)
{
    Console.WriteLine(num);
}
sw1.Stop();

//Version 2
sw2.Start();
for (int num = 0; num < 100; num++)
{
    if (num % 2 == 0)
    {
        Console.WriteLine(num);
    }
}
sw2.Stop();
Console.Clear();
Console.WriteLine("Ticks for first method: " + sw1.ElapsedTicks);
Console.WriteLine("Ticks for second method: " + sw2.ElapsedTicks);
The output will show that the first method is faster.
Why is this the case?
In the first version, ignoring the console output, there is only one operation per iteration (+= 2), and the program goes through 50 iterations of the loop.
In the second version, each iteration needs an increment (++), a modulo (% 2), and a comparison (== 0), and the loop runs 100 iterations.
If you measure the program exactly like this, the timings will be dominated by Console.WriteLine, which takes far longer than the loop logic itself. If you want to measure the algorithm alone, omit the console output.
If you do that, the difference in ticks on my machine is 24 ticks to 43 ticks, on average.
So in conclusion, the first method is more efficient, by about 19/10000 milliseconds.
Version 1 is "faster" from a pure analysis viewpoint (independent of compilation optimizations). But as dbarnes mentioned in the comments to your question, what you code in C# and what comes out the other side of the compiler may be completely different. Version 2 is "slower" because it has more loop iterations to cycle through, and it has slightly larger code complexity with that if-statement.
Don't concern yourself too much with optimization at first (it is called "premature optimization" to be concerned with performance before you know it exists and can prove that it is a problem worth solving). Simply code to learn, then refactor, then look into performance issues if any exist.
In fact, I'd speculate that the compiler could be configured to "unroll" both of those loops: it gets rid of the looping and simply duplicates the loop body 50 or 100 times for version 1 or 2 respectively.
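Conceptually, a 5x unrolling of Version 1 would look something like this (what the compiler actually emits may differ):
for (int num = 0; num < 100; num += 10)
{
    Console.WriteLine(num);
    Console.WriteLine(num + 2);
    Console.WriteLine(num + 4);
    Console.WriteLine(num + 6);
    Console.WriteLine(num + 8);
}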
Actually the adding mechanism (Version 1) is a little better.
The result of the addition was better than the modulus (the %) by about 0.007 milliseconds over the course of 2 million (or 200,000) iterations.
But seriously... unless you have a mission-critical micro-optimization need, stick with the modulus operator for easy-to-read code rather than trying to save roughly 0.007 milliseconds of execution time.
Version 1 is more efficient in the sense that it iterates half as many times with the same output. Version 1 will iterate 50 times where as version 2 will iterate 100 times but only print half the results. However, I think the second shows your intentions better.
Version 1 is more efficient: version 2 performs an extra comparison (the num % 2 == 0 test) on every iteration, on top of the loop condition that both versions share, and it iterates twice as often.
//Version 1
for (int num = 0; num < 100; num += 2) // one comparison per iteration (num < 100)
{
    Console.WriteLine(num);
}
//Version 2
for (int num = 0; num < 100; num++) // one comparison per iteration (num < 100)
{
    if (num % 2 == 0) // one extra comparison per iteration
    {
        Console.WriteLine(num);
    }
}
This way ran faster than method 1 and method 2 on my machine (see the timings below).
var sw3 = new System.Diagnostics.Stopwatch();
Action<int, int> printEvens = null;
printEvens = (index, count) =>
{
    if (index % 2 == 0)
        Console.WriteLine(index.ToString());
    if (index < count)
        printEvens(++index, count);
};
sw3.Start();
printEvens(0, 100);
sw3.Stop();
The output on my machine is..
Method 1: 96282
Method 2: 184336
Method 3: Recursion 59188

Efficient Rolling Max and Min Window

I want to calculate a rolling maximum and minimum value efficiently. Meaning anything better than recalculating the maximum/minimum from all the values in use every time the window moves.
There was a post on here that asked the same thing and someone posted a solution involving some kind of stack approach that supposedly worked based on its rating. However I can't find it again for the life of me.
Any help would be appreciated in finding a solution or the post. Thank you all!
The algorithm you want to use is called the ascending minima (C++ implementation).
To do this in C#, you will want to get a double ended queue class, and a good one exists on NuGet under the name Nito.Deque.
I have written a quick C# implementation using Nito.Deque, but I have only briefly checked it, and did it from my head so it may be wrong!
public static class AscendingMinima
{
    private struct MinimaValue
    {
        public int RemoveIndex { get; set; }
        public double Value { get; set; }
    }

    public static double[] GetMin(this double[] input, int window)
    {
        var queue = new Deque<MinimaValue>();
        var result = new double[input.Length];

        for (int i = 0; i < input.Length; i++)
        {
            var val = input[i];

            // Note: in Nito.Deque, queue[0] is the front
            while (queue.Count > 0 && i >= queue[0].RemoveIndex)
                queue.RemoveFromFront();
            while (queue.Count > 0 && queue[queue.Count - 1].Value >= val)
                queue.RemoveFromBack();

            queue.AddToBack(new MinimaValue { RemoveIndex = i + window, Value = val });
            result[i] = queue[0].Value;
        }

        return result;
    }
}
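As a quick worked example (traced by hand, so treat it as illustrative), a window of 3 over a small input yields the minimum of each window ending at that index:
double[] data = { 5, 1, 3, 2, 4 };
double[] mins = data.GetMin(3);
// mins is { 5, 1, 1, 1, 2 }: each entry is the minimum of the
// (up to) 3 values ending at that position.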
Here's one way to do it more efficiently. You still have to calculate the value occasionally but, other than certain degenerate data (ever decreasing values), that's minimised in this solution.
We'll limit ourselves to the maximum to simplify things but it's simple to extend to a minimum as well.
All you need is the following:
The window itself, initially empty.
The current maximum (max), initially any value.
The count of the current maximum (maxcount), initially zero.
The idea is to use max and maxcount as a cache for holding the current maximum. Where the cache is valid, you only need to return the value in it, a very fast constant-time operation.
If the cache is invalid when you ask for the maximum, it populates the cache and then returns that value. This is slower than the method in the previous paragraph but subsequent requests for the maximum once the cache is valid again use that faster method.
Here's what you do for maintaining the window and associated data:
Get the next value N.
If the window is full, remove the earliest entry M. If maxcount is greater than 0 and M is equal to max, decrement maxcount. Once maxcount reaches 0, the cache is invalid but we don't need to worry about that until such time the user requests the maximum value (there's no point repopulating the cache until then).
Add N to the rolling window.
If the window size is now 1 (that N is the only current entry), set max to N and maxcount to 1, then go back to step 1.
If maxcount is greater than 0 and N is greater than max, set max to N and maxcount to 1, then go back to step 1.
If maxcount is greater than 0 and N is equal to max, increment maxcount.
Go back to step 1.
Now, at any point while that window management is going on, you may request the maximum value. This is a separate operation, distinct from the window management itself. This can be done using the following rules in sequence.
If the window is empty, there is no maximum: raise an exception or return some sensible sentinel value.
If maxcount is greater than 0, then the cache is valid: simply return max.
Otherwise, the cache needs to be repopulated. Go through the entire list, setting up max and maxcount as per the code snippet below.
set max to window[0], maxcount to 0
for each x in window[]:
    if x > max:
        set max to x, maxcount to 1
    else if x == max:
        increment maxcount
The fact that you mostly maintain a cache of the maximum value and only recalculate when needed makes this a much more efficient solution than simply recalculating blindly whenever an entry is added.
For some definite statistics, I created the following Python program. It uses a sliding window of size 25 and uses random numbers from 0 to 999 inclusive (you can play with these properties to see how they affect the outcome).
First some initialisation code. Note the stat variables, they'll be used to count cache hits and misses:
import random
window = []
max = 0
maxcount = 0
maxwin = 25
statCache = 0
statNonCache = 0
Then the function to add a number to the window, as per my description above:
def addNum(n):
    global window
    global max
    global maxcount

    if len(window) == maxwin:
        m = window[0]
        window = window[1:]
        if maxcount > 0 and m == max:
            maxcount = maxcount - 1

    window.append(n)

    if len(window) == 1:
        max = n
        maxcount = 1
        return

    if maxcount > 0 and n > max:
        max = n
        maxcount = 1
        return

    if maxcount > 0 and n == max:
        maxcount = maxcount + 1
Next, the code which returns the maximum value from the window:
def getMax():
    global max
    global maxcount
    global statCache
    global statNonCache

    if len(window) == 0:
        return None

    if maxcount > 0:
        statCache = statCache + 1
        return max

    max = window[0]
    maxcount = 0
    for val in window:
        if val > max:
            max = val
            maxcount = 1
        else:
            if val == max:
                maxcount = maxcount + 1

    statNonCache = statNonCache + 1
    return max
And, finally, the test harness:
random.seed()
for i in range(1000000):
    val = int(1000 * random.random())
    addNum(val)
    newmax = getMax()
print("%d cached, %d non-cached" % (statCache, statNonCache))
Note that the test harness attempts to get the maximum for every time you add a number to the window. In practice, this may not be needed. In other words, this is the worst-case scenario for the random data generated.
Running that program a few times for pseudo-statistical purposes, we get (formatted and analysed for reporting purposes):
960579 cached, 39421 non-cached
960373 cached, 39627 non-cached
960395 cached, 39605 non-cached
960348 cached, 39652 non-cached
960441 cached, 39559 non-cached
960602 cached, 39398 non-cached
960561 cached, 39439 non-cached
960463 cached, 39537 non-cached
960409 cached, 39591 non-cached
960798 cached, 39202 non-cached
======= ======
9604969 395031
So you can see that, on average for random data, only about 3.95% of the cases resulted in a calculation hit (cache miss). The vast majority used the cached values. That should be substantially better than having to recalculate the maximum on every insertion into the window.
Some things that will affect that percentage will be:
The window size. Larger sizes means that there's more likelihood of a cache hit, improving the percentage. For example, doubling the window size pretty much halved the cache misses (to 1.95%).
The range of possible values. Less choice here means that there's more likely to be cache hits in the window. For example, reducing the range from 0..999 to 0..9 gave a big improvement in reducing cache misses (0.85%).
I'm assuming that by "window" you mean a range a[start] to a[start + len], and that start moves along. Consider the minimal value, the maximal is similar, and the move to the window a[start + 1] to a[start + len + 1]. Then the minimal value of the window will change only if (a) a[start + len + 1] < min (a smaller value came in), or (b) a[start] == min (one of the smallest values just left; recompute the minimum).
Another, possibly more efficient way of doing this, is to fill a priority queue with the first window, and update with each value entering/leaving, but I don't think that is much better (priority queues aren't suited to "pick out random element from the middle" (what you need to do when advancing the window). And the code will be much more complex. Better stick to the simple solution until proven that the performance isn't acceptable, and that this code is responsible for (much of) the resource consumption.
After writing my own algo yesterday, and asking for improvements, I was referred here. Indeed this algo is more elegant.
I'm not sure it offers constant speed calc regardless of the window size, but regardless, I tested the performance vs my own caching algo (fairly simple, and probably uses the same idea as others have proposed).
The caching is 8-15 times faster (tested with rolling windows of 5, 50, 300 and 1000; I don't need more).
Below are both alternatives with stopwatches and result validation.
static class Program
{
    static Random r = new Random();
    static int Window = 50; // (small to facilitate a visual functional test). Could eventually be 100 or 1000, but not more than 5000.
    const int FullDataSize = 1000;
    static double[] InputArr = new double[FullDataSize]; // array prefilled with the random input data.

    //====================== Caching algo variables
    static double Low = 0;
    static int LowLocation = 0;
    static int CurrentLocation = 0;
    static double[] Result1 = new double[FullDataSize]; // contains the caching minimum result
    static int i1; // incrementor, just to store the result back to the array. In real life, the result is not even stored back to the array.

    //====================== Ascending Minima algo variables
    static double[] Result2 = new double[FullDataSize]; // contains the ascending minima result.
    static double[] RollWinArray = new double[Window]; // array for the caching algo
    static Deque<MinimaValue> RollWinDeque = new Deque<MinimaValue>(); // Nito.Deque NuGet.
    static int i2; // used by the struct of the Deque (not just for result storage)

    //====================================== my initially proposed caching algo
    static void CalcCachingMin(double currentNum)
    {
        RollWinArray[CurrentLocation] = currentNum;
        if (currentNum <= Low)
        {
            LowLocation = CurrentLocation;
            Low = currentNum;
        }
        else if (CurrentLocation == LowLocation)
            ReFindHighest();
        CurrentLocation++;
        if (CurrentLocation == Window) CurrentLocation = 0; // this is faster
        //CurrentLocation = CurrentLocation % Window; // this is slower, still over 10-fold faster than ascending minima
        Result1[i1++] = Low;
    }

    // full iteration, run each time the lowest value is overwritten.
    static void ReFindHighest()
    {
        Low = RollWinArray[0];
        LowLocation = 0; // bug fix: missing from the initial version.
        for (int i = 1; i < Window; i++)
            if (RollWinArray[i] < Low)
            {
                Low = RollWinArray[i];
                LowLocation = i;
            }
    }

    //======================================= Ascending Minima algo based on http://stackoverflow.com/a/14823809/2381899
    private struct MinimaValue
    {
        public int RemoveIndex { get; set; }
        public double Value { get; set; }
    }

    public static void CalcAscendingMinima(double newNum)
    { // same algo as the extension method below, but used on external arrays, and fed one data point at a time like in the projected real-time app.
        while (RollWinDeque.Count > 0 && i2 >= RollWinDeque[0].RemoveIndex)
            RollWinDeque.RemoveFromFront();
        while (RollWinDeque.Count > 0 && RollWinDeque[RollWinDeque.Count - 1].Value >= newNum)
            RollWinDeque.RemoveFromBack();
        RollWinDeque.AddToBack(new MinimaValue { RemoveIndex = i2 + Window, Value = newNum });
        Result2[i2++] = RollWinDeque[0].Value;
    }

    public static double[] GetMin(this double[] input, int window)
    { // this is the original extension method for ascending minima,
      // taken from http://stackoverflow.com/a/14823809/2381899
        var queue = new Deque<MinimaValue>();
        var result = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            var val = input[i];
            // Note: in Nito.Deque, queue[0] is the front
            while (queue.Count > 0 && i >= queue[0].RemoveIndex)
                queue.RemoveFromFront();
            while (queue.Count > 0 && queue[queue.Count - 1].Value >= val)
                queue.RemoveFromBack();
            queue.AddToBack(new MinimaValue { RemoveIndex = i + window, Value = val });
            result[i] = queue[0].Value;
        }
        return result;
    }

    //============================================ Test program.
    static void Main(string[] args)
    { // this is the test program.
      // it runs several attempts of both algos on the same data.
        for (int j = 0; j < 10; j++)
        {
            Low = 12000;
            for (int i = 0; i < Window; i++)
                RollWinArray[i] = 10000000;

            // Fill the data + functional test - generate the numbers and check them in as you go:
            InputArr[0] = 12000;
            for (int i = 1; i < FullDataSize; i++) // fill the input array with random data.
                //InputArr[i] = r.Next(100) + 11000; // simple data.
                InputArr[i] = InputArr[i - 1] + r.NextDouble() - 0.5; // Brownian motion data.

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int i = 0; i < FullDataSize; i++) // run the caching algo.
                CalcCachingMin(InputArr[i]);
            stopwatch.Stop();
            Console.WriteLine("Caching  : " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();

            stopwatch.Start();
            for (int i = 0; i < FullDataSize; i++) // run the ascending minima algo
                CalcAscendingMinima(InputArr[i]);
            stopwatch.Stop();
            Console.WriteLine("AscMinima: " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();

            i1 = 0; i2 = 0; RollWinDeque.Clear();
        }

        for (int i = 0; i < FullDataSize; i++) // test the results.
            if (Result2[i] != Result1[i]) // this is a test that the algos are valid. Errors (mismatches) are printed.
                Console.WriteLine("Current:" + InputArr[i].ToString("#.00") + "\tLowest of " + Window + " last is " + Result1[i].ToString("#.00") + " " + Result2[i].ToString("#.00") + "\t" + (Result1[i] == Result2[i])); // for validation purposes only.
        Console.ReadLine();
    }
}
I would suggest maintaining a stack which supports getMin() or getMax().
This can be done with two stacks and costs only constant time.
fyi: https://www.geeksforgeeks.org/design-a-stack-that-supports-getmin-in-o1-time-and-o1-extra-space/
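A minimal sketch of the simple two-stack variant (this one uses O(n) extra space; the linked article describes an O(1)-extra-space refinement):
using System;
using System.Collections.Generic;

class MinStack
{
    private readonly Stack<int> _values = new Stack<int>();
    private readonly Stack<int> _minima = new Stack<int>(); // running minimum beside each value

    public void Push(int x)
    {
        _values.Push(x);
        _minima.Push(_minima.Count == 0 ? x : Math.Min(x, _minima.Peek()));
    }

    public int Pop()
    {
        _minima.Pop();
        return _values.Pop();
    }

    public int GetMin() { return _minima.Peek(); } // O(1)
}
Two such stacks arranged as a queue (one side for enqueue, one for dequeue) give a sliding window with O(1) amortized minimum, which is what the rolling computation needs.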

How come this algorithm in Ruby runs faster than in Parallel'd C#?

The following ruby code runs in ~15s. It barely uses any CPU/Memory (about 25% of one CPU):
def collatz(num)
  num.even? ? num/2 : 3*num + 1
end

start_time = Time.now
max_chain_count = 0
max_starter_num = 0
(1..1000000).each do |i|
  count = 0
  current = i
  current = collatz(current) and count += 1 until (current == 1)
  max_chain_count = count and max_starter_num = i if (count > max_chain_count)
end
puts "Max starter num: #{max_starter_num} -> chain of #{max_chain_count} elements. Found in: #{Time.now - start_time}s"
And the following TPL C# puts all my 4 cores to 100% usage and is orders of magnitude slower than the ruby version:
static void Euler14Test()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    int max_chain_count = 0;
    int max_starter_num = 0;
    object locker = new object();
    Parallel.For(1, 1000000, i =>
    {
        int count = 0;
        int current = i;
        while (current != 1)
        {
            current = collatz(current);
            count++;
        }
        if (count > max_chain_count)
        {
            lock (locker)
            {
                max_chain_count = count;
                max_starter_num = i;
            }
        }
        if (i % 1000 == 0)
            Console.WriteLine(i);
    });
    sw.Stop();
    Console.WriteLine("Max starter i: {0} -> chain of {1} elements. Found in: {2}s", max_starter_num, max_chain_count, sw.Elapsed.ToString());
}

static int collatz(int num)
{
    return num % 2 == 0 ? num / 2 : 3 * num + 1;
}
How come Ruby runs faster than C#? I've been told that Ruby is slow. Is that not true when it comes to algorithms?
Perf AFTER correction:
Ruby (Non parallel): 14.62s
C# (Non parallel): 2.22s
C# (With TPL): 0.64s
Actually, the bug is quite subtle, and has nothing to do with threading. The reason that your C# version takes so long is that the intermediate values computed by the collatz method eventually start to overflow the int type, resulting in negative numbers which may then take ages to converge.
This first happens when i is 134,379, for which the 129th term (assuming one-based counting) is 2,482,111,348. This exceeds the maximum value of 2,147,483,647 and therefore gets stored as -1,812,855,948.
To get good performance (and correct results) on the C# version, just change:
int current = i;
…to:
long current = i;
…and:
static int collatz(int num)
…to:
static long collatz(long num)
That will bring down your performance to a respectable 1.5 seconds.
Edit: CodesInChaos raises a very valid point about enabling overflow checking when debugging math-oriented applications. Doing so would have allowed the bug to be immediately identified, since the runtime would throw an OverflowException.
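A sketch of what that looks like: wrapping the arithmetic in a checked context (or enabling the project-wide "check for arithmetic overflow" build option) makes the first overflowing term throw instead of silently wrapping:
static int collatz(int num)
{
    checked // throws OverflowException once 3 * num + 1 exceeds int.MaxValue
    {
        return num % 2 == 0 ? num / 2 : 3 * num + 1;
    }
}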
Should be:
Parallel.For(1L, 1000000L, i =>
{
Otherwise, you get integer overflow and start checking negative values. The same collatz method should then operate on long values.
I experienced something like that, and I figured out it's because each of your loop iterations needs to start another thread, which takes some time - in this case comparable to (I think more than) the operations you actually do in the loop body.
There is an alternative: you can get how many CPU cores you have and then use a parallel loop with the same number of iterations as you have cores. Each iteration then evaluates part of the actual loop you want, done by making an inner for loop that depends on the parallel loop variable.
EDIT: EXAMPLE
int start = 1, end = 1000000;
Parallel.For(0, N_CORES, n =>
{
    int s = start + (end - start) * n / N_CORES;
    int e = n == N_CORES - 1 ? end : start + (end - start) * (n + 1) / N_CORES;
    for (int i = s; i < e; i++)
    {
        // Your code
    }
});
You should try this code; I'm pretty sure it will do the job faster.
EDIT: ELUCIDATION
Well, quite a long time since I answered this question, but I faced the problem again and finally understood what's going on.
I've been using the AForge implementation of the parallel for loop, and it seems like it fires a thread for each iteration of the loop. That's why, if the loop body takes a relatively small amount of time to execute, you end up with inefficient parallelism.
So, as some of you pointed out, System.Threading.Tasks.Parallel methods are based on Tasks, which are kind of a higher level of abstraction of a Thread:
"Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms that determine and adjust to the number of threads and that provide load balancing to maximize throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism."
So yeah, if you use the default library's implementation, you won't need this kind of workaround.

Is Multiplication Faster Than Comparison in .NET?

Good morning, afternoon or night,
Up until today, I thought comparison was one of the basic processor instructions, and so one of the fastest operations one can do on a computer... On the other hand, I know multiplication is sometimes trickier and involves lots of bit operations. However, I was a little shocked to look at the results of the following code:
Stopwatch Test = new Stopwatch();
int a = 0;
int i = 0, j = 0, l = 0;
double c = 0, d = 0;

for (i = 0; i < 32; i++)
{
    Test.Start();
    for (j = Int32.MaxValue, l = 1; j != 0; j = -j + ((j < 0) ? -1 : 1), l = -l)
    {
        a = l * j;
    }
    Test.Stop();
    Console.WriteLine("Product: {0}", Test.Elapsed.TotalMilliseconds);
    c += Test.Elapsed.TotalMilliseconds;
    Test.Reset();

    Test.Start();
    for (j = Int32.MaxValue, l = 1; j != 0; j = -j + ((j < 0) ? -1 : 1), l = -l)
    {
        a = (j < 0) ? -j : j;
    }
    Test.Stop();
    Console.WriteLine("Comparison: {0}", Test.Elapsed.TotalMilliseconds);
    d += Test.Elapsed.TotalMilliseconds;
    Test.Reset();
}

Console.WriteLine("Product: {0}", c / 32);
Console.WriteLine("Comparison: {0}", d / 32);
Console.ReadKey();
Result:
Product: 8558.6
Comparison: 9799.7
Quick explanation: j is an ancillary alternate variable which goes like (...), 11, -10, 9, -8, 7, (...) until it reaches zero, l is a variable which stores j's sign, and a is the test variable, which I want always to be equal to the modulus of j. The goal of the test was to check whether it is faster to set a to this value using multiplication or the conditional operator.
Can anyone please comment on these results?
Thank you very much.
Your second test isn't a mere comparison; it's effectively an if statement.
That's probably translated into a jump/branch instruction in the CPU, involving branch prediction (with possible pipeline stalls), and so is likely slower than a simple multiplication (even if not by much).
It can often be very difficult to make such assertions about an optimising compiler. They do lots of tricks that make simple cases different from real code. That said, you aren't just doing a comparison, you're doing a compare/assign in a very tight loop. The thread you're working on may have to pause many times at the branch; the multiplication can assign as many times as it likes, as long as the last assignment is still last, so many multiplies can be going on at once.
As is the general rule, make your code clear and ignore minor timing issues unless they become a problem.
If you do have a speed problem a good tracing/timing tool will guide you much better than knowing if one operation is faster than other in a specific case.
I guess one comment I would make is that you are doing a lot more in the second operation:
a = (j < 0) ? -j : j;
Not only are you doing a comparison, but also effectively an "if..else.." with the ?: operator, plus a negation of j.
You should run this test a thousand or so times and compare the averages; you never know what the CLR is doing in the background.
