C# boolean compare vs set performance - c#

In C#, suppose I have a foreach loop and it is possible that the iterator will be empty. Following the loop, more actions need to be taken only if the iterator was not empty. So I declare bool res = false; before the loop. Is it faster to just set res = true; in each loop iteration, or to test if it's been done yet, as in if (!res) res = true;. I suppose the question could more succinctly be stated as "is it faster to set a bool's value or test its value?"
In addition, even if one is slightly faster than the other, is it feasible to have so many iterations in the loop that the impact on performance is not negligible?

To kill a few minutes:
static void Main(string[] args)
{
    bool test = false;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (long i = 0; i < 100000000; i++)
    {
        if (!test)
            test = true;
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds + ". Hi, I'm just using test somehow:" + test);
    sw.Reset();
    bool test2 = false;
    sw.Start();
    for (long i = 0; i < 100000000; i++)
    {
        test2 = true;
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds + ". Hi, I'm just using test2 somehow:" + test2);
    Console.ReadKey();
}
Output:
448
379
So, unless I missed something, just setting the value is faster than checking and then setting it. Is that what you wanted to test?
EDIT:
Fixed an error pointed out in the comments. As a side note, I ran this test a few times, and even when the milliseconds changed, the second case was always slightly faster.

if (!res) res = true is redundant.
The compiler should be smart enough to know that res will always end up being true and remove your if statement and/or completely remove the set altogether if you compile with Release / Optimize Code.
As for the question itself: it should be faster to set a primitive value than to compare it and then set it. I highly doubt you could measure the difference accurately on a primitive at all; just thinking about this has already consumed more time than the operation would over any exaggerated number of iterations.
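For anyone who wants to repeat the measurement, here is a minimal sketch of a slightly more careful harness, assuming a console application. The names SetVersusTestBenchmark, AlwaysSet, TestThenSet and BestOfMilliseconds are made up for illustration. Each candidate is called once untimed so that JIT compilation is excluded, and the fastest of several runs is reported. An optimizing JIT may still simplify either loop, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;

public static class SetVersusTestBenchmark
{
    private const int Iterations = 100_000_000;

    // Unconditionally writes the flag on every iteration.
    public static bool AlwaysSet()
    {
        bool flag = false;
        for (int i = 0; i < Iterations; i++)
        {
            flag = true;
        }
        return flag;
    }

    // Reads the flag first and writes it only while it is still false.
    public static bool TestThenSet()
    {
        bool flag = false;
        for (int i = 0; i < Iterations; i++)
        {
            if (!flag) flag = true;
        }
        return flag;
    }

    // Calls the candidate once untimed (warm-up), then returns the
    // fastest of `runs` timed calls in milliseconds.
    public static double BestOfMilliseconds(Func<bool> candidate, int runs)
    {
        candidate(); // warm-up: keeps JIT compilation out of the timed runs
        double best = double.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            bool result = candidate();
            sw.Stop();
            if (!result) throw new InvalidOperationException("flag should be true");
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    public static void Main()
    {
        Console.WriteLine("always set:    " + BestOfMilliseconds(AlwaysSet, 5) + " ms");
        Console.WriteLine("test then set: " + BestOfMilliseconds(TestThenSet, 5) + " ms");
    }
}
```

Taking the minimum rather than the average is a design choice: it filters out runs that were disturbed by other processes or by the garbage collector.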

Related

C# Extension method slower than chained Replace unless in tight loop. Why?

I have an extension method to remove certain characters from a string (a phone number), and it performs much slower than I think it should compared with chained Replace calls. The weird bit is that in a loop it overtakes the Replace version once the loop runs for around 3000 iterations, and after that it's faster. Below that, chaining Replace is faster. It's as if there's a fixed overhead to my code which Replace doesn't have. What could this be?
Quick look. When only testing 10 numbers, mine takes about 0.3ms, while Replace takes only 0.01ms. A massive difference! But when running 5 million, mine takes around 1700ms while Replace takes about 2500ms.
Phone numbers will only have 0-9, +, -, (, )
Here's the relevant code:
Building test cases, I'm playing with testNums.
int testNums = 5_000_000;
Console.WriteLine("Building " + testNums + " tests");
Random rand = new Random();
string[] tests = new string[testNums];
char[] letters =
{
'0','1','2','3','4','5','6','7','8','9',
'+','-','(',')'
};
for(int t = 0; t < tests.Length; t++)
{
int length = rand.Next(5, 20);
char[] word = new char[length];
for(int c = 0; c < word.Length; c++)
{
word[c] = letters[rand.Next(letters.Length)];
}
tests[t] = new string(word);
}
Console.WriteLine("Tests built");
string[] stripped = new string[tests.Length];
Using my extension method:
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].CleanNumberString();
}
stopwatch.Stop();
Console.WriteLine("Clean: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Using chained Replace:
stripped = new string[tests.Length];
stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].Replace(" ", string.Empty)
.Replace("-", string.Empty)
.Replace("(", string.Empty)
.Replace(")", string.Empty)
.Replace("+", string.Empty);
}
stopwatch.Stop();
Console.WriteLine("Replace: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Extension method in question:
public static string CleanNumberString(this string s)
{
    Span<char> letters = stackalloc char[s.Length];
    int count = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] >= '0' && s[i] <= '9')
            letters[count++] = s[i];
    }
    return new string(letters.Slice(0, count));
}
What I've tried:
I've run them around the other way. Makes a tiny difference, but not enough.
Make it a normal static method, which was significantly slower than extension. As a ref parameter was slightly slower, and in parameter was about the same as extension method.
Aggressive Inlining. Doesn't make any real difference. I'm in release mode, so I suspect the compiler inlines it anyway. Either way, not much change.
I have also looked at memory allocations, and they are as I expect. Mine allocates only one string per iteration on the managed heap (the new string at the end), while Replace allocates a new object for each Replace call. So the memory used by the Replace version is much higher. But it's still faster!
Is it calling native C code and doing something crafty there? Is the higher memory usage triggering the GC and slowing it down? (That still doesn't explain the insanely fast time on only one or two iterations.)
Any ideas?
(Yes, I know not to bother optimising things like this, it's just bugging me because I don't know why it's doing this)
After doing some benchmarks, I think I can safely assert that your initial statement is wrong, for the exact reason you mentioned in your deleted answer: the loading time of the method is the only thing that misled you.
Here's the full benchmark on a simplified version of the problem:
static void Main(string[] args)
{
// Build string of n consecutive "ab"
int n = 1000;
Console.WriteLine("N: " + n);
char[] c = new char[n];
for (int i = 0; i < n; i+=2)
c[i] = 'a';
for (int i = 1; i < n; i += 2)
c[i] = 'b';
string s = new string(c);
Stopwatch stopwatch;
// Make sure everything is loaded
s.CleanNumberString();
s.Replace("a", "");
s.UnsafeRemove();
// Tests to remove all 'a' from the string
// Unsafe remove
stopwatch = Stopwatch.StartNew();
string a1 = s.UnsafeRemove();
stopwatch.Stop();
Console.WriteLine("Unsafe remove:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Extension method
stopwatch = Stopwatch.StartNew();
string a2 = s.CleanNumberString();
stopwatch.Stop();
Console.WriteLine("Clean method:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// String replace
stopwatch = Stopwatch.StartNew();
string a3 = s.Replace("a", "");
stopwatch.Stop();
Console.WriteLine("String.Replace:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Make sure the returned strings are identical
Console.WriteLine(a1.Equals(a2) && a2.Equals(a3));
Console.ReadKey();
}
public static string CleanNumberString(this string s)
{
char[] letters = new char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
if (s[i] == 'b')
letters[count++] = 'b';
return new string(letters.SubArray(0, count));
}
public static T[] SubArray<T>(this T[] data, int index, int length)
{
T[] result = new T[length];
Array.Copy(data, index, result, 0, length);
return result;
}
// Taken from https://stackoverflow.com/a/2183442/6923568
public static unsafe string UnsafeRemove(this string s)
{
int len = s.Length;
char* newChars = stackalloc char[len];
char* currentChar = newChars;
for (int i = 0; i < len; ++i)
{
char c = s[i];
switch (c)
{
case 'a':
continue;
default:
*currentChar++ = c;
break;
}
}
return new string(newChars, 0, (int)(currentChar - newChars));
}
When run with different values of n, it is clear that your extension method (or at least my somewhat equivalent version of it) is faster than String.Replace(). In fact, it performs better on both small and big strings:
N: 100
Unsafe remove: 0,0024ms
Clean method: 0,0015ms
String.Replace: 0,0021ms
True
N: 100000
Unsafe remove: 0,3889ms
Clean method: 0,5308ms
String.Replace: 1,3993ms
True
I highly suspect optimizations for the replacement of strings (not to be compared to removal) in String.Replace() to be the culprit here. I also added a method from this answer to have another comparison on removal of characters. That one's times behave similarly to your method but gets faster on higher values (80k+ on my tests) of n.
With all that being said, since your question is based on an assumption that we found was false, if you need more explanation on why the opposite is true (i.e. "Why is String.Replace() slower than my method"), plenty of in-depth benchmarks about string manipulation already do so.
I ran the clean method a couple more times. Interestingly, it is a lot faster than Replace; only the first run was slower. Sorry that I can't explain why it's slower the first time, but once I ran the method a few more times the result was as expected.
Building 100 tests
Tests built
Replace: 0.0528ms
Clean: 0.4526ms
Clean: 0.0413ms
Clean: 0.0294ms
Replace: 0.0679ms
Replace: 0.0523ms
I used .NET Core 2.1.
So I've found with help from daehee Kim and Mat below that it's only the first iteration, but it's for the whole first loop. Every loop after there is ok.
I use the following line to force the JIT to do its thing and initialise this method:
RuntimeHelpers.PrepareMethod(typeof(CleanExtension).GetMethod("CleanNumberString", BindingFlags.Public | BindingFlags.Static).MethodHandle);
I find the JIT usually takes about 2-3ms to do its thing here (including Reflection time of about 0.1ms). Note that you should probably not be doing this because you're now getting the Reflection cost as well, and the JIT will be called right after this anyway, but it's probably a good idea for benchmarks to fairly compare.
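A simpler way to see the same effect, without Reflection, is to time the first call and a later call separately. The sketch below is only an illustration: WarmUpDemo and TimeOneCall are made-up names, the sample phone number is arbitrary, and the exact timings will depend on the runtime and on settings such as tiered compilation.

```csharp
using System;
using System.Diagnostics;

public static class CleanExtension
{
    // Same logic as the extension method in the question: keep only the digits.
    public static string CleanNumberString(this string s)
    {
        Span<char> letters = stackalloc char[s.Length];
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] >= '0' && s[i] <= '9')
                letters[count++] = s[i];
        }
        return new string(letters.Slice(0, count));
    }
}

public static class WarmUpDemo
{
    // Times a single call of CleanNumberString and returns the elapsed milliseconds.
    public static double TimeOneCall(string input)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string cleaned = input.CleanNumberString();
        sw.Stop();
        GC.KeepAlive(cleaned); // make sure the result counts as used
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const string sample = "+1 (555) 010-9999";
        // The first call typically pays for JIT compilation of CleanNumberString.
        Console.WriteLine("first call:  " + TimeOneCall(sample) + " ms");
        // Later calls reuse the compiled code.
        Console.WriteLine("second call: " + TimeOneCall(sample) + " ms");
    }
}
```

In a benchmark, one untimed call before the measured loop is usually enough to keep this one-off cost out of the result.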
The more you know!
My benchmark for a loop of 5000 iterations, repeated 5000 times with random strings and averaged is:
Clean: 0.41078ms
Replace: 1.4974ms

Difference between "if (condition) int++" and "int += Convert.ToInt32(condition)"

Consider these two ways of incrementing a value by one:
if (Condition) int++;
and
int += Convert.ToInt32(Condition);
Is there any benefit to writing it one way or the other, or are they basically the same?
Adding a Boolean to an integer doesn't make any sense.
Yes, it works, because of the conversion. But it still doesn't make any sense. It's illogical.
Programs should be obvious and clear, not puzzles to be solved.
I get 7527ms and 5888ms on my machine from the benchmark below. The first approach (boolean conversion), besides being just awful from a code readability point of view is also slower. That makes sense, that approach has the overhead of ALWAYS 1) performing a conversion from bool to int, and 2) performing an addition operation. Yes, there are probably shortcuts for adding "0", but that's still ANOTHER test that has to be looked at.
int sum = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < Int32.MaxValue; i++) {
    bool condition = i < Int32.MaxValue / 2;
    sum += Convert.ToInt32(condition);
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
sum = 0;
sw = Stopwatch.StartNew();
for (int i = 0; i < Int32.MaxValue; i++) {
    bool condition = i < Int32.MaxValue / 2;
    if (condition) {
        sum++;
    }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
There are many many many many ways to write code that does the same thing. But it all comes down to readability and maintainability.
You could choose to write it in binary and optimize it in the most efficient way possible. But you won't find many people able to maintain the code you have written. I bet not even you would want to read your own code in binary when there is a bug.
So which way do you want to do it? Considering that there is not much performance gain from the 2nd method, I would say definitely go for the 1st, for the sake of the people who might be reading your code later.
I'm thinking the clarity of the code depends on the context.
For almost all ordinary cases,
if (condition) i++;
...is going to be easier to read.
But there may be some situations like this one where the alternative makes it easier to follow. Imagine if this list were very long:
var errorCount = 0;
errorCount += Convert.ToInt32(o.HasAProblem);
errorCount += Convert.ToInt32(o.HasSomeOtherProblem);
errorCount += Convert.ToInt32(p.DoesntWork);
On the other hand, for the above, maybe I'd find a different way of structuring the code entirely, e.g.
var errorFlags = new [] {o.HasAProblem,
                         o.HasSomeOtherProblem,
                         p.DoesntWork};
var errorCount = errorFlags.Count(a => a);
Also, the construct
i += Convert.ToInt32(condition);
...may result in a cleaner pipeline since there is no branch prediction involved. The key word is may.
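To make the comparison concrete, here is a small sketch showing that the two forms count the same thing. ConditionalIncrement, CountWithBranch and CountWithConversion are names invented for this example; whether the conversion form really avoids a branch depends on what the JIT generates.

```csharp
using System;

public static class ConditionalIncrement
{
    // Branching form: increments only when the flag is true.
    public static int CountWithBranch(bool[] flags)
    {
        int count = 0;
        foreach (bool flag in flags)
        {
            if (flag) count++;
        }
        return count;
    }

    // Conversion form: Convert.ToInt32(bool) yields 1 for true and 0 for false,
    // so an addition happens on every iteration.
    public static int CountWithConversion(bool[] flags)
    {
        int count = 0;
        foreach (bool flag in flags)
        {
            count += Convert.ToInt32(flag);
        }
        return count;
    }

    public static void Main()
    {
        bool[] flags = { true, false, true, true };
        Console.WriteLine(CountWithBranch(flags));
        Console.WriteLine(CountWithConversion(flags));
    }
}
```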

Timing C# code using Timer

Even though it is good to check the performance of code in terms of algorithmic analysis and Big-O notation, I wanted to see how long the code takes to execute on my PC. I initialized a List with 9999 elements and removed the even elements from it. Sadly, the timespan to execute this seems to be 0:0:0. I am surprised by the result, so there must be something wrong in the way I time the execution. Could someone help me time the code correctly?
IList<int> source = new List<int>(100);
for (int i = 0; i < 9999; i++)
{
source.Add(i);
}
TimeSpan startTime, duration;
startTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
RemoveEven(ref source);
duration = Process.GetCurrentProcess().Threads[0].UserProcessorTime.Subtract(startTime);
Console.WriteLine(duration.Milliseconds);
Console.Read();
The most appropriate thing to use there would be Stopwatch - anything involving TimeSpan has nowhere near enough precision for this:
var watch = Stopwatch.StartNew();
// something to time
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds);
However, a modern CPU is very fast, and it would not surprise me if it can remove them in that time. Normally, for timing, you need to repeat an operation a large number of times to get a reasonable measurement.
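As a rough sketch of what "repeat the operation" could look like here: the question does not show the body of RemoveEven, so the version below is a hypothetical stand-in that removes even values, and RepeatedTiming, BuildSource and AverageMilliseconds are made-up names. The list is rebuilt outside the timed region for every repetition, and the Stopwatch is deliberately not reset so that the elapsed time accumulates.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class RepeatedTiming
{
    // Hypothetical stand-in for the question's RemoveEven (its body is not shown there).
    public static void RemoveEven(List<int> source)
    {
        source.RemoveAll(value => value % 2 == 0);
    }

    // Builds a list containing 0, 1, ..., count - 1.
    public static List<int> BuildSource(int count)
    {
        var source = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            source.Add(i);
        }
        return source;
    }

    // Returns the average time of one RemoveEven call in milliseconds.
    public static double AverageMilliseconds(int count, int repetitions)
    {
        RemoveEven(BuildSource(count)); // warm-up call, not timed

        var watch = new Stopwatch();
        for (int r = 0; r < repetitions; r++)
        {
            List<int> source = BuildSource(count); // setup stays outside the timed region
            watch.Start();                         // resumes; elapsed time accumulates
            RemoveEven(source);
            watch.Stop();
        }
        return watch.Elapsed.TotalMilliseconds / repetitions;
    }

    public static void Main()
    {
        Console.WriteLine(AverageMilliseconds(9999, 1000) + " ms per call");
    }
}
```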
Aside: the ref in RemoveEven(ref source) is almost certainly not needed.
In .Net 2.0 you can use the Stopwatch class
IList<int> source = new List<int>(100);
for (int i = 0; i < 9999; i++)
{
source.Add(i);
}
Stopwatch watch = new Stopwatch();
watch.Start();
RemoveEven(ref source);
watch.Stop();
// watch.ElapsedMilliseconds now contains the execution time in ms
Adding to previous answers:
var sw = Stopwatch.StartNew();
// instructions to time
sw.Stop();
sw.ElapsedMilliseconds returns a long and has a resolution of:
1 millisecond = 1000000 nanoseconds
sw.Elapsed.TotalMilliseconds returns a double and has a resolution equal to the inverse of Stopwatch.Frequency. On my PC for example Stopwatch.Frequency has a value of 2939541 ticks per second, that gives sw.Elapsed.TotalMilliseconds a resolution of:
1/2939541 seconds = 3,401891655874165e-7 seconds = 340 nanoseconds
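The same calculation can be done at run time on any machine. This is a minimal sketch; StopwatchResolution and NanosecondsPerTick are names chosen for the example, while Stopwatch.Frequency and Stopwatch.IsHighResolution are the framework members being read.

```csharp
using System;
using System.Diagnostics;

public static class StopwatchResolution
{
    // Stopwatch.Frequency is in ticks per second, so its inverse is seconds per tick.
    public static double NanosecondsPerTick()
    {
        return 1_000_000_000.0 / Stopwatch.Frequency;
    }

    public static void Main()
    {
        Console.WriteLine("IsHighResolution: " + Stopwatch.IsHighResolution);
        Console.WriteLine("Frequency: " + Stopwatch.Frequency + " ticks per second");
        Console.WriteLine("Resolution: " + NanosecondsPerTick() + " ns per tick");
    }
}
```

With the frequency of 2939541 quoted above, this gives roughly 340 ns per tick.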

Insertion Speed in LinkedList

While performance testing, I noticed something interesting.
I noticed that the very first insertion into a LinkedList (C# generics) is far slower than any other insertion done at the head of the list. I simply used the generic LinkedList<T> class and called AddFirst() for each insertion. Why is the very first insertion the slowest?
First Five Insertion Results:
First insertion into list: 0.0152 milliseconds
Second insertion into list(at head): 0.0006 milliseconds
Third insertion into list(at head): 0.0003 milliseconds
Fourth insertion into list(at head): 0.0006 milliseconds
Fifth insertion into list(at head): 0.0006 milliseconds
Performance Testing Code:
using (StreamReader readText = new StreamReader("MillionNumbers.txt"))
{
String line;
Int32 counter = 0;
while ((line = readText.ReadLine()) != null)
{
watchTime.Start();
theList.AddFirst(line);
watchTime.Stop();
Double time = watchTime.Elapsed.TotalMilliseconds;
totalTime = totalTime + time;
Console.WriteLine(time);
watchTime.Reset();
++counter;
}
Console.WriteLine(totalTime);
Console.WriteLine(counter);
Console.WriteLine(totalTime / counter);
}
Timing a single operation is very dangerous - the slightest stutter can make a huge difference in results. Additionally, it's not clear that you've done anything with LinkedList<T> before this code, which means you'd be timing the JITting of AddFirst and possibly even whole other types involved.
Timing just the first insert is rather difficult as once you've done it, you can't easily repeat it. However, you can time "insert and remove" repeatedly, as this code does:
using System;
using System.Collections.Generic;
using System.Diagnostics;
class Program
{
public static void Main(string[] args)
{
// Make sure we've JITted the LinkedList code
new LinkedList<string>().AddFirst("ignored");
LinkedList<string> list = new LinkedList<string>();
TimeInsert(list);
list.AddFirst("x");
TimeInsert(list);
list.AddFirst("x");
TimeInsert(list);
list.AddFirst("x");
}
const int Iterations = 100000000;
static void TimeInsert(LinkedList<string> list)
{
GC.Collect();
GC.WaitForPendingFinalizers();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
list.AddFirst("item");
list.RemoveFirst();
}
sw.Stop();
Console.WriteLine("Initial size: {0}; Ticks: {1}",
list.Count, sw.ElapsedTicks);
}
}
My results:
Initial size: 0; Ticks: 5589583
Initial size: 1; Ticks: 8137963
Initial size: 2; Ticks: 8399579
This is what I'd expect, as depending on the internal representation there's very slightly more work to do in terms of hooking up the "previous head" when adding and removing to an already-populated list.
My guess is you're seeing JIT time, but really your code doesn't really time accurately enough to be useful, IMO.

C# performance analysis- how to count CPU cycles?

Is this a valid way to do performance analysis? I want to get nanosecond accuracy and determine the performance of typecasting:
class PerformanceTest
{
static double last = 0.0;
static List<object> numericGenericData = new List<object>();
static List<double> numericTypedData = new List<double>();
static void Main(string[] args)
{
double totalWithCasting = 0.0;
double totalWithoutCasting = 0.0;
for (double d = 0.0; d < 1000000.0; ++d)
{
numericGenericData.Add(d);
numericTypedData.Add(d);
}
Stopwatch stopwatch = new Stopwatch();
for (int i = 0; i < 10; ++i)
{
stopwatch.Start();
testWithTypecasting();
stopwatch.Stop();
totalWithCasting += stopwatch.ElapsedTicks;
stopwatch.Start();
testWithoutTypeCasting();
stopwatch.Stop();
totalWithoutCasting += stopwatch.ElapsedTicks;
}
Console.WriteLine("Avg with typecasting = {0}", (totalWithCasting/10));
Console.WriteLine("Avg without typecasting = {0}", (totalWithoutCasting/10));
Console.ReadKey();
}
static void testWithTypecasting()
{
foreach (object o in numericGenericData)
{
last = ((double)o*(double)o)/200;
}
}
static void testWithoutTypeCasting()
{
foreach (double d in numericTypedData)
{
last = (d * d)/200;
}
}
}
The output is:
Avg with typecasting = 468872.3
Avg without typecasting = 501157.9
I'm a little suspicious... it looks like there is nearly no impact on the performance. Is casting really that cheap?
Update:
class PerformanceTest
{
static double last = 0.0;
static object[] numericGenericData = new object[100000];
static double[] numericTypedData = new double[100000];
static Stopwatch stopwatch = new Stopwatch();
static double totalWithCasting = 0.0;
static double totalWithoutCasting = 0.0;
static void Main(string[] args)
{
for (int i = 0; i < 100000; ++i)
{
numericGenericData[i] = (double)i;
numericTypedData[i] = (double)i;
}
for (int i = 0; i < 10; ++i)
{
stopwatch.Start();
testWithTypecasting();
stopwatch.Stop();
totalWithCasting += stopwatch.ElapsedTicks;
stopwatch.Reset();
stopwatch.Start();
testWithoutTypeCasting();
stopwatch.Stop();
totalWithoutCasting += stopwatch.ElapsedTicks;
stopwatch.Reset();
}
Console.WriteLine("Avg with typecasting = {0}", (totalWithCasting/(10.0)));
Console.WriteLine("Avg without typecasting = {0}", (totalWithoutCasting / (10.0)));
Console.ReadKey();
}
static void testWithTypecasting()
{
foreach (object o in numericGenericData)
{
last = ((double)o * (double)o) / 200;
}
}
static void testWithoutTypeCasting()
{
foreach (double d in numericTypedData)
{
last = (d * d) / 200;
}
}
}
The output is:
Avg with typecasting = 4791
Avg without typecasting = 3303.9
Note that it's not typecasting that you are measuring, it's unboxing. The values are doubles all along, there is no type casting going on.
You forgot to reset the stopwatch between tests, so you are adding the accumulated time of all previous tests over and over. If you convert the ticks to actual time, you see that it adds up to much more than the time it took to run the test.
If you add a stopwatch.Reset(); before each stopwatch.Start();, you get a much more reasonable result like:
Avg with typecasting = 41027,1
Avg without typecasting = 20594,3
Unboxing a value is not so expensive, it only has to check that the data type in the object is correct, then get the value. Still it's a lot more work than when the type is already known. Remember that you are also measuring the looping, calculation and assigning of the result, which is the same for both tests.
Boxing a value is more expensive than unboxing it, as that allocates an object on the heap.
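A short sketch of the type check that unboxing performs, using made-up names (UnboxingDemo, Unbox, TryUnboxAsInt): a boxed double can only be unboxed as a double, and an attempt to unbox it directly as an int throws InvalidCastException.

```csharp
using System;

public static class UnboxingDemo
{
    // Unboxing: checks that the object really holds a double, then copies the value out.
    public static double Unbox(object boxed)
    {
        return (double)boxed;
    }

    // Returns false when the boxed value is not exactly an int.
    public static bool TryUnboxAsInt(object boxed, out int value)
    {
        try
        {
            value = (int)boxed;
            return true;
        }
        catch (InvalidCastException)
        {
            value = 0;
            return false;
        }
    }

    public static void Main()
    {
        object boxed = 1.5;                   // boxing: allocates an object on the heap
        Console.WriteLine(Unbox(boxed));      // unboxing with the matching type
        Console.WriteLine(TryUnboxAsInt(boxed, out _)); // wrong type, so this fails
        Console.WriteLine((int)(double)boxed);          // unbox first, then convert
    }
}
```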
1) Yes, casting is usually (very) cheap.
2) You are not going to get nanosecond accuracy in a managed language. Or in an unmanaged language under most operating systems.
Consider
other processes
garbage collection
different JITters
different CPUs
And, your measurement includes the foreach loop, looks like 50% or more to me. Maybe 90%.
When you call Stopwatch.Start it is letting the timer continue to run from wherever it left off. You need to call Stopwatch.Reset() to set the timers back to zero before starting again. Personally I just use stopwatch = Stopwatch.StartNew() whenever I want to start a timer to avoid this sort of confusion.
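The accumulation is easy to observe with a short sketch. StopwatchAccumulation and its two methods are names invented for this example, and Thread.Sleep only stands in for the work being timed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class StopwatchAccumulation
{
    // Times two intervals with Start()/Stop() and no Reset() in between.
    // The second reading includes the first interval.
    public static (long First, long Accumulated) MeasureWithoutReset(int sleepMs)
    {
        var sw = new Stopwatch();
        sw.Start();
        Thread.Sleep(sleepMs);
        sw.Stop();
        long first = sw.ElapsedTicks;

        sw.Start(); // resumes from the previous elapsed value
        Thread.Sleep(sleepMs);
        sw.Stop();
        return (first, sw.ElapsedTicks);
    }

    // Restart() resets the elapsed time to zero and starts timing again.
    public static long MeasureWithRestart(int sleepMs)
    {
        var sw = Stopwatch.StartNew();
        Thread.Sleep(sleepMs);
        sw.Restart();
        Thread.Sleep(sleepMs);
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        var result = MeasureWithoutReset(20);
        Console.WriteLine("first interval:        " + result.First + " ticks");
        Console.WriteLine("after second Start():  " + result.Accumulated + " ticks");
        Console.WriteLine("after Restart():       " + MeasureWithRestart(20) + " ticks");
    }
}
```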
Furthermore, you probably want to call both of your test methods before starting the "timing loop" so that they get a fair chance to "warm up" that piece of code and ensure that the JIT has had a chance to run to even the playing field.
When I do that on my machine, I see that testWithoutTypeCasting runs in approximately half the time of testWithTypecasting.
That being said, the cast itself is not likely to be the most significant part of that performance penalty. The testWithTypecasting method operates on a list of boxed doubles, which means that an additional level of indirection is required to retrieve each value (following a reference to the value somewhere else in memory), in addition to increasing the total amount of memory consumed. This increases the time spent on memory access and is likely to be a bigger effect than the CPU time spent "in the cast" itself.
Look into performance counters in the System.Diagnostics namespace. When you create a new counter, you first create a category and then specify one or more counters to be placed in it.
// Create a collection of type CounterCreationDataCollection.
System.Diagnostics.CounterCreationDataCollection CounterDatas =
new System.Diagnostics.CounterCreationDataCollection();
// Create the counters and set their properties.
System.Diagnostics.CounterCreationData cdCounter1 =
new System.Diagnostics.CounterCreationData();
System.Diagnostics.CounterCreationData cdCounter2 =
new System.Diagnostics.CounterCreationData();
cdCounter1.CounterName = "Counter1";
cdCounter1.CounterHelp = "help string1";
cdCounter1.CounterType = System.Diagnostics.PerformanceCounterType.NumberOfItems64;
cdCounter2.CounterName = "Counter2";
cdCounter2.CounterHelp = "help string 2";
cdCounter2.CounterType = System.Diagnostics.PerformanceCounterType.NumberOfItems64;
// Add both counters to the collection.
CounterDatas.Add(cdCounter1);
CounterDatas.Add(cdCounter2);
// Create the category and pass the collection to it.
System.Diagnostics.PerformanceCounterCategory.Create(
"Multi Counter Category", "Category help", CounterDatas);
see MSDN docs
Just a thought but sometimes identical machine code can take a different number of cycles to execute depending on its alignment in memory so you might want to add a control or controls.
I don't "do" C# myself, but in C for x86-32 and later the rdtsc instruction is usually available, which is much more accurate than OS ticks. More info on rdtsc can be found by searching Stack Overflow. Under C it is usually available as an intrinsic or built-in function and returns the number of clock cycles (in an 8 byte - long long/__int64 - unsigned integer) since the computer was powered up. So if the CPU has a clock speed of 3 GHz, the underlying counter is incremented 3 billion times per second. Save for a few early AMD processors, all multi-core CPUs will have their counters synchronized.
If C# does not have it you might consider writing a VERY short C function to access it from C#. There is a great deal of overhead if you access the instruction through a function vs inline. The difference between two back-to-back calls to the function will be the basic measurement overhead. If you're thinking of metering your application you'll have to determine several more complex overhead values.
You might consider shutting off the CPU energy-saving mode (and restarting the PC), because it lowers the clock frequency fed to the CPU during periods of low activity, which causes the time stamp counters of the different cores to become unsynchronized.
