StopWatch gives random results - c#

This is my first attempt at using Stopwatch to measure code performance, and I don't know what is wrong. I want to check whether there is a difference between the various ways of casting to double when calculating the average of two integers.
public static double Avarage(int a, int b)
{
    return (a + b + 0.0) / 2;
}

public static double AvarageDouble(int s, int d)
{
    return (double)(s + d) / 2;
}

public static double AvarageDouble2(int x, int v)
{
    return ((double)x + v) / 2;
}
Code to test these 3 methods, using StopWatch:
Stopwatch sw = new Stopwatch();

sw.Start();
for (int i = 0; i < 1000000; i++)
{
    var ret = Avarage(2, 3);
}
sw.Stop();
Console.Write("Using 0.0: " + sw.ElapsedTicks + "\n");

sw.Reset();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
    var ret2 = AvarageDouble(2, 3);
}
sw.Stop();
Console.Write("Using Double(s+d): " + sw.ElapsedTicks + "\n");

sw.Reset();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
    var ret3 = AvarageDouble2(2, 3);
}
sw.Stop();
Console.Write("Using double (x): " + sw.ElapsedTicks + "\n");
The results look random: sometimes Avarage is the fastest, other times AvarageDouble or AvarageDouble2. I used different variable names in each method, but that does not seem to matter.
What am I missing?
PS. What is the best method to calculate average with two ints as inputs?

I tested your code, and yes, the results were quite random at times. Remember that Stopwatch only measures the wall-clock time elapsed between sw.Start() and sw.Stop(). It does not account for .NET's just-in-time compilation, operating-system process scheduling, CPU load, and so on.
This is especially noticeable in methods with such small runtimes, where that noise can more than double the measured time.
A more elaborate explanation is given in the following SO question.
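To reduce that noise, a common pattern (sketched below; the helper name RunTimed and the iteration counts are my own choices, not from the question) is to build in Release mode, call the method once first to force JIT compilation, repeat each measurement several times, and compare medians rather than single runs:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Benchmark
{
    public static double Avarage(int a, int b) => (a + b + 0.0) / 2;

    // Runs the action `iterations` times and returns the elapsed ticks.
    static long RunTimed(Action action, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    static void Main()
    {
        Avarage(2, 3); // warm-up call: forces JIT compilation before timing

        // Take 10 samples and report the median, which is robust to scheduling spikes.
        var samples = Enumerable.Range(0, 10)
            .Select(_ => RunTimed(() => Avarage(2, 3), 1_000_000))
            .OrderBy(t => t)
            .ToArray();
        Console.WriteLine("Median ticks: " + samples[samples.Length / 2]);
    }
}
```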


Ryzen vs. i7 Multi-Threaded Performance

I made the following C# Console App:
class Program
{
    static RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    public static ConcurrentDictionary<int, int> StateCount { get; set; }
    static int length = 1000000000;

    static void Main(string[] args)
    {
        StateCount = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < 3; i++)
        {
            StateCount.AddOrUpdate(i, 0, (k, v) => 0);
        }

        Console.WriteLine("Processors: " + Environment.ProcessorCount);
        Console.WriteLine("Starting...");
        Console.WriteLine();

        Timer t = new Timer(1000);
        t.Elapsed += T_Elapsed;
        t.Start();

        Stopwatch sw = new Stopwatch();
        sw.Start();
        Parallel.For(0, length, (i) =>
        {
            var rand = GetRandomNumber();
            int newState = 0;
            if (rand < 0.3)
            {
                newState = 0;
            }
            else if (rand < 0.6)
            {
                newState = 1;
            }
            else
            {
                newState = 2;
            }
            StateCount.AddOrUpdate(newState, 0, (k, v) => v + 1);
        });
        sw.Stop();
        t.Stop();

        Console.WriteLine();
        Console.WriteLine("Total time: " + sw.Elapsed.TotalSeconds);
        Console.ReadKey();
    }

    private static void T_Elapsed(object sender, ElapsedEventArgs e)
    {
        int total = 0;
        for (int i = 0; i < 3; i++)
        {
            if (StateCount.TryGetValue(i, out int value))
            {
                total += value;
            }
        }
        int percent = (int)Math.Round((total / (double)length) * 100);
        Console.Write("\r" + percent + "%");
    }

    public static double GetRandomNumber()
    {
        var bytes = new Byte[8];
        rng.GetBytes(bytes);
        var ul = BitConverter.ToUInt64(bytes, 0) / (1 << 11);
        Double randomDouble = ul / (Double)(1UL << 53);
        return randomDouble;
    }
}
Before running this, the Task Manager reported <2% CPU usage (across all runs and machines).
I ran it on a machine with a Ryzen 3800X. The output was:
Processors: 16
Total time: 209.22
The speed reported in the Task Manager while it ran was ~4.12 GHz.
I ran it on a machine with an i7-7820HK. The output was:
Processors: 8
Total time: 213.09
The speed reported in the Task Manager while it ran was ~3.45 GHz.
I modified Parallel.For to include the processor count (Parallel.For(0, length, new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }, (i) => {code});). The outputs were:
3800X: 16 - 158.58 # ~4.13
7820HK: 8 - 210.49 # ~3.40
There's something to be said about Parallel.For not natively distinguishing the Ryzen's processors from its cores, but setting that aside, even here the Ryzen performance is still significantly poorer than would be expected (only ~25% faster, despite double the cores/processors, a higher clock speed, and larger L1-L3 caches). Can anyone explain why?
Edit: Following a couple of comments, I made some changes to my code. See below:
static int length = 1000;

static void Main(string[] args)
{
    StateCount = new ConcurrentDictionary<int, int>();
    for (int i = 0; i < 3; i++)
    {
        StateCount.AddOrUpdate(i, 0, (k, v) => 0);
    }

    var procCount = Environment.ProcessorCount;
    Console.WriteLine("Processors: " + procCount);
    Console.WriteLine("Starting...");
    Console.WriteLine();

    List<double> times = new List<double>();
    Stopwatch sw = new Stopwatch();
    for (int m = 0; m < 10; m++)
    {
        sw.Restart();
        Parallel.For(0, length, new ParallelOptions() { MaxDegreeOfParallelism = procCount }, (i) =>
        {
            for (int j = 0; j < 1000000; j++)
            {
                var rand = GetRandomNumber();
                int newState = 0;
                if (rand < 0.3)
                {
                    newState = 0;
                }
                else if (rand < 0.6)
                {
                    newState = 1;
                }
                else
                {
                    newState = 2;
                }
                StateCount.AddOrUpdate(newState, 0, (k, v) => v + 1);
            }
        });
        sw.Stop();
        Console.WriteLine("Total time: " + sw.Elapsed.TotalSeconds);
        times.Add(sw.Elapsed.TotalSeconds);
    }

    Console.WriteLine();
    var avg = times.Average();
    var variance = times.Select(x => (x - avg) * (x - avg)).Sum() / times.Count;
    var stdev = Math.Sqrt(variance);
    Console.WriteLine("Average time: " + avg + " +/- " + stdev);
    Console.ReadKey();
}
The outer loop is 1,000 instead of 1,000,000,000, so there are "only" 1,000 parallel "tasks." Within each parallel "task," however, there is now a loop of 1,000,000 actions, so the overhead of dispatching each task should have a much smaller effect on the total. I also run the whole thing 10 times and report the average plus standard deviation. Output:
Ryzen 3800X: 158.531 +/- 0.429 # ~4.13
i7-7820HK: 202.159 +/- 2.538 # ~3.48
Even here, the Ryzen's twice as many threads and 0.60 GHz higher clock only make the total operation roughly 25% faster (158.5 s vs. 202.2 s).
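One suspect worth ruling out (this is an assumption on my part, not something measured above) is contention on the shared ConcurrentDictionary and on the single RNGCryptoServiceProvider, which can flatten scaling regardless of core count. A contention-free variant would give each worker its own tallies via Parallel.For's localInit/localFinally overload and merge once at the end; the sketch below also swaps in a plain thread-local Random in place of the crypto RNG:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LocalStateDemo
{
    static int[] globalCounts = new int[3];

    // Thread-local RNG so workers never serialize on one shared instance.
    static ThreadLocal<Random> rng = new ThreadLocal<Random>(() => new Random());

    static double GetRandomNumber() => rng.Value.NextDouble();

    static void Main()
    {
        Parallel.For(0, 1000000,
            () => new int[3],              // localInit: private tallies per worker task
            (i, state, local) =>
            {
                var rand = GetRandomNumber();
                if (rand < 0.3) local[0]++;
                else if (rand < 0.6) local[1]++;
                else local[2]++;
                return local;
            },
            local =>                       // localFinally: one synchronized merge per task
            {
                for (int s = 0; s < 3; s++)
                    Interlocked.Add(ref globalCounts[s], local[s]);
            });

        Console.WriteLine(string.Join(", ", globalCounts));
    }
}
```

With the shared-state updates moved out of the hot loop, the remaining work is almost embarrassingly parallel, which makes it a better probe of raw core scaling.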

Relative speed of jagged, multidimensional arrays and object fields

Some background: I'm an engineer, not a professional programmer, so apologies for any dumb questions. I have a program that is calculation-intensive and I want to speed it up. My understanding (from reading forums) was that C# is optimised for jagged arrays, but my test below seems to say otherwise. If I'm doing something wrong, please let me know. Also, the test which uses the tray object (it could be any moderately complex object) seems too fast; is there a problem with the code?
The relative speeds I'm getting (in Debug) are below. (Note: I just realised the times include array initialisation, but that's OK; the arrays will be cleared or re-initialised many times in the program.)
Object/field access 0ms
Jagged Array access 87ms
Normal Array access 12ms
If I don't count initialising the arrays, I get 0, 7 and 7 ms respectively.
var watch = Stopwatch.StartNew();
double res = 0;
int count = 10000;
Tray tray = column[0][4];
for (int i = 0; i < count; i++)
{
    tray = column[0][4];
    tray.T = i;
    res = tray.T;
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
Debug.WriteLine("Column Solution Time1 ms: " + elapsedMs.ToString() + " " + res.ToString());
MessageBox.Show("Object Field " + elapsedMs.ToString());

watch = Stopwatch.StartNew();
double res2 = 0;
int count2 = count;
double[][] tray1 = new double[count2][];
for (int i = 0; i < count2; i++)
    tray1[i] = new double[count2];   // note: the original post had "tray1 = new double[count2];", which does not compile
for (int i = 0; i < count2; i++)
{
    tray1[i][i] = i;
    res2 = tray1[i][i];
}
watch.Stop();
elapsedMs = watch.ElapsedMilliseconds;
Debug.WriteLine("Column Solution Time2 ms: " + elapsedMs.ToString() + " " + res2.ToString());
MessageBox.Show("Jagged Matrix " + elapsedMs.ToString());

watch = Stopwatch.StartNew();
double res3 = 0;
int count3 = count;
double[,] tray3 = new double[count3, count3];
for (int i = 0; i < count3; i++)
{
    tray3[i, i] = i;
    res3 = tray3[i, i];
}
watch.Stop();
elapsedMs = watch.ElapsedMilliseconds;
Debug.WriteLine("Column Solution Time3 ms: " + elapsedMs.ToString() + " " + res3.ToString());
MessageBox.Show("Matrix " + elapsedMs.ToString());
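As a side note, Debug builds disable most JIT optimisations, so the timings above mostly measure overhead, and the loops only touch the diagonal (10,000 elements) rather than the whole matrix. A fairer comparison (my own sketch, not the original program) sums every element of each matrix in Release mode, with a warm-up call so JIT time is excluded:

```csharp
using System;
using System.Diagnostics;

class ArrayBench
{
    const int N = 2000;

    static double SumJagged(double[][] m)
    {
        double sum = 0;
        for (int i = 0; i < N; i++)
        {
            double[] row = m[i];   // hoist the row lookup out of the inner loop
            for (int j = 0; j < N; j++) sum += row[j];
        }
        return sum;
    }

    static double SumRect(double[,] m)
    {
        double sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) sum += m[i, j];
        return sum;
    }

    static void Main()
    {
        double[][] jagged = new double[N][];
        for (int i = 0; i < N; i++) jagged[i] = new double[N];
        double[,] rect = new double[N, N];

        // Warm-up so JIT compilation is not included in the timings.
        SumJagged(jagged);
        SumRect(rect);

        var sw = Stopwatch.StartNew();
        SumJagged(jagged);
        sw.Stop();
        Console.WriteLine("Jagged: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        SumRect(rect);
        sw.Stop();
        Console.WriteLine("2D:     " + sw.ElapsedMilliseconds + " ms");
    }
}
```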

Program stuck when using parallel for

I'm fairly new to C# and I made a simple program simulating lotto draws. It takes a first set of (random) numbers and counts how many draws it takes to win. It's the Polish lotto, so there are 6 numbers to match.
Everything works fine when the program is run in a simple for loop. But there is a problem when I use Parallel.For or any other multitasking or multithreading option.
First the code:
class Program
{
    public static int howMany = 100;

    static void Main(string[] args)
    {
        Six my;
        Six computers;
        long sum = 0;
        double avg = 0;
        int min = 1000000000;
        int max = 0;
        for (int i = 0; i < howMany; i++)
        {
            my = new Six();
            Console.WriteLine((i + 1).ToString() + " My: " + my.ToString());
            int counter = 0;
            do
            {
                computers = new Six();
                counter++;
            } while (!my.Equals(computers));
            Console.WriteLine((i + 1).ToString() + " Computers: " + computers.ToString());
            Console.WriteLine(counter.ToString("After: ### ### ###") + "\n");
            if (counter < min)
                min = counter;
            if (counter > max)
                max = counter;
            sum += counter;
        }
        avg = sum / howMany;
        Console.WriteLine("Average: " + avg);
        Console.WriteLine("Sum: " + sum);
        Console.WriteLine("Min: " + min);
        Console.WriteLine("Max: " + max);
        Console.Read();
    }
}

class Six : IEquatable<Six>
{
    internal byte first;
    internal byte second;
    internal byte third;
    internal byte fourth;
    internal byte fifth;
    internal byte sixth;
    private static Random r = new Random();

    public Six()
    {
        GenerateRandomNumbers();
    }

    public bool Equals(Six other)
    {
        if (this.first == other.first
            && this.second == other.second
            && this.third == other.third
            && this.fourth == other.fourth
            && this.fifth == other.fifth
            && this.sixth == other.sixth)
            return true;
        else
            return false;
    }

    private void GenerateRandomNumbers()
    {
        byte[] numbers = new byte[6];
        byte k = 0;
        for (int i = 0; i < 6; i++)
        {
            do
            {
                k = (byte)(r.Next(49) + 1);
            } while (numbers.Contains(k));
            numbers[i] = k;
            k = 0;
        }
        Array.Sort(numbers);
        this.first = numbers[0];
        this.second = numbers[1];
        this.third = numbers[2];
        this.fourth = numbers[3];
        this.fifth = numbers[4];
        this.sixth = numbers[5];
    }

    public override string ToString()
    {
        return this.first + ", " + this.second + ", " + this.third + ", " + this.fourth + ", " + this.fifth + ", " + this.sixth;
    }
}
And when I try to make it Parallel.For:
long sum = 0;
double avg = 0;
int min = 1000000000;
int max = 0;
Parallel.For(0, howMany, (i) =>
{
    Six my = new Six();
    Six computers;
    Console.WriteLine((i + 1).ToString() + " My: " + my.ToString());
    int counter = 0;
    do
    {
        computers = new Six();
        // Checking where it gets stuck
        if (counter % 100 == 0)
            Console.WriteLine(counter);
        counter++;
    } while (!my.Equals(computers));
    Console.WriteLine((i + 1).ToString() + " Computers: " + computers.ToString());
    Console.WriteLine(counter.ToString("After: ### ### ###") + "\n");
    // It never gets to this point, so there is no problem with "global" variables
    if (counter < min)
        min = counter;
    if (counter > max)
        max = counter;
    sum += counter;
});
The program gets stuck at some point. The counters get to ~3,000-40,000 and refuse to go further.
What I tried:
Making class a struct
Collecting Garbage every ~1000 iterations
Using ThreadPool
Using Task.Run
Making Random a member of Program only (I tried to make the Six class "lighter")
But none of it helped.
I know this might be a very simple thing for some of you, but a man has got to learn somehow ;) I even bought a book about async programming to find out why it doesn't work, but I couldn't figure it out.
Random isn't thread-safe...
Wait for your code to stop writing new lines in the parallel version, then pause the debugger. This stops all threads. You'll notice that all your parallel threads are in the while loop.
The numbers arrays are all 1, 0, 0, 0, 0, 0, and r.Next only ever returns 1, which the byte array already contains, so the do/while can never exit. In other words, you broke Random by calling it from multiple threads at once.
To fix this you'll need to make r thread safe, either by locking r every time you access r.Next or changing the static declaration to
private static readonly ThreadLocal<Random> r
= new ThreadLocal<Random>(() => new Random());
and the Next call becomes
k = (byte)(r.Value.Next(49) + 1);
This will create a new static Random instance per thread.
As you noted, creating lots of Random instances at nearly the same time can result in the same sequence of numbers being produced (they can end up with the same time-based seed). To get around this, add a seed class:
static class RGen
{
    private static Random seedGen = new Random();

    public static Random GetRGenerator()
    {
        lock (seedGen)
        {
            return new Random(seedGen.Next());
        }
    }
}
and change the declaration to
private static readonly ThreadLocal<Random> r
= new ThreadLocal<Random>(() => RGen.GetRGenerator());
This will ensure each new random instance has a different seed value.
I found a solution based on what James Barrass wrote:
public static readonly ThreadLocal<Random> r =
    new ThreadLocal<Random>(() => new Random(Thread.CurrentThread.ManagedThreadId + DateTime.Now.Millisecond));
That made the program run well :)
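For what it's worth, on .NET 6 and later there is a simpler option that wasn't available when this question was asked: Random.Shared is a thread-safe shared instance, so the ThreadLocal wrapper can be dropped entirely. A minimal sketch:

```csharp
using System;
using System.Threading.Tasks;

class SharedRandomDemo
{
    static void Main()
    {
        // Random.Shared (added in .NET 6) is documented as safe to use
        // concurrently from multiple threads.
        Parallel.For(0, 4, i =>
        {
            int draw = Random.Shared.Next(49) + 1;   // 1..49, like a lotto ball
            Console.WriteLine($"Task {i}: {draw}");
        });
    }
}
```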

Count the Ones from 1 to 99,999,999

I am seeking an algorithm that counts, as fast as possible, how many times the digit 1 appears in the numbers from 1 to 99,999,999.
There are many ways, but I need the fastest.
The right answer is 80,000,000.
My try in c#:
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int counter = 0;
for (int i = 1; i <= 99999999; i++)
{
    int num = i;
    while (num > 0)
    {
        if (num % 10 == 1)
        {
            counter++;
        }
        num = num / 10;
    }
}
sw.Stop();
Console.WriteLine("1 is counted {0} times and prog time is {1}", counter, sw.Elapsed.ToString());
I don't know the math formula to get it without any loop, which I think would be the fastest way.
Actually, I managed to do it without any loops, so that should count as the fastest way, and I'm going to share it for knowledge:
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
int counter = 0;
sw.Start();
int n = 8;
// Each of the n digit positions holds a 1 in exactly 10^(n-1) of the numbers 0..10^n - 1.
counter = (int)(n * Math.Pow(10, n - 1));
sw.Stop();
Console.WriteLine("1 is counted {0} times and prog time is {1}", counter, sw.Elapsed.ToString());
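The closed form above only works for upper bounds of the form 10^n - 1. For an arbitrary upper bound there is a standard per-place closed form (sketched below; CountOnes is my own helper name): for each decimal place, count how many numbers up to n carry a 1 in that place, using the digits above, at, and below it.

```csharp
using System;

class DigitCount
{
    // Counts how many times the digit 1 appears in 1..n (closed form, O(log10 n)).
    static long CountOnes(long n)
    {
        long count = 0;
        for (long place = 1; place <= n; place *= 10)
        {
            long higher = n / (place * 10);   // digits above the current place
            long current = (n / place) % 10;  // digit at the current place
            long lower = n % place;           // digits below the current place
            count += higher * place;
            if (current == 1) count += lower + 1;
            else if (current > 1) count += place;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountOnes(13));        // 6  (ones in 1, 10, 11, 11, 12, 13)
        Console.WriteLine(CountOnes(99999999));  // 80000000, matching the brute force
    }
}
```

For n = 99,999,999 each of the 8 places contributes 10^7, giving the same 8 * 10^7 as the power formula.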

Performance when Generating CPU Cache Misses

I am trying to learn about CPU cache performance in the world of .NET. Specifically I am working through Igor Ostrovsky's article about processor cache effects.
I have gone through the first three examples in his article and have recorded results that widely differ from his. I think I must be doing something wrong because the performance on my machine is showing almost the exact opposite results of what he shows in his article. I am not seeing the large effects from cache misses that I would expect.
What am I doing wrong? (bad code, compiler setting, etc.)
Here are the performance results on my machine:
If it helps, the processor on my machine is an Intel Core i7-2630QM. Here is info on my processor's cache:
I have compiled in x64 Release mode.
Below is my source code:
class Program
{
    static Stopwatch watch = new Stopwatch();
    static int[] arr = new int[64 * 1024 * 1024];

    static void Main(string[] args)
    {
        Example1();
        Example2();
        Example3();
        Console.ReadLine();
    }

    static void Example1()
    {
        Console.WriteLine("Example 1:");

        // Loop 1
        watch.Restart();
        for (int i = 0; i < arr.Length; i++) arr[i] *= 3;
        watch.Stop();
        Console.WriteLine("    Loop 1: " + watch.ElapsedMilliseconds.ToString() + " ms");

        // Loop 2
        watch.Restart();
        for (int i = 0; i < arr.Length; i += 32) arr[i] *= 3;
        watch.Stop();
        Console.WriteLine("    Loop 2: " + watch.ElapsedMilliseconds.ToString() + " ms");

        Console.WriteLine();
    }

    static void Example2()
    {
        Console.WriteLine("Example 2:");
        for (int k = 1; k <= 1024; k *= 2)
        {
            watch.Restart();
            for (int i = 0; i < arr.Length; i += k) arr[i] *= 3;
            watch.Stop();
            Console.WriteLine("    K = " + k + ": " + watch.ElapsedMilliseconds.ToString() + " ms");
        }
        Console.WriteLine();
    }

    static void Example3()
    {
        Console.WriteLine("Example 3:");
        for (int k = 1; k <= 1024 * 1024; k *= 2)
        {
            // 256 * 4 bytes per 32-bit int * k = k kilobytes
            arr = new int[256 * k];

            int steps = 64 * 1024 * 1024; // Arbitrary number of steps
            int lengthMod = arr.Length - 1;

            watch.Restart();
            for (int i = 0; i < steps; i++)
            {
                arr[(i * 16) & lengthMod]++; // (x & lengthMod) equals (x % arr.Length) since arr.Length is a power of two
            }
            watch.Stop();

            Console.WriteLine("    Array size = " + arr.Length * 4 + " bytes: " + (int)(watch.Elapsed.TotalMilliseconds * 1000000.0 / arr.Length) + " nanoseconds per element");
        }
        Console.WriteLine();
    }
}
Why are you using i += 32 in the second loop? You are stepping over whole cache lines that way: 32 * 4 bytes = 128 bytes, which is twice the 64 bytes of a cache line, so you only touch every second line.
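To illustrate the comment (a sketch of my own, assuming 64-byte cache lines and 4-byte ints): a stride of 16 ints touches every cache line, so it should take roughly as long as the full pass, while a stride of 32 ints skips every other line and halves the memory traffic:

```csharp
using System;
using System.Diagnostics;

class CacheLineDemo
{
    static void Main()
    {
        int[] arr = new int[64 * 1024 * 1024];
        var watch = new Stopwatch();

        // Warm-up pass so the array is paged in and the loop is JIT-compiled.
        for (int i = 0; i < arr.Length; i++) arr[i] *= 3;

        // Stride of 16 ints = 64 bytes: exactly one access per cache line.
        watch.Restart();
        for (int i = 0; i < arr.Length; i += 16) arr[i] *= 3;
        watch.Stop();
        Console.WriteLine("Stride 16: " + watch.ElapsedMilliseconds + " ms");

        // Stride of 32 ints = 128 bytes: only every second cache line is touched.
        watch.Restart();
        for (int i = 0; i < arr.Length; i += 32) arr[i] *= 3;
        watch.Stop();
        Console.WriteLine("Stride 32: " + watch.ElapsedMilliseconds + " ms");
    }
}
```

If the loop were compute-bound rather than memory-bound, the stride-32 version would instead run close to twice as fast, which is the distinction the original article uses these loops to demonstrate.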
