Parallel.For performance - c#

This code is from Microsoft article http://msdn.microsoft.com/en-us/library/dd460703.aspx, with small changes:
const int size = 10000000;
int[] nums = new int[size];
Parallel.For(0, size, i => {nums[i] = 1;});
long total = 0;
Parallel.For<long>(
0, size, () => 0,
(j, loop, subtotal) =>
{
return subtotal + nums[j];
},
(x) => Interlocked.Add(ref total, x)
);
if (total != size)
{
Console.WriteLine("Error");
}
Non-parallel loop version is:
for (int i = 0; i < size; ++i)
{
total += nums[i];
}
When I measure loop execution time using StopWatch class, I see that parallel version is slower by 10-20%. Testing is done on Windows 7 64 bit, Intel i5-2400 CPU, 4 cores, 4 GB RAM. Of course, in Release configuration.
In my real program I am trying to compute an image histogram, and parallel version runs 10 times slower. Can such kind of computation tasks, when every loop invocation is very fast, be successfully parallelized with TPL?
Edit.
Finally I managed to shave more that 50% of histogram calculation execution time with Parallel.For, when divided the whole image to some number of chunks. Every loop body invocation now handles the whole chunk, and not one pixel.

Because Parallel.For should be used for things that are a little heacy, not to sum simple numbers! Just the use of the delegate (j, loop, subtotal) => is probably more than enough to give 10-20% more time. And we aren't even speaking of the threading overhead. It would be interesting to see some benchmark against a delegate summer in the for cycle and to see not only the "real world" time, but the CPU time.
I have even added a comparison to a "simple" delegate that does the same thing as the Parallel.For<> delegate.
Mmmh... Now I have some numbers at 32 bits, on my PC (an AMD six core)
32 bits
Parallel: Ticks: 74581, Total ProcessTime: 2496016
Base : Ticks: 90395, Total ProcessTime: 312002
Func : Ticks: 147037, Total ProcessTime: 468003
The Parallel is a little faster at wall time, but 8x slower at processor time :-)
But at 64 bits:
64 bits
Parallel: Ticks: 104326, Total ProcessTime: 2652017
Base : Ticks: 51664, Total ProcessTime: 156001
Func : Ticks: 77861, Total ProcessTime: 312002
Modified code:
Console.WriteLine("{0} bits", IntPtr.Size == 4 ? 32 : 64);
var cp = Process.GetCurrentProcess();
cp.PriorityClass = ProcessPriorityClass.High;
const int size = 10000000;
int[] nums = new int[size];
Parallel.For(0, size, i => { nums[i] = 1; });
GC.Collect();
GC.WaitForPendingFinalizers();
long total = 0;
{
TimeSpan start = cp.TotalProcessorTime;
Stopwatch sw = Stopwatch.StartNew();
Parallel.For<long>(
0, size, () => 0,
(j, loop, subtotal) =>
{
return subtotal + nums[j];
},
(x) => Interlocked.Add(ref total, x)
);
sw.Stop();
TimeSpan end = cp.TotalProcessorTime;
Console.WriteLine("Parallel: Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}
if (total != size)
{
Console.WriteLine("Error");
}
GC.Collect();
GC.WaitForPendingFinalizers();
total = 0;
{
TimeSpan start = cp.TotalProcessorTime;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < size; ++i)
{
total += nums[i];
}
sw.Stop();
TimeSpan end = cp.TotalProcessorTime;
Console.WriteLine("Base : Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}
if (total != size)
{
Console.WriteLine("Error");
}
GC.Collect();
GC.WaitForPendingFinalizers();
total = 0;
Func<int, int, long, long> adder = (j, loop, subtotal) =>
{
return subtotal + nums[j];
};
{
TimeSpan start = cp.TotalProcessorTime;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < size; ++i)
{
total = adder(i, 0, total);
}
sw.Stop();
TimeSpan end = cp.TotalProcessorTime;
Console.WriteLine("Func : Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}
if (total != size)
{
Console.WriteLine("Error");
}

Related

Trying to find large prime numbers with Alea GPU

An exception occurs when I try to find the 100,000th prime number using Alea GPU. The algorithm works fine if I try to find a smaller prime number e.g. the 10,000th prime number.
I am using Alea v3.0.4, NVIDIA GTX 970, Cuda 9.2 drivers.
I am new to GPU programming. Any help would be greatly appreciated.
long[] primeNumber = new long[1]; // nth prime number to find
int n = 100000; // find the 100,000th prime number
var worker = Gpu.Default; // GTX 970 CUDA v9.2 drivers
long count = 0;
worker.LongFor(count, n, x =>
{
long a = 2;
while (count < n)
{
long b = 2;
long prime = 1;
while (b * b <= a)
{
if (a % b == 0)
{
prime = 0;
break;
}
b++;
}
if (prime > 0)
{
count++;
}
a++;
}
primeNumber[0] = (a - 1);
}
);
Here are the exception details:
System.Exception occurred HResult=0x80131500 Message=[CUDAError]
CUDA_ERROR_LAUNCH_FAILED Source=Alea StackTrace: at
Alea.CUDAInterop.cuSafeCall#2939.Invoke(String message) at
Alea.CUDAInterop.cuSafeCall(cudaError_enum result) at
A.cf5aded17df9f7cc4c132234dda010fa7.Copy#918-22.Invoke(Unit _arg9)
at Alea.Memory.Copy(FSharpOption1 streamOpt, Memory src, IntPtr
srcOffset, Memory dst, IntPtr dstOffset, FSharpOption1 lengthOpt)
at
Alea.ImplicitMemoryTrackerEntry.cdd2cd00c052408bcdbf03958f14266ca(FSharpFunc2
c600c458623dca7db199a0e417603dff4, Object
cd5116337150ebaa6de788dacd82516fa) at
Alea.ImplicitMemoryTrackerEntry.c6a75c171c9cccafb084beba315394985(FSharpFunc2
c600c458623dca7db199a0e417603dff4, Object
cd5116337150ebaa6de788dacd82516fa) at
Alea.ImplicitMemoryTracker.HostReadWriteBarrier(Object instance) at
Alea.GlobalImplicitMemoryTracker.HostReadWriteBarrier(Object instance)
at A.cf5aded17df9f7cc4c132234dda010fa7.clo#2359-624.Invoke(Object
arg00) at
Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc2 action,
IEnumerable1 source) at Alea.Kernel.LaunchRaw(LaunchParam lp,
FSharpOption1 instanceOpt, FSharpList1 args) at
Alea.Parallel.Device.DeviceFor.For(Gpu gpu, Int64 fromInclusive, Int64
toExclusive, Action1 op) at Alea.Parallel.GpuExtension.LongFor(Gpu
gpu, Int64 fromInclusive, Int64 toExclusive, Action1 op) at
TestingGPU.Program.Execute(Int32 t) in
C:\Users..\source\repos\TestingGPU\TestingGPU\Program.cs:line 148
at TestingGPU.Program.Main(String[] args)
Working Solution:
static void Main(string[] args)
{
var devices = Device.Devices;
foreach (var device in devices)
{
Console.WriteLine(device.ToString());
}
while (true)
{
Console.WriteLine("Enter a number to check if it is a prime number:");
string line = Console.ReadLine();
long checkIfPrime = Convert.ToInt64(line);
Stopwatch sw = new Stopwatch();
sw.Start();
bool GPUisPrime = GPUIsItPrime(checkIfPrime+1);
sw.Stop();
Stopwatch sw2 = new Stopwatch();
sw2.Start();
bool CPUisPrime = CPUIsItPrime(checkIfPrime+1);
sw2.Stop();
Console.WriteLine($"GPU: is {checkIfPrime} prime? {GPUisPrime} Time Elapsed: {sw.ElapsedMilliseconds.ToString()}");
Console.WriteLine($"CPU: is {checkIfPrime} prime? {CPUisPrime} Time Elapsed: {sw2.ElapsedMilliseconds.ToString()}");
}
}
[GpuManaged]
private static bool GPUIsItPrime(long n)
{
//Sieve of Eratosthenes Algorithm
bool[] isComposite = new bool[n];
var worker = Gpu.Default;
worker.LongFor(2, n, i =>
{
if (!(isComposite[i]))
{
for (long j = 2; (j * i) < isComposite.Length; j++)
{
isComposite[j * i] = true;
}
}
});
return !isComposite[n-1];
}
private static bool CPUIsItPrime(long n)
{
//Sieve of Eratosthenes Algorithm
bool[] isComposite = new bool[n];
for (int i = 2; i < n; i++)
{
if (!isComposite[i])
{
for (long j = 2; (j * i) < n; j++)
{
isComposite[j * i] = true;
}
}
}
return !isComposite[n-1];
}
Your code doesn't look right. Given a parallel for-loop method here (LongFor), Alea will spawn "n" threads, with an index "x" used to identify what the thread number is. So, for example a simple example like For(0, n, x => a[x] = x); uses "x" to initialize a[] with { 0, 1, 2, ...., n - 1}. But, your kernel code does not use "x" anywhere in the code. Consequently, you run the same code "n" times with absolutely no difference. Why then run on a GPU? What I think you want is to do is to compute in thread "x" whether "x" is prime. With result in hand, set bool prime[x] = true or false. Then, afterwards, in the kernel after all that, add a sync call, followed with a test using a single thread (e.g., x == 0) to go through prime[] and pick the largest prime from the array. Otherwise, there's a lot of collisions for 'primeNumber[0] = (a - 1);' by n-threads on the GPU. I can't imagine how you would ever get the right result. Finally, you probably want to make sure using some Alea call that prime[] is never copied to/from the GPU. But, I don't know how you do that in Alea. The compiler might be smart enough to know that prime[] is only used in the kernel code.

Why is my ElapsedMilliseconds always zero here?

So I'm trying to measure the performance of the hash set I created versus the performance of the same elements in a List and in the following block of code
Stopwatch Watch = new Stopwatch();
long tList = 0, tHset = 0; // ms
foreach ( string Str in Copy )
{
// measure time to look up string in ordinary list
Watch.Start();
if ( ListVersion.Contains(Str) ) { }
Watch.Stop();
tList += Watch.ElapsedMilliseconds;
// now measure time to look up same string in my hash set
Watch.Reset();
Watch.Start();
if ( this.Contains(Str) ) { }
Watch.Stop();
tHset += Watch.ElapsedMilliseconds;
Watch.Reset();
}
int n = Copy.Count;
Console.WriteLine("Average milliseconds to look up in List: {0}", tList / n);
Console.WriteLine("Average milliseconds to look up in hashset: {0}", tHset / n);
it is outputing 0 for both. Any idea why this is? Relevant documentation: https://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch(v=vs.110).aspx
That's because the operation is faster than precision of the Stapwatch.
Instead of measuring each of the Contains call separately measure a group of them:
Stopwatch Watch = new Stopwatch();
long tList = 0, tHset = 0; // ms
// measure time to look up string in ordinary list
Watch.Start();
foreach ( string Str in Copy )
{
if ( ListVersion.Contains(Str) ) { }
}
Watch.Stop();
tList = Watch.ElapsedMilliseconds;
// now measure time to look up same string in my hash set
Watch.Reset();
Watch.Start();
foreach ( string Str in Copy )
{
if ( this.Contains(Str) ) { }
}
Watch.Stop();
tHset = Watch.ElapsedMilliseconds;
Console.WriteLine("Total milliseconds to look up in List: {0}", tList);
Console.WriteLine("Total milliseconds to look up in hashset: {0}", tHset);
As you can see, I also changed the code to print total time spent instead of average. With operations so fast performance is usually presented in Xs per Y operations instead of average. E.g. 40ms per 10 million lookups.
Also, it's possible that in Release mode parts of your code will be optimized away, because it doesn't actually do anything. Consider counting number of elements for which Contains returns true and printing that number out at the end.
You can keep your code as it is and instead of doing :
Watch.ElapsedMilliseconds
You do this :
Watch.Elapsed.TotalMilliseconds
This way you will have the fractional part of the millisecond

How to execute a method with parameters as different threads independently in C#? [duplicate]

I have a question concerning parallel for loops. I have the following code:
public static void MultiplicateArray(double[] array, double factor)
{
for (int i = 0; i < array.Length; i++)
{
array[i] = array[i] * factor;
}
}
public static void MultiplicateArray(double[] arrayToChange, double[] multiplication)
{
for (int i = 0; i < arrayToChange.Length; i++)
{
arrayToChange[i] = arrayToChange[i] * multiplication[i];
}
}
public static void MultiplicateArray(double[] arrayToChange, double[,] multiArray, int dimension)
{
for (int i = 0; i < arrayToChange.Length; i++)
{
arrayToChange[i] = arrayToChange[i] * multiArray[i, dimension];
}
}
Now I try to add parallel function:
public static void MultiplicateArray(double[] array, double factor)
{
Parallel.For(0, array.Length, i =>
{
array[i] = array[i] * factor;
});
}
public static void MultiplicateArray(double[] arrayToChange, double[] multiplication)
{
Parallel.For(0, arrayToChange.Length, i =>
{
arrayToChange[i] = arrayToChange[i] * multiplication[i];
});
}
public static void MultiplicateArray(double[] arrayToChange, double[,] multiArray, int dimension)
{
Parallel.For(0, arrayToChange.Length, i =>
{
arrayToChange[i] = arrayToChange[i] * multiArray[i, dimension];
});
}
The issue is, that I want to save time, not to waste it. With the standard for loop it computes about 2 minutes, but with the parallel for loop it takes 3 min. Why?
Parallel.For() can improve performance a lot by parallelizing your code, but it also has overhead (synchronization between threads, invoking the delegate on each iteration). And since in your code, each iteration is very short (basically, just a few CPU instructions), this overhead can become prominent.
Because of this, I thought using Parallel.For() is not the right solution for you. Instead, if you parallelize your code manually (which is very simple in this case), you may see the performance improve.
To verify this, I performed some measurements: I ran different implementations of MultiplicateArray() on an array of 200 000 000 items (the code I used is below). On my machine, the serial version consistently took 0.21 s and Parallel.For() usually took something around 0.45 s, but from time to time, it spiked to 8–9 s!
First, I'll try to improve the common case and I'll come to those spikes later. We want to process the array by N CPUs, so we split it into N equally sized parts and process each part separately. The result? 0.35 s. That's still worse than the serial version. But for loop over each item in an array is one of the most optimized constructs. Can't we do something to help the compiler? Extracting computing the bound of the loop could help. It turns out it does: 0.18 s. That's better than the serial version, but not by much. And, interestingly, changing the degree of parallelism from 4 to 2 on my 4-core machine (no HyperThreading) doesn't change the result: still 0.18 s. This makes me conclude that the CPU is not the bottleneck here, memory bandwidth is.
Now, back to the spikes: my custom parallelization doesn't have them, but Parallel.For() does, why? Parallel.For() does use range partitioning, which means each thread processes its own part of the array. But, if one thread finishes early, it will try to help processing the range of another thread that hasn't finished yet. If that happens, you will get a lot of false sharing, which could slow down the code a lot. And my own test with forcing false sharing seems to indicate this could indeed be the problem. Forcing the degree of parallelism of the Parallel.For() seems to help with the spikes a little.
Of course, all those measurements are specific to the hardware on my computer and will be different for you, so you should make your own measurements.
The code I used:
static void Main()
{
double[] array = new double[200 * 1000 * 1000];
for (int i = 0; i < array.Length; i++)
array[i] = 1;
for (int i = 0; i < 5; i++)
{
Stopwatch sw = Stopwatch.StartNew();
Serial(array, 2);
Console.WriteLine("Serial: {0:f2} s", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
ParallelFor(array, 2);
Console.WriteLine("Parallel.For: {0:f2} s", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
ParallelForDegreeOfParallelism(array, 2);
Console.WriteLine("Parallel.For (degree of parallelism): {0:f2} s", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
CustomParallel(array, 2);
Console.WriteLine("Custom parallel: {0:f2} s", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
CustomParallelExtractedMax(array, 2);
Console.WriteLine("Custom parallel (extracted max): {0:f2} s", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
CustomParallelExtractedMaxHalfParallelism(array, 2);
Console.WriteLine("Custom parallel (extracted max, half parallelism): {0:f2} s", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
CustomParallelFalseSharing(array, 2);
Console.WriteLine("Custom parallel (false sharing): {0:f2} s", sw.Elapsed.TotalSeconds);
}
}
static void Serial(double[] array, double factor)
{
for (int i = 0; i < array.Length; i++)
{
array[i] = array[i] * factor;
}
}
static void ParallelFor(double[] array, double factor)
{
Parallel.For(
0, array.Length, i => { array[i] = array[i] * factor; });
}
static void ParallelForDegreeOfParallelism(double[] array, double factor)
{
Parallel.For(
0, array.Length, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
i => { array[i] = array[i] * factor; });
}
static void CustomParallel(double[] array, double factor)
{
var degreeOfParallelism = Environment.ProcessorCount;
var tasks = new Task[degreeOfParallelism];
for (int taskNumber = 0; taskNumber < degreeOfParallelism; taskNumber++)
{
// capturing taskNumber in lambda wouldn't work correctly
int taskNumberCopy = taskNumber;
tasks[taskNumber] = Task.Factory.StartNew(
() =>
{
for (int i = array.Length * taskNumberCopy / degreeOfParallelism;
i < array.Length * (taskNumberCopy + 1) / degreeOfParallelism;
i++)
{
array[i] = array[i] * factor;
}
});
}
Task.WaitAll(tasks);
}
static void CustomParallelExtractedMax(double[] array, double factor)
{
var degreeOfParallelism = Environment.ProcessorCount;
var tasks = new Task[degreeOfParallelism];
for (int taskNumber = 0; taskNumber < degreeOfParallelism; taskNumber++)
{
// capturing taskNumber in lambda wouldn't work correctly
int taskNumberCopy = taskNumber;
tasks[taskNumber] = Task.Factory.StartNew(
() =>
{
var max = array.Length * (taskNumberCopy + 1) / degreeOfParallelism;
for (int i = array.Length * taskNumberCopy / degreeOfParallelism;
i < max;
i++)
{
array[i] = array[i] * factor;
}
});
}
Task.WaitAll(tasks);
}
static void CustomParallelExtractedMaxHalfParallelism(double[] array, double factor)
{
var degreeOfParallelism = Environment.ProcessorCount / 2;
var tasks = new Task[degreeOfParallelism];
for (int taskNumber = 0; taskNumber < degreeOfParallelism; taskNumber++)
{
// capturing taskNumber in lambda wouldn't work correctly
int taskNumberCopy = taskNumber;
tasks[taskNumber] = Task.Factory.StartNew(
() =>
{
var max = array.Length * (taskNumberCopy + 1) / degreeOfParallelism;
for (int i = array.Length * taskNumberCopy / degreeOfParallelism;
i < max;
i++)
{
array[i] = array[i] * factor;
}
});
}
Task.WaitAll(tasks);
}
static void CustomParallelFalseSharing(double[] array, double factor)
{
var degreeOfParallelism = Environment.ProcessorCount;
var tasks = new Task[degreeOfParallelism];
int i = -1;
for (int taskNumber = 0; taskNumber < degreeOfParallelism; taskNumber++)
{
tasks[taskNumber] = Task.Factory.StartNew(
() =>
{
int j = Interlocked.Increment(ref i);
while (j < array.Length)
{
array[j] = array[j] * factor;
j = Interlocked.Increment(ref i);
}
});
}
Task.WaitAll(tasks);
}
Example output:
Serial: 0,20 s
Parallel.For: 0,50 s
Parallel.For (degree of parallelism): 8,90 s
Custom parallel: 0,33 s
Custom parallel (extracted max): 0,18 s
Custom parallel (extracted max, half parallelism): 0,18 s
Custom parallel (false sharing): 7,53 s
Serial: 0,21 s
Parallel.For: 0,52 s
Parallel.For (degree of parallelism): 0,36 s
Custom parallel: 0,31 s
Custom parallel (extracted max): 0,18 s
Custom parallel (extracted max, half parallelism): 0,19 s
Custom parallel (false sharing): 7,59 s
Serial: 0,21 s
Parallel.For: 11,21 s
Parallel.For (degree of parallelism): 0,36 s
Custom parallel: 0,32 s
Custom parallel (extracted max): 0,18 s
Custom parallel (extracted max, half parallelism): 0,18 s
Custom parallel (false sharing): 7,76 s
Serial: 0,21 s
Parallel.For: 0,46 s
Parallel.For (degree of parallelism): 0,35 s
Custom parallel: 0,31 s
Custom parallel (extracted max): 0,18 s
Custom parallel (extracted max, half parallelism): 0,18 s
Custom parallel (false sharing): 7,58 s
Serial: 0,21 s
Parallel.For: 0,45 s
Parallel.For (degree of parallelism): 0,40 s
Custom parallel: 0,38 s
Custom parallel (extracted max): 0,18 s
Custom parallel (extracted max, half parallelism): 0,18 s
Custom parallel (false sharing): 7,58 s
Svick already provided a great answer but I'd like to emphasize that the key point is not to "parallelize your code manually" instead of using Parallel.For() but that you have to process larger chunks of data.
This can still be done using Parallel.For() like this:
static void My(double[] array, double factor)
{
int degreeOfParallelism = Environment.ProcessorCount;
Parallel.For(0, degreeOfParallelism, workerId =>
{
var max = array.Length * (workerId + 1) / degreeOfParallelism;
for (int i = array.Length * workerId / degreeOfParallelism; i < max; i++)
array[i] = array[i] * factor;
});
}
which does the same thing as svicks CustomParallelExtractedMax() but is shorter, simpler and (on my machine) performs even slightly faster:
Serial: 3,94 s
Parallel.For: 9,28 s
Parallel.For (degree of parallelism): 9,58 s
Custom parallel: 2,05 s
Custom parallel (extracted max): 1,19 s
Custom parallel (extracted max, half parallelism): 1,49 s
Custom parallel (false sharing): 27,88 s
My: 0,95 s
Btw, the keyword for this which is missing from all the other answers is granularity.
See Custom Partitioners for PLINQ and TPL:
In a For loop, the body of the loop is provided to the method as a delegate. The cost of invoking that delegate is about the same as a virtual method call. In some scenarios, the body of a parallel loop might be small enough that the cost of the delegate invocation on each loop iteration becomes significant. In such situations, you can use one of the Create overloads to create an IEnumerable<T> of range partitions over the source elements. Then, you can pass this collection of ranges to a ForEach method whose body consists of a regular for loop. The benefit of this approach is that the delegate invocation cost is incurred only once per range, rather than once per element.
In your loop body, you are performing a single multiplication, and the overhead of the delegate call will be very noticeable.
Try this:
public static void MultiplicateArray(double[] array, double factor)
{
var rangePartitioner = Partitioner.Create(0, array.Length);
Parallel.ForEach(rangePartitioner, range =>
{
for (int i = range.Item1; i < range.Item2; i++)
{
array[i] = array[i] * factor;
}
});
}
See also: Parallel.ForEach documentation and Partitioner.Create documentation.
Parallel.For involves more complex memory management. That result could vary depending on cpu specs, like #cores, L1 & L2 cache...
Please take a look to this interesting article:
http://msdn.microsoft.com/en-us/magazine/cc872851.aspx
from http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.aspx and http://msdn.microsoft.com/en-us/library/dd537608.aspx
you are not creating three thread/process that execute your for, but the iteration of the for is tryed to be executet in parallel, so even with only one for you are using multiple thread/process.
this mean that interation with index = 0 and index = 1 may be executed at the same time.
Probabily you are forcing to use too much thread/process, and the overhead for the creation/execution of them is bigger that the speed gain.
Try to use three normal for but in three different thread/process, if your sistem is multicore (3x at least) it should take less than one minute

Environment.TickCount VS StopWatch [duplicate]

Is it ever OK to use Environment.TickCountto calculate time spans?
int start = Environment.TickCount;
// Do stuff
int duration = Environment.TickCount - start;
Console.WriteLine("That took " + duration " ms");
Because TickCount is signed and will rollover after 25 days (it takes 50 days to hit all 32 bits, but you have to scrap the signed bit if you want to make any sense of the math), it seems like it's too risky to be useful.
I'm using DateTime.Now instead. Is this the best way to do this?
DateTime start = DateTime.Now;
// Do stuff
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("That took " + duration.TotalMilliseconds + " ms");
Environment.TickCount is based on GetTickCount() WinAPI function. It's in milliseconds
But the actual precision of it is about 15.6 ms. So you can't measure shorter time intervals (or you'll get 0)
Note: The returned value is Int32, so this counter rolls over each ~49.7 days. You shouldn't use it to measure such long intervals.
DateTime.Ticks is based on GetSystemTimeAsFileTime() WinAPI function. It's in 100s nanoseconds (tenths of microsoconds).
The actual precision of DateTime.Ticks depends on the system. On XP, the increment of system clock is about 15.6 ms, the same as in Environment.TickCount.
On Windows 7 its precision is 1 ms (while Environemnt.TickCount's is still 15.6 ms), however if a power saving scheme is used (usually on laptops) it can go down to 15.6 ms as well.
Stopwatch is based on QueryPerformanceCounter() WinAPI function (but if high-resolution performance counter is not supported by your system, DateTime.Ticks is used)
Before using StopWatch notice two problems:
it can be unreliable on multiprocessor systems (see MS kb895980, kb896256)
it can be unreliable if CPU frequency varies (read this article)
You can evaluate the precision on your system with simple test:
static void Main(string[] args)
{
int xcnt = 0;
long xdelta, xstart;
xstart = DateTime.UtcNow.Ticks;
do {
xdelta = DateTime.UtcNow.Ticks - xstart;
xcnt++;
} while (xdelta == 0);
Console.WriteLine("DateTime:\t{0} ms, in {1} cycles", xdelta / (10000.0), xcnt);
int ycnt = 0, ystart;
long ydelta;
ystart = Environment.TickCount;
do {
ydelta = Environment.TickCount - ystart;
ycnt++;
} while (ydelta == 0);
Console.WriteLine("Environment:\t{0} ms, in {1} cycles ", ydelta, ycnt);
Stopwatch sw = new Stopwatch();
int zcnt = 0;
long zstart, zdelta;
sw.Start();
zstart = sw.ElapsedTicks; // This minimizes the difference (opposed to just using 0)
do {
zdelta = sw.ElapsedTicks - zstart;
zcnt++;
} while (zdelta == 0);
sw.Stop();
Console.WriteLine("StopWatch:\t{0} ms, in {1} cycles", (zdelta * 1000.0) / Stopwatch.Frequency, zcnt);
Console.ReadKey();
}
Use Stopwatch class. There is a decent example on msdn: http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx
Stopwatch stopWatch = Stopwatch.StartNew();
Thread.Sleep(10000);
stopWatch.Stop();
// Get the elapsed time as a TimeSpan value.
TimeSpan ts = stopWatch.Elapsed;
Why are you worried about rollover? As long as the duration you are measuring is under 24.9 days and you calculate the relative duration, you're fine. It doesn't matter how long the system has been running, as long as you only concern yourself with your portion of that running time (as opposed to directly performing less-than or greater-than comparisons on the begin and end points). I.e. this:
int before_rollover = Int32.MaxValue - 5;
int after_rollover = Int32.MinValue + 7;
int duration = after_rollover - before_rollover;
Console.WriteLine("before_rollover: " + before_rollover.ToString());
Console.WriteLine("after_rollover: " + after_rollover.ToString());
Console.WriteLine("duration: " + duration.ToString());
correctly prints:
before_rollover: 2147483642
after_rollover: -2147483641
duration: 13
You don't have to worry about the sign bit. C#, like C, lets the CPU handle this.
This is a common situation I've run into before with time counts in embedded systems. I would never compare beforerollover < afterrollover directly, for instance. I would always perform the subtraction to find the duration that takes rollover into account, and then base any other calculations on the duration.
Environment.TickCount seems to be much faster then the other solutions:
Environment.TickCount 71
DateTime.UtcNow.Ticks 213
sw.ElapsedMilliseconds 1273
The measurements were generated by the following code:
static void Main( string[] args ) {
const int max = 10000000;
//
//
for ( int j = 0; j < 3; j++ ) {
var sw = new Stopwatch();
sw.Start();
for ( int i = 0; i < max; i++ ) {
var a = Environment.TickCount;
}
sw.Stop();
Console.WriteLine( $"Environment.TickCount {sw.ElapsedMilliseconds}" );
//
//
sw = new Stopwatch();
sw.Start();
for ( int i = 0; i < max; i++ ) {
var a = DateTime.UtcNow.Ticks;
}
sw.Stop();
Console.WriteLine( $"DateTime.UtcNow.Ticks {sw.ElapsedMilliseconds}" );
//
//
sw = new Stopwatch();
sw.Start();
for ( int i = 0; i < max; i++ ) {
var a = sw.ElapsedMilliseconds;
}
sw.Stop();
Console.WriteLine( $"sw.ElapsedMilliseconds {sw.ElapsedMilliseconds}" );
}
Console.WriteLine( "Done" );
Console.ReadKey();
}
Here is kind of an updated&refreshed summary of what may be the most useful answers & comments in this thread + extra benchmarks and variants:
First thing first: As others have pointed out in comments, things have changed the last years and with "modern" Windows (Win XP ++) and .NET, and modern hardware there are no or little reasons not to use Stopwatch().
See MSDN for details. Quotations:
"Is QPC accuracy affected by processor frequency changes caused by power management or Turbo Boost technology?
No. If the processor has an invariant TSC, the QPC is not affected by these sort of changes. If the processor doesn't have an invariant TSC, QPC will revert to a platform hardware timer that won't be affected by processor frequency changes or Turbo Boost technology.
Does QPC reliably work on multi-processor systems, multi-core system, and systems with hyper-threading?
Yes
How do I determine and validate that QPC works on my machine?
You don't need to perform such checks.
Which processors have non-invariant TSCs?
[..Read further..]
"
But if you don't need the precision of Stopwatch() or at least want to know exactly about the performance of Stopwatch (static vs. instance-based) and other possible variants, continue reading:
I took over the benchmark above from cskwg, and extended the code for more variants. I have measured with a some years old i7 4700 MQ and C# 7 with VS 2017 (to be more precise, compiled with .NET 4.5.2, despite binary literals, it is C# 6 (used of this: string literals and 'using static'). Especially the Stopwatch() performance seems to be improved compared to the mentioned benchmark.
This is an example of results of 10 million repetitions in a loop, as always, absolute values are not important, but even the relative values may differ on other hardware:
32 bit, Release mode without optimization:
Measured: GetTickCount64() [ms]: 275
Measured: Environment.TickCount [ms]: 45
Measured: DateTime.UtcNow.Ticks [ms]: 167
Measured: Stopwatch: .ElapsedTicks [ms]: 277
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 548
Measured: static Stopwatch.GetTimestamp [ms]: 193
Measured: Stopwatch+conversion to DateTime [ms]: 551
Compare that with DateTime.Now.Ticks [ms]: 9010
32 bit, Release mode, optimized:
Measured: GetTickCount64() [ms]: 198
Measured: Environment.TickCount [ms]: 39
Measured: DateTime.UtcNow.Ticks [ms]: 66 (!)
Measured: Stopwatch: .ElapsedTicks [ms]: 175
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 491
Measured: static Stopwatch.GetTimestamp [ms]: 175
Measured: Stopwatch+conversion to DateTime [ms]: 510
Compare that with DateTime.Now.Ticks [ms]: 8460
64 bit, Release mode without optimization:
Measured: GetTickCount64() [ms]: 205
Measured: Environment.TickCount [ms]: 39
Measured: DateTime.UtcNow.Ticks [ms]: 127
Measured: Stopwatch: .ElapsedTicks [ms]: 209
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 285
Measured: static Stopwatch.GetTimestamp [ms]: 187
Measured: Stopwatch+conversion to DateTime [ms]: 319
Compare that with DateTime.Now.Ticks [ms]: 3040
64 bit, Release mode, optimized:
Measured: GetTickCount64() [ms]: 148
Measured: Environment.TickCount [ms]: 31 (is it still worth it?)
Measured: DateTime.UtcNow.Ticks [ms]: 76 (!)
Measured: Stopwatch: .ElapsedTicks [ms]: 178
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 226
Measured: static Stopwatch.GetTimestamp [ms]: 175
Measured: Stopwatch+conversion to DateTime [ms]: 246
Compare that with DateTime.Now.Ticks [ms]: 3020
It may be very interesting, that creating a DateTime value to print out the Stopwatch time seems to have nearly no costs. Interesting in a more academic than practical way is that static Stopwatch is slightly faster (as expected). Some optimization points are quite interesting.
For example, I cannot explain why Stopwatch.ElapsedMilliseconds only with 32 bit is so slow compared to it's other variants, for example the static one. This and DateTime.Now more than double their speed with 64 bit.
You can see: Only for millions of executions, the time of Stopwatch begins to matter. If this is really the case (but beware micro-optimizing too early), it may be interesting that with GetTickCount64(), but especially with DateTime.UtcNow, you have a 64 bit (long) timer with less precision than Stopwatch, but faster, so that you don't have to mess around with the 32 bit "ugly" Environment.TickCount.
As expected, DateTime.Now is by far the slowest of all.
If you run it, the code retrieves also your current Stopwatch accuracy and more.
Here is the full benchmark code:
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;
using static System.Environment;
[...]
[DllImport("kernel32.dll") ]
public static extern UInt64 GetTickCount64(); // Retrieves a 64bit value containing ticks since system start
static void Main(string[] args)
{
const int max = 10_000_000;
const int n = 3;
Stopwatch sw;
// Following Process&Thread lines according to tips by Thomas Maierhofer: https://codeproject.com/KB/testing/stopwatch-measure-precise.aspx
// But this somewhat contradicts to assertions by MS in: https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396#Does_QPC_reliably_work_on_multi-processor_systems__multi-core_system__and_________systems_with_hyper-threading
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(1); // Use only the first core
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Thread.Sleep(2); // warmup
Console.WriteLine($"Repeating measurement {n} times in loop of {max:N0}:{NewLine}");
for (int j = 0; j < n; j++)
{
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var tickCount = GetTickCount64();
}
sw.Stop();
Console.WriteLine($"Measured: GetTickCount64() [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var tickCount = Environment.TickCount; // only int capacity, enough for a bit more than 24 days
}
sw.Stop();
Console.WriteLine($"Measured: Environment.TickCount [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = DateTime.UtcNow.Ticks;
}
sw.Stop();
Console.WriteLine($"Measured: DateTime.UtcNow.Ticks [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = sw.ElapsedMilliseconds;
}
sw.Stop();
Console.WriteLine($"Measured: Stopwatch: .ElapsedMilliseconds [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = Stopwatch.GetTimestamp();
}
sw.Stop();
Console.WriteLine($"Measured: static Stopwatch.GetTimestamp [ms]: {sw.ElapsedMilliseconds}");
//
//
DateTime dt=DateTime.MinValue; // just init
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = new DateTime(sw.Elapsed.Ticks); // using variable dt here seems to make nearly no difference
}
sw.Stop();
//Console.WriteLine($"Measured: Stopwatch+conversion to DateTime [s] with millisecs: {dt:s.fff}");
Console.WriteLine($"Measured: Stopwatch+conversion to DateTime [ms]: {sw.ElapsedMilliseconds}");
Console.WriteLine();
}
//
//
sw = new Stopwatch();
var tickCounterStart = Environment.TickCount;
sw.Start();
for (int i = 0; i < max/10; i++)
{
var a = DateTime.Now.Ticks;
}
sw.Stop();
var tickCounter = Environment.TickCount - tickCounterStart;
Console.WriteLine($"Compare that with DateTime.Now.Ticks [ms]: {sw.ElapsedMilliseconds*10}");
Console.WriteLine($"{NewLine}General Stopwatch information:");
if (Stopwatch.IsHighResolution)
Console.WriteLine("- Using high-resolution performance counter for Stopwatch class.");
else
Console.WriteLine("- Using high-resolution performance counter for Stopwatch class.");
double freq = (double)Stopwatch.Frequency;
double ticksPerMicroSec = freq / (1000d*1000d) ; // microsecond resolution: 1 million ticks per sec
Console.WriteLine($"- Stopwatch accuracy- ticks per microsecond (1000 ms): {ticksPerMicroSec:N1}");
Console.WriteLine(" (Max. tick resolution normally is 100 nanoseconds, this is 10 ticks/microsecond.)");
DateTime maxTimeForTickCountInteger= new DateTime(Int32.MaxValue*10_000L); // tickCount means millisec -> there are 10.000 milliseconds in 100 nanoseconds, which is the tick resolution in .NET, e.g. used for TimeSpan
Console.WriteLine($"- Approximated capacity (maxtime) of TickCount [dd:hh:mm:ss] {maxTimeForTickCountInteger:dd:HH:mm:ss}");
// this conversion from seems not really accurate, it will be between 24-25 days.
Console.WriteLine($"{NewLine}Done.");
while (Console.KeyAvailable)
Console.ReadKey(false);
Console.ReadKey();
}
You probably want System.Diagnostics.StopWatch.
If you're looking for the functionality of Environment.TickCount but without the overhead of creating new Stopwatch objects, you can use the static Stopwatch.GetTimestamp() method (along with Stopwatch.Frequency) to calculate long time spans. Because GetTimestamp() returns a long, it won't overflow for a very, very long time (over 100,000 years, on the machine I'm using to write this). It's also much more accurate than Environment.TickCount which has a maximum resolution of 10 to 16 milliseconds.
Use
System.Diagnostics.Stopwatch
It has a property called
EllapsedMilliseconds
TickCount64
Doing some quick measurements on this new function, I found (optimized, release 64-bit, 1000mio loops):
Environment.TickCount: 2265
Environment.TickCount64: 2531
DateTime.UtcNow.Ticks: 69016
The measurements for not-optimized code were similar.
Test code:
static void Main( string[] args ) {
long start, end, length = 1000 * 1000 * 1000;
start = Environment.TickCount64;
for ( int i = 0; i < length; i++ ) {
var a = Environment.TickCount;
}
end = Environment.TickCount64;
Console.WriteLine( "Environment.TickCount: {0}", end - start );
start = Environment.TickCount64;
for ( int i = 0; i < length; i++ ) {
var a = Environment.TickCount64;
}
end = Environment.TickCount64;
Console.WriteLine( "Environment.TickCount64: {0}", end - start );
start = Environment.TickCount64;
for ( int i = 0; i < length; i++ ) {
var a = DateTime.UtcNow.Ticks;
}
end = Environment.TickCount64;
Console.WriteLine( "DateTime.UtcNow.Ticks: {0}", end - start );
}
You should use the Stopwatch class instead.
I use Environment.TickCount because:
The Stopwatch class is not in the Compact Framework.
Stopwatch uses the same underlying timing mechanism as TickCount, so the results won't be any more or less accurate.
The wrap-around problem with TickCount is cosmically unlikely to be hit (you'd have to leave your computer running for 27 days and then try to measure a time that just happens to span the wrap-around moment), and even if you did hit it the result would be a huge negative time span (so it would kind of stand out).
That being said, I would also recommend using Stopwatch, if it's available to you. Or you could take about 1 minute and write a Stopwatch-like class that wraps Environment.TickCount.
BTW, I see nothing in the Stopwatch documentation that mentions the wrap-around problem with the underlying timer mechanism, so I wouldn't be surprised at all to find that Stopwatch suffers from the same problem. But again, I wouldn't spend any time worrying about it.
I was going to say wrap it into a stopwatch class, but Grzenio already said the right thing, so I will give him an uptick. Such encapsulation factors out the decision as to which way is better, and this can change in time. I remember being shocked at how expensive it can be getting the time on some systems, so having one place that can implement the best technique can be very important.
For one-shot timing, it's even simpler to write
Stopwatch stopWatch = Stopwatch.StartNew();
...dostuff...
Debug.WriteLine(String.Format("It took {0} milliseconds",
stopWatch.EllapsedMilliseconds)));
I'd guess the cosmically unlikely wraparound in TickCount is even less of a concern for StopWatch, given that the ElapsedTicks field is a long. On my machine, StopWatch is high resolution, at 2.4e9 ticks per second. Even at that rate, it would take over 121 years to overflow the ticks field. Of course, I don't know what's going on under the covers, so take that with a grain of salt. However, I notice that the documentation for StopWatch doesn't even mention the wraparound issue, while the doc for TickCount does.
Overflow compensation
As said before, rollover may happen after 24.9 days, or, if you use an uint cast, after 49.8 days.
Because I did not want to pInvoke GetTickCount64, I wrote this overflow compensation. The sample code is using 'byte' to keep the numbers handy. Please have a look at it, it still might contain errors:
using System;
namespace ConsoleApp1 {
class Program {
//
static byte Lower = byte.MaxValue / 3;
static byte Upper = 2 * byte.MaxValue / 3;
//
///<summary>Compute delta between two TickCount values reliably, because TickCount might wrap after 49.8 days.</summary>
static short Delta( byte next, byte ticks ) {
if ( next < Lower ) {
if ( ticks > Upper ) {
return (short) ( ticks - ( byte.MaxValue + 1 + next ) );
}
}
if ( next > Upper ) {
if ( ticks < Lower ) {
return (short) ( ( ticks + byte.MaxValue + 1 ) - next );
}
}
return (short) ( ticks - next );
}
//
static void Main( string[] args ) {
// Init
Random rnd = new Random();
int max = 0;
byte last = 0;
byte wait = 3;
byte next = (byte) ( last + wait );
byte step = 0;
// Loop tick
for ( byte tick = 0; true; ) {
//
short delta = Delta( next, tick );
if ( delta >= 0 ) {
Console.WriteLine( "RUN: last: {0} next: {1} tick: {2} delta: {3}", last, next, tick, delta );
last = tick;
next = (byte) ( last + wait );
}
// Will overflow to 0 automatically
step = (byte) rnd.Next( 0, 11 );
tick += step;
max++; if ( max > 99999 ) break;
}
}
}
}

Environment.TickCount vs DateTime.Now

Is it ever OK to use Environment.TickCountto calculate time spans?
int start = Environment.TickCount;
// Do stuff
int duration = Environment.TickCount - start;
Console.WriteLine("That took " + duration " ms");
Because TickCount is signed and will rollover after 25 days (it takes 50 days to hit all 32 bits, but you have to scrap the signed bit if you want to make any sense of the math), it seems like it's too risky to be useful.
I'm using DateTime.Now instead. Is this the best way to do this?
DateTime start = DateTime.Now;
// Do stuff
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("That took " + duration.TotalMilliseconds + " ms");
Environment.TickCount is based on GetTickCount() WinAPI function. It's in milliseconds
But the actual precision of it is about 15.6 ms. So you can't measure shorter time intervals (or you'll get 0)
Note: The returned value is Int32, so this counter rolls over each ~49.7 days. You shouldn't use it to measure such long intervals.
DateTime.Ticks is based on GetSystemTimeAsFileTime() WinAPI function. It's in 100s nanoseconds (tenths of microsoconds).
The actual precision of DateTime.Ticks depends on the system. On XP, the increment of system clock is about 15.6 ms, the same as in Environment.TickCount.
On Windows 7 its precision is 1 ms (while Environemnt.TickCount's is still 15.6 ms), however if a power saving scheme is used (usually on laptops) it can go down to 15.6 ms as well.
Stopwatch is based on QueryPerformanceCounter() WinAPI function (but if high-resolution performance counter is not supported by your system, DateTime.Ticks is used)
Before using StopWatch notice two problems:
it can be unreliable on multiprocessor systems (see MS kb895980, kb896256)
it can be unreliable if CPU frequency varies (read this article)
You can evaluate the precision on your system with simple test:
static void Main(string[] args)
{
int xcnt = 0;
long xdelta, xstart;
xstart = DateTime.UtcNow.Ticks;
do {
xdelta = DateTime.UtcNow.Ticks - xstart;
xcnt++;
} while (xdelta == 0);
Console.WriteLine("DateTime:\t{0} ms, in {1} cycles", xdelta / (10000.0), xcnt);
int ycnt = 0, ystart;
long ydelta;
ystart = Environment.TickCount;
do {
ydelta = Environment.TickCount - ystart;
ycnt++;
} while (ydelta == 0);
Console.WriteLine("Environment:\t{0} ms, in {1} cycles ", ydelta, ycnt);
Stopwatch sw = new Stopwatch();
int zcnt = 0;
long zstart, zdelta;
sw.Start();
zstart = sw.ElapsedTicks; // This minimizes the difference (opposed to just using 0)
do {
zdelta = sw.ElapsedTicks - zstart;
zcnt++;
} while (zdelta == 0);
sw.Stop();
Console.WriteLine("StopWatch:\t{0} ms, in {1} cycles", (zdelta * 1000.0) / Stopwatch.Frequency, zcnt);
Console.ReadKey();
}
Use Stopwatch class. There is a decent example on msdn: http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx
Stopwatch stopWatch = Stopwatch.StartNew();
Thread.Sleep(10000);
stopWatch.Stop();
// Get the elapsed time as a TimeSpan value.
TimeSpan ts = stopWatch.Elapsed;
Why are you worried about rollover? As long as the duration you are measuring is under 24.9 days and you calculate the relative duration, you're fine. It doesn't matter how long the system has been running, as long as you only concern yourself with your portion of that running time (as opposed to directly performing less-than or greater-than comparisons on the begin and end points). I.e. this:
int before_rollover = Int32.MaxValue - 5;
int after_rollover = Int32.MinValue + 7;
int duration = after_rollover - before_rollover;
Console.WriteLine("before_rollover: " + before_rollover.ToString());
Console.WriteLine("after_rollover: " + after_rollover.ToString());
Console.WriteLine("duration: " + duration.ToString());
correctly prints:
before_rollover: 2147483642
after_rollover: -2147483641
duration: 13
You don't have to worry about the sign bit. C#, like C, lets the CPU handle this.
This is a common situation I've run into before with time counts in embedded systems. I would never compare beforerollover < afterrollover directly, for instance. I would always perform the subtraction to find the duration that takes rollover into account, and then base any other calculations on the duration.
Environment.TickCount seems to be much faster then the other solutions:
Environment.TickCount 71
DateTime.UtcNow.Ticks 213
sw.ElapsedMilliseconds 1273
The measurements were generated by the following code:
static void Main( string[] args ) {
const int max = 10000000;
//
//
for ( int j = 0; j < 3; j++ ) {
var sw = new Stopwatch();
sw.Start();
for ( int i = 0; i < max; i++ ) {
var a = Environment.TickCount;
}
sw.Stop();
Console.WriteLine( $"Environment.TickCount {sw.ElapsedMilliseconds}" );
//
//
sw = new Stopwatch();
sw.Start();
for ( int i = 0; i < max; i++ ) {
var a = DateTime.UtcNow.Ticks;
}
sw.Stop();
Console.WriteLine( $"DateTime.UtcNow.Ticks {sw.ElapsedMilliseconds}" );
//
//
sw = new Stopwatch();
sw.Start();
for ( int i = 0; i < max; i++ ) {
var a = sw.ElapsedMilliseconds;
}
sw.Stop();
Console.WriteLine( $"sw.ElapsedMilliseconds {sw.ElapsedMilliseconds}" );
}
Console.WriteLine( "Done" );
Console.ReadKey();
}
Here is kind of an updated&refreshed summary of what may be the most useful answers & comments in this thread + extra benchmarks and variants:
First thing first: As others have pointed out in comments, things have changed the last years and with "modern" Windows (Win XP ++) and .NET, and modern hardware there are no or little reasons not to use Stopwatch().
See MSDN for details. Quotations:
"Is QPC accuracy affected by processor frequency changes caused by power management or Turbo Boost technology?
No. If the processor has an invariant TSC, the QPC is not affected by these sort of changes. If the processor doesn't have an invariant TSC, QPC will revert to a platform hardware timer that won't be affected by processor frequency changes or Turbo Boost technology.
Does QPC reliably work on multi-processor systems, multi-core system, and systems with hyper-threading?
Yes
How do I determine and validate that QPC works on my machine?
You don't need to perform such checks.
Which processors have non-invariant TSCs?
[..Read further..]
"
But if you don't need the precision of Stopwatch() or at least want to know exactly about the performance of Stopwatch (static vs. instance-based) and other possible variants, continue reading:
I took over the benchmark above from cskwg, and extended the code for more variants. I have measured with a some years old i7 4700 MQ and C# 7 with VS 2017 (to be more precise, compiled with .NET 4.5.2, despite binary literals, it is C# 6 (used of this: string literals and 'using static'). Especially the Stopwatch() performance seems to be improved compared to the mentioned benchmark.
This is an example of results of 10 million repetitions in a loop, as always, absolute values are not important, but even the relative values may differ on other hardware:
32 bit, Release mode without optimization:
Measured: GetTickCount64() [ms]: 275
Measured: Environment.TickCount [ms]: 45
Measured: DateTime.UtcNow.Ticks [ms]: 167
Measured: Stopwatch: .ElapsedTicks [ms]: 277
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 548
Measured: static Stopwatch.GetTimestamp [ms]: 193
Measured: Stopwatch+conversion to DateTime [ms]: 551
Compare that with DateTime.Now.Ticks [ms]: 9010
32 bit, Release mode, optimized:
Measured: GetTickCount64() [ms]: 198
Measured: Environment.TickCount [ms]: 39
Measured: DateTime.UtcNow.Ticks [ms]: 66 (!)
Measured: Stopwatch: .ElapsedTicks [ms]: 175
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 491
Measured: static Stopwatch.GetTimestamp [ms]: 175
Measured: Stopwatch+conversion to DateTime [ms]: 510
Compare that with DateTime.Now.Ticks [ms]: 8460
64 bit, Release mode without optimization:
Measured: GetTickCount64() [ms]: 205
Measured: Environment.TickCount [ms]: 39
Measured: DateTime.UtcNow.Ticks [ms]: 127
Measured: Stopwatch: .ElapsedTicks [ms]: 209
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 285
Measured: static Stopwatch.GetTimestamp [ms]: 187
Measured: Stopwatch+conversion to DateTime [ms]: 319
Compare that with DateTime.Now.Ticks [ms]: 3040
64 bit, Release mode, optimized:
Measured: GetTickCount64() [ms]: 148
Measured: Environment.TickCount [ms]: 31 (is it still worth it?)
Measured: DateTime.UtcNow.Ticks [ms]: 76 (!)
Measured: Stopwatch: .ElapsedTicks [ms]: 178
Measured: Stopwatch: .ElapsedMilliseconds [ms]: 226
Measured: static Stopwatch.GetTimestamp [ms]: 175
Measured: Stopwatch+conversion to DateTime [ms]: 246
Compare that with DateTime.Now.Ticks [ms]: 3020
It may be very interesting, that creating a DateTime value to print out the Stopwatch time seems to have nearly no costs. Interesting in a more academic than practical way is that static Stopwatch is slightly faster (as expected). Some optimization points are quite interesting.
For example, I cannot explain why Stopwatch.ElapsedMilliseconds only with 32 bit is so slow compared to it's other variants, for example the static one. This and DateTime.Now more than double their speed with 64 bit.
You can see: Only for millions of executions, the time of Stopwatch begins to matter. If this is really the case (but beware micro-optimizing too early), it may be interesting that with GetTickCount64(), but especially with DateTime.UtcNow, you have a 64 bit (long) timer with less precision than Stopwatch, but faster, so that you don't have to mess around with the 32 bit "ugly" Environment.TickCount.
As expected, DateTime.Now is by far the slowest of all.
If you run it, the code retrieves also your current Stopwatch accuracy and more.
Here is the full benchmark code:
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;
using static System.Environment;
[...]
[DllImport("kernel32.dll") ]
public static extern UInt64 GetTickCount64(); // Retrieves a 64bit value containing ticks since system start
static void Main(string[] args)
{
const int max = 10_000_000;
const int n = 3;
Stopwatch sw;
// Following Process&Thread lines according to tips by Thomas Maierhofer: https://codeproject.com/KB/testing/stopwatch-measure-precise.aspx
// But this somewhat contradicts to assertions by MS in: https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396#Does_QPC_reliably_work_on_multi-processor_systems__multi-core_system__and_________systems_with_hyper-threading
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(1); // Use only the first core
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Thread.Sleep(2); // warmup
Console.WriteLine($"Repeating measurement {n} times in loop of {max:N0}:{NewLine}");
for (int j = 0; j < n; j++)
{
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var tickCount = GetTickCount64();
}
sw.Stop();
Console.WriteLine($"Measured: GetTickCount64() [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var tickCount = Environment.TickCount; // only int capacity, enough for a bit more than 24 days
}
sw.Stop();
Console.WriteLine($"Measured: Environment.TickCount [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = DateTime.UtcNow.Ticks;
}
sw.Stop();
Console.WriteLine($"Measured: DateTime.UtcNow.Ticks [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = sw.ElapsedMilliseconds;
}
sw.Stop();
Console.WriteLine($"Measured: Stopwatch: .ElapsedMilliseconds [ms]: {sw.ElapsedMilliseconds}");
//
//
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = Stopwatch.GetTimestamp();
}
sw.Stop();
Console.WriteLine($"Measured: static Stopwatch.GetTimestamp [ms]: {sw.ElapsedMilliseconds}");
//
//
DateTime dt=DateTime.MinValue; // just init
sw = new Stopwatch();
sw.Start();
for (int i = 0; i < max; i++)
{
var a = new DateTime(sw.Elapsed.Ticks); // using variable dt here seems to make nearly no difference
}
sw.Stop();
//Console.WriteLine($"Measured: Stopwatch+conversion to DateTime [s] with millisecs: {dt:s.fff}");
Console.WriteLine($"Measured: Stopwatch+conversion to DateTime [ms]: {sw.ElapsedMilliseconds}");
Console.WriteLine();
}
//
//
sw = new Stopwatch();
var tickCounterStart = Environment.TickCount;
sw.Start();
for (int i = 0; i < max/10; i++)
{
var a = DateTime.Now.Ticks;
}
sw.Stop();
var tickCounter = Environment.TickCount - tickCounterStart;
Console.WriteLine($"Compare that with DateTime.Now.Ticks [ms]: {sw.ElapsedMilliseconds*10}");
Console.WriteLine($"{NewLine}General Stopwatch information:");
if (Stopwatch.IsHighResolution)
Console.WriteLine("- Using high-resolution performance counter for Stopwatch class.");
else
Console.WriteLine("- Using high-resolution performance counter for Stopwatch class.");
double freq = (double)Stopwatch.Frequency;
double ticksPerMicroSec = freq / (1000d*1000d) ; // microsecond resolution: 1 million ticks per sec
Console.WriteLine($"- Stopwatch accuracy- ticks per microsecond (1000 ms): {ticksPerMicroSec:N1}");
Console.WriteLine(" (Max. tick resolution normally is 100 nanoseconds, this is 10 ticks/microsecond.)");
DateTime maxTimeForTickCountInteger= new DateTime(Int32.MaxValue*10_000L); // tickCount means millisec -> there are 10.000 milliseconds in 100 nanoseconds, which is the tick resolution in .NET, e.g. used for TimeSpan
Console.WriteLine($"- Approximated capacity (maxtime) of TickCount [dd:hh:mm:ss] {maxTimeForTickCountInteger:dd:HH:mm:ss}");
// this conversion from seems not really accurate, it will be between 24-25 days.
Console.WriteLine($"{NewLine}Done.");
while (Console.KeyAvailable)
Console.ReadKey(false);
Console.ReadKey();
}
You probably want System.Diagnostics.StopWatch.
If you're looking for the functionality of Environment.TickCount but without the overhead of creating new Stopwatch objects, you can use the static Stopwatch.GetTimestamp() method (along with Stopwatch.Frequency) to calculate long time spans. Because GetTimestamp() returns a long, it won't overflow for a very, very long time (over 100,000 years, on the machine I'm using to write this). It's also much more accurate than Environment.TickCount which has a maximum resolution of 10 to 16 milliseconds.
Use
System.Diagnostics.Stopwatch
It has a property called
EllapsedMilliseconds
TickCount64
Doing some quick measurements on this new function, I found (optimized, release 64-bit, 1000mio loops):
Environment.TickCount: 2265
Environment.TickCount64: 2531
DateTime.UtcNow.Ticks: 69016
The measurements for not-optimized code were similar.
Test code:
static void Main( string[] args ) {
long start, end, length = 1000 * 1000 * 1000;
start = Environment.TickCount64;
for ( int i = 0; i < length; i++ ) {
var a = Environment.TickCount;
}
end = Environment.TickCount64;
Console.WriteLine( "Environment.TickCount: {0}", end - start );
start = Environment.TickCount64;
for ( int i = 0; i < length; i++ ) {
var a = Environment.TickCount64;
}
end = Environment.TickCount64;
Console.WriteLine( "Environment.TickCount64: {0}", end - start );
start = Environment.TickCount64;
for ( int i = 0; i < length; i++ ) {
var a = DateTime.UtcNow.Ticks;
}
end = Environment.TickCount64;
Console.WriteLine( "DateTime.UtcNow.Ticks: {0}", end - start );
}
You should use the Stopwatch class instead.
I use Environment.TickCount because:
The Stopwatch class is not in the Compact Framework.
Stopwatch uses the same underlying timing mechanism as TickCount, so the results won't be any more or less accurate.
The wrap-around problem with TickCount is cosmically unlikely to be hit (you'd have to leave your computer running for 27 days and then try to measure a time that just happens to span the wrap-around moment), and even if you did hit it the result would be a huge negative time span (so it would kind of stand out).
That being said, I would also recommend using Stopwatch, if it's available to you. Or you could take about 1 minute and write a Stopwatch-like class that wraps Environment.TickCount.
BTW, I see nothing in the Stopwatch documentation that mentions the wrap-around problem with the underlying timer mechanism, so I wouldn't be surprised at all to find that Stopwatch suffers from the same problem. But again, I wouldn't spend any time worrying about it.
I was going to say wrap it into a stopwatch class, but Grzenio already said the right thing, so I will give him an uptick. Such encapsulation factors out the decision as to which way is better, and this can change in time. I remember being shocked at how expensive it can be getting the time on some systems, so having one place that can implement the best technique can be very important.
For one-shot timing, it's even simpler to write
Stopwatch stopWatch = Stopwatch.StartNew();
...dostuff...
Debug.WriteLine(String.Format("It took {0} milliseconds",
stopWatch.EllapsedMilliseconds)));
I'd guess the cosmically unlikely wraparound in TickCount is even less of a concern for StopWatch, given that the ElapsedTicks field is a long. On my machine, StopWatch is high resolution, at 2.4e9 ticks per second. Even at that rate, it would take over 121 years to overflow the ticks field. Of course, I don't know what's going on under the covers, so take that with a grain of salt. However, I notice that the documentation for StopWatch doesn't even mention the wraparound issue, while the doc for TickCount does.
Overflow compensation
As said before, rollover may happen after 24.9 days, or, if you use an uint cast, after 49.8 days.
Because I did not want to pInvoke GetTickCount64, I wrote this overflow compensation. The sample code is using 'byte' to keep the numbers handy. Please have a look at it, it still might contain errors:
using System;
namespace ConsoleApp1 {
class Program {
//
static byte Lower = byte.MaxValue / 3;
static byte Upper = 2 * byte.MaxValue / 3;
//
///<summary>Compute delta between two TickCount values reliably, because TickCount might wrap after 49.8 days.</summary>
static short Delta( byte next, byte ticks ) {
if ( next < Lower ) {
if ( ticks > Upper ) {
return (short) ( ticks - ( byte.MaxValue + 1 + next ) );
}
}
if ( next > Upper ) {
if ( ticks < Lower ) {
return (short) ( ( ticks + byte.MaxValue + 1 ) - next );
}
}
return (short) ( ticks - next );
}
//
static void Main( string[] args ) {
// Init
Random rnd = new Random();
int max = 0;
byte last = 0;
byte wait = 3;
byte next = (byte) ( last + wait );
byte step = 0;
// Loop tick
for ( byte tick = 0; true; ) {
//
short delta = Delta( next, tick );
if ( delta >= 0 ) {
Console.WriteLine( "RUN: last: {0} next: {1} tick: {2} delta: {3}", last, next, tick, delta );
last = tick;
next = (byte) ( last + wait );
}
// Will overflow to 0 automatically
step = (byte) rnd.Next( 0, 11 );
tick += step;
max++; if ( max > 99999 ) break;
}
}
}
}

Categories