Simple time profiling - strange times - C#

I'm trying to profile my code to check how long certain parts of it take to execute.
I've wrapped the most time-consuming part in something like this:
DateTime start = DateTime.Now;
...
... // Here comes the time-consuming part
...
Console.WriteLine((DateTime.Now - start).Milliseconds);
The program executes this part of the code for a couple of seconds (about 20 s), but in the console I get a result of about 800 milliseconds. Why is that? What am I doing wrong?

Try using the Stopwatch class for this. It was intended for this exact purpose.
Stopwatch sw = Stopwatch.StartNew();
// ...
// Here comes the time-consuming part
// ...
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

Do you actually want the TotalMilliseconds property? Milliseconds returns the milliseconds component of the timespan, not the actual length of the timespan in milliseconds.
That said, you probably want to use Stopwatch (as the others said) since it will be more accurate.
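A minimal illustration of that difference, using a span of 20.8 seconds like the one in the question:
TimeSpan span = TimeSpan.FromSeconds(20.8);
Console.WriteLine(span.Milliseconds);      // 800   -- only the milliseconds component
Console.WriteLine(span.TotalMilliseconds); // 20800 -- the whole span expressed in milliseconds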

This is a much better way to profile your code.
var result = CallMethod(); // This will JIT the method
var sw = Stopwatch.StartNew();
for (int i = 0; i < 5; i++)
{
result = CallMethod();
}
sw.Stop();
Console.WriteLine(result);
Console.WriteLine(TimeSpan.FromTicks(sw.Elapsed.Ticks / 5)); // average time per call

If you instead reference the TotalMilliseconds property, you will get the result you were looking for. But I think the other answers recommending Stopwatch describe the better practice.

Related

C# boolean compare vs set performance

In C#, suppose I have a foreach loop and it is possible that the iterator will be empty. Following the loop, more actions need to be taken only if the iterator was not empty. So I declare bool res = false; before the loop. Is it faster to just set res = true; in each loop iteration, or to test if it's been done yet, as in if (!res) res = true;. I suppose the question could more succinctly be stated as "is it faster to set a bool's value or test its value?"
In addition, even if one is slightly faster than the other, is it feasible to have so many iterations in the loop that the impact on performance is not negligible?
To kill a few minutes:
static void Main(string[] args)
{
bool test = false;
Stopwatch sw = new Stopwatch();
sw.Start();
for (long i = 0; i < 100000000; i++)
{
if (!test)
test = true;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds + ". Hi, I'm just using test somehow:" + test);
sw.Reset();
bool test2 = false;
sw.Start();
for (long i = 0; i < 100000000; i++)
{
test2 = true;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds + ". Hi, I'm just using test2 somehow:" + test2);
Console.ReadKey();
}
Output:
448
379
So, unless I missed something, just setting the value is faster than checking and then setting it. Is that what you wanted to test?
EDIT:
Fixed an error pointed out in the comments. As a side note, I did run this test a few times, and even when the milliseconds changed, the second case was always slightly faster.
if (!res) res = true is redundant.
The compiler should be smart enough to know that res will always end up being true and remove your if statement and/or completely remove the set altogether if you compile with Release / Optimize Code.
As to the question itself: it should be faster to set a primitive value than to compare and then set it. I highly doubt you would be able to accurately measure the time difference at all for a primitive; just thinking about this has already consumed more time than the process will take in any exaggerated number of iterations.

Is 'yield return' slower than "old school" return?

I'm doing some tests on yield return performance, and I found that it is slower than a normal return.
I tested value types (int, double, etc.) and some reference types (string, etc.)... and yield return was slower in both cases. Why use it then?
Check out my example:
public class YieldReturnTeste
{
private static IEnumerable<string> YieldReturnTest(int limite)
{
for (int i = 0; i < limite; i++)
{
yield return i.ToString();
}
}
private static IEnumerable<string> NormalReturnTest(int limite)
{
List<string> listaInteiros = new List<string>();
for (int i = 0; i < limite; i++)
{
listaInteiros.Add(i.ToString());
}
return listaInteiros;
}
public static void executaTeste()
{
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
List<string> minhaListaYield = YieldReturnTest(2000000).ToList();
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10);
Console.WriteLine("Yield return: {0}", elapsedTime);
//****
stopWatch = new Stopwatch();
stopWatch.Start();
List<string> minhaListaNormal = NormalReturnTest(2000000).ToList();
stopWatch.Stop();
ts = stopWatch.Elapsed;
elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10);
Console.WriteLine("Normal return: {0}", elapsedTime);
}
}
Consider the difference between File.ReadAllLines and File.ReadLines.
ReadAllLines loads all of the lines into memory and returns a string[]. All well and good if the file is small. If the file is larger than will fit in memory, you'll run out of memory.
ReadLines, on the other hand, uses yield return to return one line at a time. With it, you can read any size file. It doesn't load the whole file into memory.
Say you wanted to find the first line that contains the word "foo", and then exit. Using ReadAllLines, you'd have to read the entire file into memory, even if "foo" occurs on the first line. With ReadLines, you only read one line. Which one would be faster?
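A minimal sketch of that comparison (assuming the usual System.IO and System.Linq usings; "input.txt" is a placeholder file name):
string viaReadAllLines = File.ReadAllLines("input.txt")
    .FirstOrDefault(l => l.Contains("foo")); // the whole file is read into memory first
string viaReadLines = File.ReadLines("input.txt")
    .FirstOrDefault(l => l.Contains("foo")); // stops reading at the first match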
That's not the only reason. Consider a program that reads a file and processes each line. Using File.ReadAllLines, you end up with:
string[] lines = File.ReadAllLines(filename);
for (int i = 0; i < lines.Length; ++i)
{
// process line
}
The time it takes that program to execute is equal to the time it takes to read the file, plus time to process the lines. Imagine that the processing takes so long that you want to speed it up with multiple threads. So you do something like:
lines = File.ReadAllLines(filename);
Parallel.ForEach(...);
But the reading is single-threaded. Your multiple threads can't start until the main thread has loaded the entire file.
With ReadLines, though, you can do something like:
Parallel.ForEach(File.ReadLines(filename), line => { ProcessLine(line); });
That starts up multiple threads immediately, which are processing at the same time that other lines are being read. So the reading time is overlapped with the processing time, meaning that your program will execute faster.
I show my examples using files because it's easier to demonstrate the concepts that way, but the same holds true for in-memory collections. Using yield return will use less memory and is potentially faster, especially when calling methods that only need to look at part of the collection (Enumerable.Any, Enumerable.First, etc.).
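A small sketch of the in-memory case (the Squares method and the element count are made up for illustration; assumes System.Collections.Generic and System.Linq):
static IEnumerable<int> Squares(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return i * i; // computed only when the consumer asks for it
    }
}

// Enumerable.First pulls a single element, so the remaining 999,999 squares are
// never computed. A version that builds and returns a List would compute them all.
int first = Squares(1000000).First();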
For one, it's a convenience feature. Two, it lets you do lazy returns, meaning a value is only evaluated when it's fetched. That can be invaluable for things like a DB query, or a collection you don't want to iterate over completely. Three, it can be faster in some scenarios. Four, how big was the difference? Probably tiny, so this is a micro-optimization.
Because the C# compiler converts iterator blocks (yield return) into a state machine, and the state machine is comparatively expensive in this case.
You can read more here: http://csharpindepth.com/articles/chapter6/iteratorblockimplementation.aspx
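To make the state machine point concrete, here is a rough, hand-written approximation of what the compiler generates for an iterator like YieldReturnTest (assumes System, System.Collections and System.Collections.Generic usings). The real generated class has considerably more plumbing, and the increment happens on resume rather than before the return, but the produced sequence is the same:
class YieldReturnTestIterator : IEnumerator<string>
{
    private readonly int _limite;
    private int _state; // tracks where to resume on the next MoveNext call
    private int _i;     // the loop variable, hoisted into a field

    public YieldReturnTestIterator(int limite) { _limite = limite; }

    public string Current { get; private set; }
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:             // first call: enter the loop
                _i = 0;
                goto case 1;
            case 1:             // resumed after a yield return
                if (_i >= _limite) return false;
                Current = _i.ToString();
                _i++;
                _state = 1;
                return true;
            default:
                return false;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}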
I used yield return to give me the results of an algorithm. Every result is based on the previous result, but I don't need all of them. I used foreach with yield return to inspect each result and break out of the foreach loop once I got a result that met my requirement.
The algorithm was fairly complex, so I assume there was some real work involved in saving state between the yield returns.
I noticed it was 3%-5% slower than a traditional return, but the gain from not having to generate all the results is much, much bigger than the loss in performance.
The .ToList(), while necessary to actually force the otherwise deferred iteration of the IEnumerable, gets in the way of measuring the core part.
At the very least, it is important to initialize the list to the known size:
const int listSize = 2000000;
var tempList = new List<string>(listSize);
...
tempList = YieldReturnTest(listSize).ToList();
Remark: both calls took about the same time on my machine. No difference (Mono 4 on repl.it).
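A minimal sketch of that pre-sizing idea, reusing the question's method names (and assuming NormalReturnTest is also changed internally to new List<string>(limite)):
const int listSize = 2000000;

var sw = Stopwatch.StartNew();
var fromYield = new List<string>(listSize);    // pre-sized receiver, no growth while adding
fromYield.AddRange(YieldReturnTest(listSize)); // consumes the iterator
sw.Stop();
Console.WriteLine("Yield return: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
List<string> fromList = NormalReturnTest(listSize).ToList();
sw.Stop();
Console.WriteLine("Normal return: {0} ms", sw.ElapsedMilliseconds);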

Slow While loop in C#

I have a while loop and all it does is a method call. I have a timer on the outside of the loop and another timer that incrementally adds up the time the method call takes inside the loop. The outer time takes about 17 seconds and the total on the inner timer is 40 ms. The loop is executing 50,000 times. Here is an example of the code:
long InnerTime = 0;
long OutterTime = 0;
Stopw1.Start();
int count = 1;
while (count <= TestCollection.Count) {
Stopw2.Start();
Method1();
Stopw2.Stop();
InnerTime = InnerTime + Stopw2.ElapsedMilliseconds;
Stopw2.Reset();
count++;
}
Stopw1.Stop();
OutterTime = Stopw1.ElapsedMilliseconds;
Stopw1.Reset();
Any help would be much appreciated.
Massimo
You are comparing apples and oranges. Your outer timer measures the total time taken. Your inner timer measures the number of whole milliseconds taken by the call to Method1.
The ElapsedMilliseconds property "represents elapsed time rounded down to the nearest whole millisecond value." So, you are rounding down to the nearest millisecond about 50,000 times.
If your call to Method1 takes, on average, less than 1 ms, then most of the time the ElapsedMilliseconds property will return 0 and your inner count will be much, much less than the actual time. In fact, your method takes about 0.3 ms on average, so you're lucky to even get it over 1 ms 40 times.
Use the Elapsed.TotalMilliseconds or ElapsedTicks property instead of ElapsedMilliseconds. One millisecond is 10,000 TimeSpan ticks; note that Stopwatch.ElapsedTicks is measured in units of Stopwatch.Frequency rather than TimeSpan ticks.
What is this doing: TestCollection.Count ?
I suspect your 17 seconds are being spent counting your 50,000 items over and over again.
Try changing this:
while (count <= TestCollection.Count) {
...
}
to this:
int total = TestCollection.Count;
while (count <= total) {
...
}
To add to what the others have already said, in general the C# compiler must re-evaluate any property, including
TestCollection.Count
for every single loop iteration. The property's value could change from iteration to iteration.
Assigning the value to a local variable removes the compiler's need to re-evaluate for every loop iteration.
The one exception that I'm aware of is for Array.Length, which benefits from an optimization specifically for arrays. This is referred to as Array Bounds Check Elimination.
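For example, with an array the idiomatic pattern is to use the Length property directly as the loop bound; the JIT recognizes it and can drop the per-access bounds check, so hoisting Length into a local gains little there:
int[] data = new int[100000];
long sum = 0;
for (int i = 0; i < data.Length; i++) // bound written directly as data.Length
{
    sum += data[i];                   // bounds check can be eliminated here
}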
To get a correct measurement of the time that your calls take, you should use ticks.
Please try the following:
long InnerTime = 0;
long OutterTime = 0;
Stopwatch Stopw1 = new Stopwatch();
Stopwatch Stopw2 = new Stopwatch();
Stopw1.Start();
int count = 1;
int run = TestCollection.Count;
while (count <= run) {
Stopw2.Start();
Method1();
Stopw2.Stop();
InnerTime = InnerTime + Stopw2.ElapsedTicks;
Stopw2.Reset();
count++;
}
Stopw1.Stop();
OutterTime = Stopw1.ElapsedTicks;
Stopw1.Reset();
You should not measure such a tiny method individually. But if you really want to, try this:
long innertime = 0;
while (count <= TestCollection.Count)
{
innertime -= Stopwatch.GetTimestamp(); // GetTimestamp and Frequency are static members of Stopwatch
Method1();
innertime += Stopwatch.GetTimestamp();
count++;
}
Console.WriteLine("{0} ms", innertime * 1000.0 / Stopwatch.Frequency);

Timing C# code using Timer

Even though it is good to assess the performance of code in terms of algorithmic analysis and Big-O notation, I wanted to see how long the code takes to execute on my PC. I initialized a List with 9999 items and removed the even elements from it. Sadly, the time span to execute this seems to be 0:0:0. Surprised by the result, I figure there must be something wrong in the way I time the execution. Could someone help me time the code correctly?
IList<int> source = new List<int>(100);
for (int i = 0; i < 9999; i++)
{
source.Add(i);
}
TimeSpan startTime, duration;
startTime = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
RemoveEven(ref source);
duration = Process.GetCurrentProcess().Threads[0].UserProcessorTime.Subtract(startTime);
Console.WriteLine(duration.Milliseconds);
Console.Read();
The most appropriate thing to use there would be Stopwatch - the UserProcessorTime you are reading is only updated in coarse scheduler increments and has nowhere near enough resolution for this:
var watch = Stopwatch.StartNew();
// something to time
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds);
However, a modern CPU is very fast, and it would not surprise me if it can do that removal in an immeasurably small amount of time. Normally, for timing, you need to repeat an operation a large number of times to get a reasonable measurement.
Aside: the ref in RemoveEven(ref source) is almost certainly not needed.
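A minimal sketch of that repeat-the-operation advice, reusing RemoveEven and source from the question (the iteration count is arbitrary):
const int iterations = 1000;
var watch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    // RemoveEven mutates source, so a real benchmark would rebuild the input
    // before each iteration, outside the timed region.
    RemoveEven(ref source);
}
watch.Stop();
Console.WriteLine("avg: {0} ms", watch.Elapsed.TotalMilliseconds / iterations);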
In .Net 2.0 you can use the Stopwatch class
IList<int> source = new List<int>(100);
for (int i = 0; i < 9999; i++)
{
source.Add(i);
}
Stopwatch watch = new Stopwatch();
watch.Start();
RemoveEven(ref source);
watch.Stop();
// watch.ElapsedMilliseconds now contains the execution time in ms
Adding to previous answers:
var sw = Stopwatch.StartNew();
// instructions to time
sw.Stop();
sw.ElapsedMilliseconds returns a long and has a resolution of:
1 millisecond = 1000000 nanoseconds
sw.Elapsed.TotalMilliseconds returns a double and has a resolution equal to the inverse of Stopwatch.Frequency. On my PC for example Stopwatch.Frequency has a value of 2939541 ticks per second, that gives sw.Elapsed.TotalMilliseconds a resolution of:
1/2939541 seconds = 3.401891655874165e-7 seconds ≈ 340 nanoseconds
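To check those figures on your own machine, something like this (just reading Stopwatch's static properties) prints the effective tick resolution:
Console.WriteLine("IsHighResolution: {0}", Stopwatch.IsHighResolution);
Console.WriteLine("Frequency: {0} ticks per second", Stopwatch.Frequency);
Console.WriteLine("Tick resolution: {0} ns", 1e9 / Stopwatch.Frequency);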

C# performance analysis- how to count CPU cycles?

Is this a valid way to do performance analysis? I want to get nanosecond accuracy and determine the performance of typecasting:
class PerformanceTest
{
static double last = 0.0;
static List<object> numericGenericData = new List<object>();
static List<double> numericTypedData = new List<double>();
static void Main(string[] args)
{
double totalWithCasting = 0.0;
double totalWithoutCasting = 0.0;
for (double d = 0.0; d < 1000000.0; ++d)
{
numericGenericData.Add(d);
numericTypedData.Add(d);
}
Stopwatch stopwatch = new Stopwatch();
for (int i = 0; i < 10; ++i)
{
stopwatch.Start();
testWithTypecasting();
stopwatch.Stop();
totalWithCasting += stopwatch.ElapsedTicks;
stopwatch.Start();
testWithoutTypeCasting();
stopwatch.Stop();
totalWithoutCasting += stopwatch.ElapsedTicks;
}
Console.WriteLine("Avg with typecasting = {0}", (totalWithCasting/10));
Console.WriteLine("Avg without typecasting = {0}", (totalWithoutCasting/10));
Console.ReadKey();
}
static void testWithTypecasting()
{
foreach (object o in numericGenericData)
{
last = ((double)o*(double)o)/200;
}
}
static void testWithoutTypeCasting()
{
foreach (double d in numericTypedData)
{
last = (d * d)/200;
}
}
}
The output is:
Avg with typecasting = 468872.3
Avg without typecasting = 501157.9
I'm a little suspicious... it looks like there is nearly no impact on the performance. Is casting really that cheap?
Update:
class PerformanceTest
{
static double last = 0.0;
static object[] numericGenericData = new object[100000];
static double[] numericTypedData = new double[100000];
static Stopwatch stopwatch = new Stopwatch();
static double totalWithCasting = 0.0;
static double totalWithoutCasting = 0.0;
static void Main(string[] args)
{
for (int i = 0; i < 100000; ++i)
{
numericGenericData[i] = (double)i;
numericTypedData[i] = (double)i;
}
for (int i = 0; i < 10; ++i)
{
stopwatch.Start();
testWithTypecasting();
stopwatch.Stop();
totalWithCasting += stopwatch.ElapsedTicks;
stopwatch.Reset();
stopwatch.Start();
testWithoutTypeCasting();
stopwatch.Stop();
totalWithoutCasting += stopwatch.ElapsedTicks;
stopwatch.Reset();
}
Console.WriteLine("Avg with typecasting = {0}", (totalWithCasting/(10.0)));
Console.WriteLine("Avg without typecasting = {0}", (totalWithoutCasting / (10.0)));
Console.ReadKey();
}
static void testWithTypecasting()
{
foreach (object o in numericGenericData)
{
last = ((double)o * (double)o) / 200;
}
}
static void testWithoutTypeCasting()
{
foreach (double d in numericTypedData)
{
last = (d * d) / 200;
}
}
}
The output is:
Avg with typecasting = 4791
Avg without typecasting = 3303.9
Note that it's not typecasting that you are measuring, it's unboxing. The values are doubles all along, there is no type casting going on.
You forgot to reset the stopwatch between tests, so you are adding the accumulated time of all previous tests over and over. If you convert the ticks to actual time, you see that it adds up to much more than the time it took to run the test.
If you add a stopwatch.Reset(); before each stopwatch.Start();, you get a much more reasonable result like:
Avg with typecasting = 41027,1
Avg without typecasting = 20594,3
Unboxing a value is not so expensive, it only has to check that the data type in the object is correct, then get the value. Still it's a lot more work than when the type is already known. Remember that you are also measuring the looping, calculation and assigning of the result, which is the same for both tests.
Boxing a value is more expensive than unboxing it, as that allocates an object on the heap.
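A small illustration of the boxing and unboxing being discussed:
double d = 3.14;
object boxed = d;               // boxing: allocates a new object on the heap
double unboxed = (double)boxed; // unboxing: runtime type check, then copy the value out
// The boxed type must match exactly; this line would throw InvalidCastException:
// int broken = (int)boxed;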
1) Yes, casting is usually (very) cheap.
2) You are not going to get nanosecond accuracy in a managed language. Or in an unmanaged language under most operating systems.
Consider
other processes
garbage collection
different JITters
different CPUs
And your measurement includes the foreach loop itself, which looks like 50% or more of the time to me. Maybe 90%.
When you call Stopwatch.Start it is letting the timer continue to run from wherever it left off. You need to call Stopwatch.Reset() to set the timers back to zero before starting again. Personally I just use stopwatch = Stopwatch.StartNew() whenever I want to start a timer to avoid this sort of confusion.
Furthermore, you probably want to call both of your test methods before starting the "timing loop" so that they get a fair chance to "warm up" that piece of code and ensure that the JIT has had a chance to run to even the playing field.
When I do that on my machine, I see that testWithTypecasting runs in approximately half the time as testWithoutTypeCasting.
That being said however, the cast itself it not likely to be the most significant part of that performance penalty. The testWithTypecasting method is operating on a list of boxed doubles which means that there is an additional level of indirection required to retrieve each value (follow a reference to the value somewhere else in memory) in addition to increasing the total amount of memory consumed. This increases the amount of time spent on memory access and is likely to be a bigger effect than the CPU time spent "in the cast" itself.
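A sketch combining both suggestions from this answer - warm up the two methods before the timing loop and use StartNew/Restart so each measurement starts from zero (method names reused from the question):
testWithTypecasting();   // warm-up calls so JIT compilation is not part of the measurement
testWithoutTypeCasting();

long withCast = 0, withoutCast = 0;
for (int i = 0; i < 10; ++i)
{
    var sw = Stopwatch.StartNew();
    testWithTypecasting();
    sw.Stop();
    withCast += sw.ElapsedTicks;

    sw.Restart();        // back to zero before the second measurement
    testWithoutTypeCasting();
    sw.Stop();
    withoutCast += sw.ElapsedTicks;
}
Console.WriteLine("Avg with typecasting = {0}", withCast / 10.0);
Console.WriteLine("Avg without typecasting = {0}", withoutCast / 10.0);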
Look into performance counters in the System.Diagnostics namespace. When you create a new counter, you first create a category, and then specify one or more counters to be placed in it.
// Create a collection of type CounterCreationDataCollection.
System.Diagnostics.CounterCreationDataCollection CounterDatas =
new System.Diagnostics.CounterCreationDataCollection();
// Create the counters and set their properties.
System.Diagnostics.CounterCreationData cdCounter1 =
new System.Diagnostics.CounterCreationData();
System.Diagnostics.CounterCreationData cdCounter2 =
new System.Diagnostics.CounterCreationData();
cdCounter1.CounterName = "Counter1";
cdCounter1.CounterHelp = "help string1";
cdCounter1.CounterType = System.Diagnostics.PerformanceCounterType.NumberOfItems64;
cdCounter2.CounterName = "Counter2";
cdCounter2.CounterHelp = "help string 2";
cdCounter2.CounterType = System.Diagnostics.PerformanceCounterType.NumberOfItems64;
// Add both counters to the collection.
CounterDatas.Add(cdCounter1);
CounterDatas.Add(cdCounter2);
// Create the category and pass the collection to it.
System.Diagnostics.PerformanceCounterCategory.Create(
"Multi Counter Category", "Category help", CounterDatas);
see MSDN docs
Just a thought but sometimes identical machine code can take a different number of cycles to execute depending on its alignment in memory so you might want to add a control or controls.
Don't "do" C# myself but in C for x86-32 and later the rdtsc instruction is usually available which is much more accurate than OS ticks. More info on rdtsc can be found by searching stackoverflow. Under C it is usually available as an intrinsic or built-in function and returns the number of clock cycles (in an 8 byte - long long/__int64 - unsigned integer) since the computer was powered up. So if the CPU has a clock speed of 3 Ghz the underlying counter is incremented 3 billion times per second. Save for a few early AMD processors, all multi-core CPUs will have their counters synchronized.
If C# does not have it you might consider writing a VERY short C function to access it from C#. There is a great deal of overhead if you access the instruction through a function vs inline. The difference between two back-to-back calls to the function will be the basic measurement overhead. If you're thinking of metering your application you'll have to determine several more complex overhead values.
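A hedged sketch of that idea from the C# side: the DLL name rdtsc_native.dll and the export read_tsc are hypothetical placeholders for a tiny C wrapper that simply returns __rdtsc(); the C# part is a standard P/Invoke declaration:
using System.Runtime.InteropServices;

static class Tsc
{
    [DllImport("rdtsc_native.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern ulong read_tsc(); // hypothetical native export wrapping __rdtsc()

    // Two back-to-back calls give a baseline for the call overhead the answer mentions.
    public static ulong CallOverhead()
    {
        ulong a = read_tsc();
        ulong b = read_tsc();
        return b - a;
    }
}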
You might consider shutting off the CPU's energy-saving mode (and restarting the PC), since it lowers the clock frequency fed to the CPU during periods of low activity, which can cause the time stamp counters of the different cores to become unsynchronized.
