I'm a total newbie, but I was writing a little program that worked on strings in C# and I noticed that if I did a few things differently, the code executed significantly faster.
So it had me wondering, how do you go about clocking your code's execution speed? Are there any (free)utilities? Do you go about it the old-fashioned way with a System.Timer and do it yourself?
What you are describing is known as performance profiling. There are many programs you can get to do this such as Jetbrains profiler or Ants profiler, although most will slow down your application whilst in the process of measuring its performance.
To hand-roll your own performance profiling, you can use System.Diagnostics.Stopwatch and a simple Console.WriteLine, like you described.
Also keep in mind that the C# JIT compiler optimizes code depending on the type and frequency it is called, so play around with loops of differing sizes and methods such as recursive calls to get a feel of what works best.
ANTS Profiler from RedGate is a really nice performance profiler. dotTrace Profiler from JetBrains is also great. These tools will allow you to see performance metrics that can be drilled down the each individual line.
Scree shot of ANTS Profiler:
ANTS http://www.red-gate.com/products/ants_profiler/images/app/timeline_calltree3.gif
If you want to ensure that a specific method stays within a specific performance threshold during unit testing, I would use the Stopwatch class to monitor the execution time of a method one ore many times in a loop and calculate the average and then Assert against the result.
Just a reminder - make sure to compile in Relase, not Debug! (I've seen this mistake made by seasoned developers - it's easy to forget).
What are you describing is 'Performance Tuning'. When we talk about performance tuning there are two angle to it. (a) Response time - how long it take to execute a particular request/program. (b) Throughput - How many requests it can execute in a second. When we typically 'optimize' - when we eliminate unnecessary processing both response time as well as throughput improves. However if you have wait events in you code (like Thread.sleep(), I/O wait etc) your response time is affected however throughput is not affected. By adopting parallel processing (spawning multiple threads) we can improve response time but throughput will not be improved. Typically for server side application both response time and throughput are important. For desktop applications (like IDE) throughput is not important only response time is important.
You can measure response time by 'Performance Testing' - you just note down the response time for all key transactions. You can measure the throughput by 'Load Testing' - You need to pump requests continuously from sufficiently large number of threads/clients such that the CPU usage of server machine is 80-90%. When we pump request we need to maintain the ratio between different transactions (called transaction mix) - for eg: in a reservation system there will be 10 booking for every 100 search. there will be one cancellation for every 10 booking etc.
After identifying the transactions require tuning for response time (performance testing) you can identify the hot spots by using a profiler.
You can identify the hot spots for throughput by comparing the response time * fraction of that transaction. Assume in search, booking, cancellation scenario, ratio is 89:10:1.
Response time are 0.1 sec, 10 sec and 15 sec.
load for search - 0.1 * .89 = 0.089
load for booking- 10 * .1 = 1
load for cancell= 15 * .01= 0.15
Here tuning booking will yield maximum impact on throughput.
You can also identify hot spots for throughput by taking thread dumps (in the case of java based applications) repeatedly.
Use a profiler.
Ants (http://www.red-gate.com/Products/ants_profiler/index.htm)
dotTrace (http://www.jetbrains.com/profiler/)
If you need to time one specific method only, the Stopwatch class might be a good choice.
I do the following things:
1) I use ticks (e.g. in VB.Net Now.ticks) for measuring the current time. I subtract the starting ticks from the finished ticks value and divide by TimeSpan.TicksPerSecond to get how many seconds it took.
2) I avoid UI operations (like console.writeline).
3) I run the code over a substantial loop (like 100,000 iterations) to factor out usage / OS variables as best as I can.
You can use the StopWatch class to time methods. Remember the first time is often slow due to code having to be jitted.
There is a native .NET option (Team Edition for Software Developers) that might address some performance analysis needs. From the 2005 .NET IDE menu, select Tools->Performance Tools->Performance Wizard...
[GSS is probably correct that you must have Team Edition]
This is simple example for testing code speed. I hope I helped you
class Program {
static void Main(string[] args) {
const int steps = 10000;
Stopwatch sw = new Stopwatch();
ArrayList list1 = new ArrayList();
sw.Start();
for(int i = 0; i < steps; i++) {
list1.Add(i);
}
sw.Stop();
Console.WriteLine("ArrayList:\tMilliseconds = {0},\tTicks = {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);
MyList list2 = new MyList();
sw.Start();
for(int i = 0; i < steps; i++) {
list2.Add(i);
}
sw.Stop();
Console.WriteLine("MyList: \tMilliseconds = {0},\tTicks = {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);
Related
I make one method who doing some simple operations like +, -, *, /.
I need to run this method 1513 times.
Here I try to run this method only once. To see do is working good and how times is be needed for to finish with operations.
Stopwatch st = new Stopwatch();
st.Start();
DiagramValue dv = new DiagramValue();
double pixel = dv.CalculateYPixel(23.46, diction);
st.Stop();
When is stop the stopwatch is teling me the time is 0.06s.
When I run the same method 1513 times in for loop like that:
Stopwatch st = new Stopwatch();
st.Start();
for (int i = 0; i < 1513; i++)
{
DiagramValue dv = new DiagramValue();
double pixel = dv.CalculateYPixel(23.46, diction);
}
st.Stop();
Then the Stopwatch is tell me is working around 0.14s. Or 0.14s / 1513 times = 0.00009s for one time.
My question is why If I running some method only once is too slow and if I running around thousand times in for loop is almost the same time.
Writing benchmarks is hard.
First, Stopwatch isn't infinitely accurate. When you run the method just once, you're very much limited by the accuracy of the underlying stopwatch. On the other hand, running the method multiple times alleviates this - you can get arbitrary precision by using a big enough loop. Instead of 1 vs 1513, compare e.g. 1500 vs. 3000. You'll get around 100% time increase, as expected.
Second, there's usually some cost with the first call in particular (e.g. JIT compilation) or with the memory pressure at the time of the call. That's why you usually need to do "preheating" - run the method outside of the stopwatch first to isolate these, and measure (multiple invocations) later.
Third, in a garbage collected environment like .NET, the guy who ordered the beer isn't necessarily the guy who pays the bill. Most of the cost of memory allocation in .NET is in the collection, rather than the allocation itself (which is about as cheap as a stack allocation). The collection usually happens outside of the code that caused the allocations in the first place, pointing you in the entirely wrong direction when searching for performance issues. That's why most .NET memory trackers display garbage collection separately - it's important to take account of, but can easily mislead you as to the cause if you're not careful.
There's many more issues, but these should cover your particular scenario well enough.
Some possible reasons include:
Timing resolution. You get a more accurate figure when you find the mean over a large number of iterations.
Noise. The percentage of stuff that isn't what you actually want to record, will be different.
Jitting. .NET will create code the first time a method is used. As such the first time it is run in a programs lifetime, the longer it will take, by a large factor (try running it once and then measuring the second attempt).
Branch prediction. If you keep doing the same thing with the same data the CPU's branch predictor is going to get better at predicting which branches are takken.
GC stability. Not likely in this case, but possible. Often at the start of a set of operations that requires particular objects to be created and then released the program ends up having to get more memory from the OS. When it's a bit into that set of operations it's more likely to have reached a steady state where it can just get that memory by cleaning out objects it isn't using any more, which is faster.
We have created a monitoring application for our enterprise app that will monitor our applications Performance counters. We monitor a couple system counters (memory, cpu) and 10 or so of our own custom performance counters. We have 7 or 8 exes that we monitor, so we check 80 counters every couple seconds.
Everything works great except when we loop over the counters the cpu takes a hit, 15% or so on my pretty good machine but on other machines we have seen it much higher. We are wanting our monitoring app to run discretely in the background looking for issues, not eating up a significant amount of the cpu.
This can easily be reproduced by this simple c# class. This loads all processes and gets Private Bytes for each. My machine has 150 processes. CallNextValue Takes 1.4 seconds or so and 16% cpu
class test
{
List<PerformanceCounter> m_counters = new List<PerformanceCounter>();
public void Load()
{
var processes = System.Diagnostics.Process.GetProcesses();
foreach (var p in processes)
{
var Counter = new PerformanceCounter();
Counter.CategoryName = "Process";
Counter.CounterName = "Private Bytes";
Counter.InstanceName = p.ProcessName;
m_counters.Add(Counter);
}
}
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue();
}
}
}
Doing this same thing in Perfmon.exe in windows and adding the counter Process - Private Bytes with all processes selected I see virtually NO cpu taken up and it's also graphing all processes.
So how is Perfmon getting the values? Is there a better/different way to get these performance counters in c#?
I've tried using RawValue instead of NextValue and i don't see any difference.
I've played around with Pdh call in c++ (PdhOpenQuery, PdhCollectQueryData, ...). My first tests don't seem like these are any easier on the cpu but i haven't created a good sample yet.
I'm not very familiar with the .NET performance counter API, but I have a guess about the issue.
The Windows kernel doesn't actually have an API to get detailed information about just one process. Instead, it has an API that can be called to "get all the information about all the processes". It's a fairly expensive API call. Every time you do c.NextValue() for one of your counters, the system makes that API call, throws away 99% of the data, and returns the data about the single process you asked about.
PerfMon.exe uses the same PDH APIs, but it uses a wildcard query -- it creates a single query that gets data for all of the processes at once, so it essentially only calls c.NextValue() once every second instead of calling it N times (where N is the number of processes). It gets a huge chunk of data back (data for all of the processes), but it's relatively cheap to scan through that data.
I'm not sure that the .NET performance counter API supports wildcard queries. The PDH API does, and it would be much cheaper to perform one wildcard query than to perform a whole bunch of single-instance queries.
Sorry for a long response, but I've found your question only now. Anyway, if anyone will need additional help, I have a solution:
I've made a little research on my custom process and I've understood that when we have a code snippet like
PerformanceCounter ourPC = new PerformanceCounter("Process", "% Processor time", "processname", true);
ourPC.NextValue();
Then our performance counter's NextValue() will show you the (number of logical cores * task manager cpu load of the process) value which is kind of logical thing, I suppose.
So, your problem may be that you have a slight CPU load in the task manager because it understands that you have a multiple core CPU, although the performance counter counts it by the formula above.
I see a one (kind of crutchy) possible solution for your problem so your code should be rewritten like this:
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue() / Environment.ProcessorCount;
}
}
Anyway, I do not recommend you to use Environment.ProcessorCount although I've used it: I just didn't want to add too much code to my short snippet.
You can see a good way to find out how much logical cores (yeah, if you have core i7, for example, you'll have to count logical cores, not physical) do you have in a system if you'll follow this link:
How to find the Number of CPU Cores via .NET/C#?
Good luck!
I try to capture the exact execution time of function
Stopwatch regularSW = new Stopwatch();
for (int i = 0; i < 10; i++) {
regularSW.Start();
//function();
regularSW.Stop();
Console.WriteLine("Measured time: " + regularSW.Elapsed);
}
I also tried with DateTime and Process.GetCurrentProcess().TotalProcessorTime
but each time I get a different value.
How i can get same value ?
With StopWatch you already use the most accurate way. But you are not re-starting it in the loop. It always starts at the value where it ended. You either have to create a new StopWatch or call StopWatch.Restart instead of Start:
Stopwatch regularSW = new Stopwatch();
for (int i = 0; i < 10; i++) {
regularSW.Restart();
//function();
regularSW.Stop();
Console.WriteLine("Measured time: " + regularSW.Elapsed);
}
That's the reason for the different values. If you now still get different values, then the reason is that the method function really has different execution times which is not that unlikely(f.e. if it's a database query).
Since this question seems to be largely theoretical(regarding your comments), consider following things if you want to measure time in .NET:
compile and run in release mode, Any CPU (on an x64 machine) and optimizations on
A tick is 0.0001 milliseconds, so don't overestimate your results
They are different because you cannot control what other operations your system might need to perform in the background while your C# progam is running
If you for example claim memory in the method because you fill a local list, then the garbage collector might attempt to reclaim garbage(memory)
C# code is compiled Just In Time. The first time you go through a loop can therefore be hundreds or thousands of times more expensive than every subsequent time due to the cost of the jitter analyzing the code that the loop calls. If you are intending on measuring the "warm" cost of a loop then you need to run the loop once before you start timing it. If you are intending on measuring the average cost including the jit time then you need to decide how many times makes up a reasonable number of trials, so that the average works out correctly
you are running your code in a multithreaded, multiprocessor environment where threads can be switched at will, and where the thread quantum (the amount of time the operating system will give another thread until yours might get a chance to run again) is about 16 milliseconds. 16 milliseconds is about fifty million processor cycles. Coming up with accurate timings of sub-millisecond operations can be quite difficult if the thread switch happens within one of the several million processor cycles that you are trying to measure. Take that into consideration.
The last two points were copied from this answer of Eric Lippert (worth reading).
My question consists of 2 parts:
Is there any good way in C# to measure computation effort other than using timers such as Stopwatch? Below is what I have been doing, but the granularity is not great, and the result returned varies every time. I am wondering if there is more precise measure such as CPU operation count so that the result returned can be consistent.
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
//do work
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
Console.WriteLine(ts);
If the alternative approach in 1 is not possible, how can I make the performance test result less variate? What are some factors that can make the result change? Would closing all other applications running help? (I did try it but there seems to be no significant effect.) How about running the test on a VM, sandbox, etc.?
(After typing the proceeding text I realized that I also have tried the Performance Analysis feature which comes with Visual Studio. The test result seems more coarse because of the sampling method it uses. So I also want to rule out that option)
You need to get a profiling tool. But you can use StopWatch more reliably if you run your tests in a loop multiple times but only take the results of the test if the garbage collection generation stays the same.
Like this:
var timespans = new List<TimeSpan>();
while (true)
{
var count = GC.CollectionCount(0);
var sw = Stopwatch.StartNew();
/* run test here */
sw.Stop();
if (count == GC.CollectionCount(0))
{
timespans.Add(sw.Elapsed);
}
if (timespans.Count == 100)
{
break;
}
}
That'll give you 100 tests where garbage collection didn't occur. The average is then pretty good to work from.
If you find that your tests never run without invoking a garbage collection then try working out the minimum number of GC's that get triggered and collect your time spans only when that number occurs.
You could query a system performance counter. The msdn doc for the System.Diagnostics.PerformanceCounter class has some examples. With this class you could query "\Process(your_process_name)\% Processor Time" for example. It's an alternative to Stopwatch but tbh I think just using stopwatch and averaging many runs over time is a perfectly good way to go.
If what you need is a higher resolution stopwatch because you are trying to measure a very small slice of cpu time, then you may be interested in the High-Performance Counter.
this is a two part question, I wanted to post my code here on stack to help others with the same task.
Question 1:
I have a subset of code, which I believe, is correctly measuring CPU usage (across as many cores in the system, as per times retrieved) as per the measurement interval - I use 1 second in the thread call.
I had to decipher this from the very few articles on the web and from C++ code. My question is, for question 1, is this correct what I have done?
Sometimes the value returned is a minus figure which is why I multiply by -1. Again, I am assuming, since there is very little documentation, that this is what I should be doing.
I have the following code:
public static class Processor
{
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool GetSystemTimes(out ComTypes.FILETIME lpIdleTime, out ComTypes.FILETIME lpKernelTime, out ComTypes.FILETIME lpUserTime);
private static TimeSpan _sysIdleOldTs;
private static TimeSpan _sysKernelOldTs;
private static TimeSpan _sysUserOldTs;
static Processor()
{
}
public static void Test()
{
ComTypes.FILETIME sysIdle, sysKernel, sysUser;
if(GetSystemTimes(out sysIdle, out sysKernel, out sysUser))
{
TimeSpan sysIdleTs = GetTimeSpanFromFileTime(sysIdle);
TimeSpan sysKernelTs = GetTimeSpanFromFileTime(sysKernel);
TimeSpan sysUserTs = GetTimeSpanFromFileTime(sysUser);
TimeSpan sysIdleDiffenceTs = sysIdleTs.Subtract(_sysIdleOldTs);
TimeSpan sysKernelDiffenceTs = sysKernelTs.Subtract(_sysKernelOldTs);
TimeSpan sysUserDiffenceTs = sysUserTs.Subtract(_sysUserOldTs);
_sysIdleOldTs = sysIdleTs;
_sysKernelOldTs = sysKernelTs;
_sysUserOldTs = sysUserTs;
TimeSpan system = sysKernelDiffenceTs.Add(sysUserDiffenceTs);
Double cpuUsage = (((system.Subtract(sysIdleDiffenceTs).TotalMilliseconds) * 100) / system.TotalMilliseconds);
if (cpuUsage < 0)
{
Console.WriteLine("CPU: " + ((int) (cpuUsage)*-1) + "%");
}
else
{
Console.WriteLine("CPU: " + (int) (cpuUsage) + "%");
}
Console.WriteLine("");
}
else
{
Console.WriteLine("Couldn't get CPU usage!");
Console.WriteLine("");
}
}
private static TimeSpan GetTimeSpanFromFileTime(ComTypes.FILETIME time)
{
return TimeSpan.FromMilliseconds((((ulong)time.dwHighDateTime << 32) + (uint)time.dwLowDateTime) * 0.000001);
}
}
Question 2:
Is there anyway for me to sync a thread, in my program, with that of the Windows Task Manager, for the purpose of matching measurement figure e.g CPU Usage with the above code?
What I mean is, if you open Windows Task Manager, you will notice that it polls every second - which in reality it doesn't need to be less than that. What I want to do is match the timing with my thread.
So when Windows Task Manager polls, my thread polls.
Some notes:
I didn't want to use Performance Counters or .NET built in methods. In fact, I believe - from what I have read, .NET doesn't have methods for calculating the CPU usage on a machine, that Performance counters are required for this otherwise.
Performance counters have overhead and in addition make the GC grow, not to mention the delay in calling the next result. While my software does not need to be real-time performance I do need it to be as responsive and use as little CPU time as possible. The above code can be called and returned in less than a millisecond. In fact on my development machine, the time-span difference shows 0ms. I don't believe Performance Counters are as responsive.
In case you are curious, my software is gathering a number of items, CPU, Memory, Event Log items etc. of which these all need to be gathered and stored, in SQL CE, before the next poll, 1 second away. Each task, item, however is on its own thread to facilitate this.
Also, the code above is not optimized in anyway and you will notice I have yet to comment it also. The reason being is I want to make sure it is correct before optimization etc.
Update 1
As per a coment I made down the way, I removed the extra "System" timespan as it is not required and modified the line that retrieves the "CPU Usage" and cast it appropriately.
int cpuUsage = (int)(((sysKernelDifferenceTs.Add(sysUserDifferenceTs).Subtract(sysIdleDifferenceTs).TotalMilliseconds) * 100.00) / sysKernelDifferenceTs.Add(sysUserDifferenceTs).TotalMilliseconds);
Though I am still unsure of the formula. While it seems to be highly accurate it does on occasion return a minus figure which is why I multiply it by -1 if that is the case. After all, there is no such thing a -2% CPU usage etc.
Update 2
So I did a simple test using "System.Diagnostics.PerformanceCounter". While incredibly handy and does exactly what it is intended to do it does create overhead.
Here are my observations:
It took the Performance Counter that much longer to initialize. In the order of roughly three seconds longer on my i7 2.6 Ghz.
The performance counter also seemed to add on another approx 5MB of RAM usage simply by using it. What I mean by this is: With the code above ,my app maxes out at 7.5MB ram. With the performance counter it "starts" at 12.5MB.
Over the space of 5 seconds, where my thread ran 5 times - once per second, the memory of my app had grown by 1 MB and this increase is consistent with time, although it does level out, in my case anyway, 3-4MB above starting. So where my app is usually 7.5MB ram with the code above, the PC code leveled out at 16.5 MB ram - an increase of 9MB over the code above. Note: The code above does not cause this increase.
So, if your application was built in a manner where resource usage and timing is key I would suggest against using Performance counters because of these reasons. Otherwise go ahead as it works without all the mess.
As for my app, performance counters will be detrimental to my software's purpose.
I think you have a bug in your formula. You want to basically compute CPU usage as this:
CPU Usage = KernelTimeDiff + UserTimeDiff
--------------------------------------------
KernelTimeDiff + UserTimeDiff + IdleTimeDiff
Thus, a quick mod to your code as follows:
// TimeSpan system = sysKernelDiffenceTs.Add(sysUserDiffenceTs);
//Double cpuUsage = (((system.Subtract(sysIdleDiffenceTs).TotalMilliseconds) * 100) / system.TotalMilliseconds);
TimeSpan totaltime = sysKernelDiffenceTs.Add(sysUserDiffenceTs);
totaltime = totaltime.Add(sysIdleDifferenceTs);
int cpuUsage = 100 - (sysIdleDifferenceTs.TotalMilliseconds * 100) / totaltime.TotalMilliseconds;
Console.WriteLine("CPU: " + cpuUsage + "%");
You originally declared cpuUsage as "Double". I'm not sure if you wanted floating point precision, but in your code, you definitely weren't getting anything other than integer precision because the assignment statement was just doing integer math. If you need higher precision from the computation, you could easily get it by mixing in some floating point:
Double cpuUsage = 100.0 - (sysIdleDifferenceTs.TotalMilliseconds * 100.0) /totaltime.TotalMilliseconds;
Also, in regards to being in sync with Task Manager. Task Manager, as I understand it, uses perf counters. (And I would suspect that GetSystemTimes is making perf counter calls under the hood, but perhaps not). And I'm not sure why you wouldn't use perf counters either. The "% Process Time" counter is an instant sample counter that doesn't require computing a diff with a previous result. (There's one per logical cpu). Use the PDH helper functions instead of the legacy registry key apis to get at it. You can do this from an unmanaged C/C++ DLL that exports a "GetCpuUsage" function back to your C# code. But I don't know why you couldn't just PInvoke the PDH functions from C# either. I don't know about this overhead that you speak of. I'm not sure I understand your reference to " the delay in calling the next result" either.