this is a two part question, I wanted to post my code here on stack to help others with the same task.
Question 1:
I have a subset of code, which I believe, is correctly measuring CPU usage (across as many cores in the system, as per times retrieved) as per the measurement interval - I use 1 second in the thread call.
I had to decipher this from the very few articles on the web and from C++ code. My question is, for question 1, is this correct what I have done?
Sometimes the value returned is a minus figure which is why I multiply by -1. Again, I am assuming, since there is very little documentation, that this is what I should be doing.
I have the following code:
public static class Processor
{
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool GetSystemTimes(out ComTypes.FILETIME lpIdleTime, out ComTypes.FILETIME lpKernelTime, out ComTypes.FILETIME lpUserTime);
private static TimeSpan _sysIdleOldTs;
private static TimeSpan _sysKernelOldTs;
private static TimeSpan _sysUserOldTs;
static Processor()
{
}
public static void Test()
{
ComTypes.FILETIME sysIdle, sysKernel, sysUser;
if(GetSystemTimes(out sysIdle, out sysKernel, out sysUser))
{
TimeSpan sysIdleTs = GetTimeSpanFromFileTime(sysIdle);
TimeSpan sysKernelTs = GetTimeSpanFromFileTime(sysKernel);
TimeSpan sysUserTs = GetTimeSpanFromFileTime(sysUser);
TimeSpan sysIdleDiffenceTs = sysIdleTs.Subtract(_sysIdleOldTs);
TimeSpan sysKernelDiffenceTs = sysKernelTs.Subtract(_sysKernelOldTs);
TimeSpan sysUserDiffenceTs = sysUserTs.Subtract(_sysUserOldTs);
_sysIdleOldTs = sysIdleTs;
_sysKernelOldTs = sysKernelTs;
_sysUserOldTs = sysUserTs;
TimeSpan system = sysKernelDiffenceTs.Add(sysUserDiffenceTs);
Double cpuUsage = (((system.Subtract(sysIdleDiffenceTs).TotalMilliseconds) * 100) / system.TotalMilliseconds);
if (cpuUsage < 0)
{
Console.WriteLine("CPU: " + ((int) (cpuUsage)*-1) + "%");
}
else
{
Console.WriteLine("CPU: " + (int) (cpuUsage) + "%");
}
Console.WriteLine("");
}
else
{
Console.WriteLine("Couldn't get CPU usage!");
Console.WriteLine("");
}
}
private static TimeSpan GetTimeSpanFromFileTime(ComTypes.FILETIME time)
{
return TimeSpan.FromMilliseconds((((ulong)time.dwHighDateTime << 32) + (uint)time.dwLowDateTime) * 0.000001);
}
}
Question 2:
Is there anyway for me to sync a thread, in my program, with that of the Windows Task Manager, for the purpose of matching measurement figure e.g CPU Usage with the above code?
What I mean is, if you open Windows Task Manager, you will notice that it polls every second - which in reality it doesn't need to be less than that. What I want to do is match the timing with my thread.
So when Windows Task Manager polls, my thread polls.
Some notes:
I didn't want to use Performance Counters or .NET built in methods. In fact, I believe - from what I have read, .NET doesn't have methods for calculating the CPU usage on a machine, that Performance counters are required for this otherwise.
Performance counters have overhead and in addition make the GC grow, not to mention the delay in calling the next result. While my software does not need to be real-time performance I do need it to be as responsive and use as little CPU time as possible. The above code can be called and returned in less than a millisecond. In fact on my development machine, the time-span difference shows 0ms. I don't believe Performance Counters are as responsive.
In case you are curious, my software is gathering a number of items, CPU, Memory, Event Log items etc. of which these all need to be gathered and stored, in SQL CE, before the next poll, 1 second away. Each task, item, however is on its own thread to facilitate this.
Also, the code above is not optimized in anyway and you will notice I have yet to comment it also. The reason being is I want to make sure it is correct before optimization etc.
Update 1
As per a coment I made down the way, I removed the extra "System" timespan as it is not required and modified the line that retrieves the "CPU Usage" and cast it appropriately.
int cpuUsage = (int)(((sysKernelDifferenceTs.Add(sysUserDifferenceTs).Subtract(sysIdleDifferenceTs).TotalMilliseconds) * 100.00) / sysKernelDifferenceTs.Add(sysUserDifferenceTs).TotalMilliseconds);
Though I am still unsure of the formula. While it seems to be highly accurate it does on occasion return a minus figure which is why I multiply it by -1 if that is the case. After all, there is no such thing a -2% CPU usage etc.
Update 2
So I did a simple test using "System.Diagnostics.PerformanceCounter". While incredibly handy and does exactly what it is intended to do it does create overhead.
Here are my observations:
It took the Performance Counter that much longer to initialize. In the order of roughly three seconds longer on my i7 2.6 Ghz.
The performance counter also seemed to add on another approx 5MB of RAM usage simply by using it. What I mean by this is: With the code above ,my app maxes out at 7.5MB ram. With the performance counter it "starts" at 12.5MB.
Over the space of 5 seconds, where my thread ran 5 times - once per second, the memory of my app had grown by 1 MB and this increase is consistent with time, although it does level out, in my case anyway, 3-4MB above starting. So where my app is usually 7.5MB ram with the code above, the PC code leveled out at 16.5 MB ram - an increase of 9MB over the code above. Note: The code above does not cause this increase.
So, if your application was built in a manner where resource usage and timing is key I would suggest against using Performance counters because of these reasons. Otherwise go ahead as it works without all the mess.
As for my app, performance counters will be detrimental to my software's purpose.
I think you have a bug in your formula. You want to basically compute CPU usage as this:
CPU Usage = KernelTimeDiff + UserTimeDiff
--------------------------------------------
KernelTimeDiff + UserTimeDiff + IdleTimeDiff
Thus, a quick mod to your code as follows:
// TimeSpan system = sysKernelDiffenceTs.Add(sysUserDiffenceTs);
//Double cpuUsage = (((system.Subtract(sysIdleDiffenceTs).TotalMilliseconds) * 100) / system.TotalMilliseconds);
TimeSpan totaltime = sysKernelDiffenceTs.Add(sysUserDiffenceTs);
totaltime = totaltime.Add(sysIdleDifferenceTs);
int cpuUsage = 100 - (sysIdleDifferenceTs.TotalMilliseconds * 100) / totaltime.TotalMilliseconds;
Console.WriteLine("CPU: " + cpuUsage + "%");
You originally declared cpuUsage as "Double". I'm not sure if you wanted floating point precision, but in your code, you definitely weren't getting anything other than integer precision because the assignment statement was just doing integer math. If you need higher precision from the computation, you could easily get it by mixing in some floating point:
Double cpuUsage = 100.0 - (sysIdleDifferenceTs.TotalMilliseconds * 100.0) /totaltime.TotalMilliseconds;
Also, in regards to being in sync with Task Manager. Task Manager, as I understand it, uses perf counters. (And I would suspect that GetSystemTimes is making perf counter calls under the hood, but perhaps not). And I'm not sure why you wouldn't use perf counters either. The "% Process Time" counter is an instant sample counter that doesn't require computing a diff with a previous result. (There's one per logical cpu). Use the PDH helper functions instead of the legacy registry key apis to get at it. You can do this from an unmanaged C/C++ DLL that exports a "GetCpuUsage" function back to your C# code. But I don't know why you couldn't just PInvoke the PDH functions from C# either. I don't know about this overhead that you speak of. I'm not sure I understand your reference to " the delay in calling the next result" either.
Related
We have created a monitoring application for our enterprise app that will monitor our applications Performance counters. We monitor a couple system counters (memory, cpu) and 10 or so of our own custom performance counters. We have 7 or 8 exes that we monitor, so we check 80 counters every couple seconds.
Everything works great except when we loop over the counters the cpu takes a hit, 15% or so on my pretty good machine but on other machines we have seen it much higher. We are wanting our monitoring app to run discretely in the background looking for issues, not eating up a significant amount of the cpu.
This can easily be reproduced by this simple c# class. This loads all processes and gets Private Bytes for each. My machine has 150 processes. CallNextValue Takes 1.4 seconds or so and 16% cpu
class test
{
List<PerformanceCounter> m_counters = new List<PerformanceCounter>();
public void Load()
{
var processes = System.Diagnostics.Process.GetProcesses();
foreach (var p in processes)
{
var Counter = new PerformanceCounter();
Counter.CategoryName = "Process";
Counter.CounterName = "Private Bytes";
Counter.InstanceName = p.ProcessName;
m_counters.Add(Counter);
}
}
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue();
}
}
}
Doing this same thing in Perfmon.exe in windows and adding the counter Process - Private Bytes with all processes selected I see virtually NO cpu taken up and it's also graphing all processes.
So how is Perfmon getting the values? Is there a better/different way to get these performance counters in c#?
I've tried using RawValue instead of NextValue and i don't see any difference.
I've played around with Pdh call in c++ (PdhOpenQuery, PdhCollectQueryData, ...). My first tests don't seem like these are any easier on the cpu but i haven't created a good sample yet.
I'm not very familiar with the .NET performance counter API, but I have a guess about the issue.
The Windows kernel doesn't actually have an API to get detailed information about just one process. Instead, it has an API that can be called to "get all the information about all the processes". It's a fairly expensive API call. Every time you do c.NextValue() for one of your counters, the system makes that API call, throws away 99% of the data, and returns the data about the single process you asked about.
PerfMon.exe uses the same PDH APIs, but it uses a wildcard query -- it creates a single query that gets data for all of the processes at once, so it essentially only calls c.NextValue() once every second instead of calling it N times (where N is the number of processes). It gets a huge chunk of data back (data for all of the processes), but it's relatively cheap to scan through that data.
I'm not sure that the .NET performance counter API supports wildcard queries. The PDH API does, and it would be much cheaper to perform one wildcard query than to perform a whole bunch of single-instance queries.
Sorry for a long response, but I've found your question only now. Anyway, if anyone will need additional help, I have a solution:
I've made a little research on my custom process and I've understood that when we have a code snippet like
PerformanceCounter ourPC = new PerformanceCounter("Process", "% Processor time", "processname", true);
ourPC.NextValue();
Then our performance counter's NextValue() will show you the (number of logical cores * task manager cpu load of the process) value which is kind of logical thing, I suppose.
So, your problem may be that you have a slight CPU load in the task manager because it understands that you have a multiple core CPU, although the performance counter counts it by the formula above.
I see a one (kind of crutchy) possible solution for your problem so your code should be rewritten like this:
private void CallNextValue()
{
foreach (var c in m_counters)
{
var x = c.NextValue() / Environment.ProcessorCount;
}
}
Anyway, I do not recommend you to use Environment.ProcessorCount although I've used it: I just didn't want to add too much code to my short snippet.
You can see a good way to find out how much logical cores (yeah, if you have core i7, for example, you'll have to count logical cores, not physical) do you have in a system if you'll follow this link:
How to find the Number of CPU Cores via .NET/C#?
Good luck!
I try to capture the exact execution time of function
Stopwatch regularSW = new Stopwatch();
for (int i = 0; i < 10; i++) {
regularSW.Start();
//function();
regularSW.Stop();
Console.WriteLine("Measured time: " + regularSW.Elapsed);
}
I also tried with DateTime and Process.GetCurrentProcess().TotalProcessorTime
but each time I get a different value.
How i can get same value ?
With StopWatch you already use the most accurate way. But you are not re-starting it in the loop. It always starts at the value where it ended. You either have to create a new StopWatch or call StopWatch.Restart instead of Start:
Stopwatch regularSW = new Stopwatch();
for (int i = 0; i < 10; i++) {
regularSW.Restart();
//function();
regularSW.Stop();
Console.WriteLine("Measured time: " + regularSW.Elapsed);
}
That's the reason for the different values. If you now still get different values, then the reason is that the method function really has different execution times which is not that unlikely(f.e. if it's a database query).
Since this question seems to be largely theoretical(regarding your comments), consider following things if you want to measure time in .NET:
compile and run in release mode, Any CPU (on an x64 machine) and optimizations on
A tick is 0.0001 milliseconds, so don't overestimate your results
They are different because you cannot control what other operations your system might need to perform in the background while your C# progam is running
If you for example claim memory in the method because you fill a local list, then the garbage collector might attempt to reclaim garbage(memory)
C# code is compiled Just In Time. The first time you go through a loop can therefore be hundreds or thousands of times more expensive than every subsequent time due to the cost of the jitter analyzing the code that the loop calls. If you are intending on measuring the "warm" cost of a loop then you need to run the loop once before you start timing it. If you are intending on measuring the average cost including the jit time then you need to decide how many times makes up a reasonable number of trials, so that the average works out correctly
you are running your code in a multithreaded, multiprocessor environment where threads can be switched at will, and where the thread quantum (the amount of time the operating system will give another thread until yours might get a chance to run again) is about 16 milliseconds. 16 milliseconds is about fifty million processor cycles. Coming up with accurate timings of sub-millisecond operations can be quite difficult if the thread switch happens within one of the several million processor cycles that you are trying to measure. Take that into consideration.
The last two points were copied from this answer of Eric Lippert (worth reading).
I am currently trying to write an application that runs the same code exactly 100 times a second. I have done some testing using the built-in timers of the .NET framework. I've tested the System.Threading.Timer class, the System.Windows.Forms.Timer class and the System.Timers.Timer class. None of them seem to be accurate enough for what I am trying to do.
I've found out about PerformanceCounters and am currently trying to implement such in my application.
However, I am having a bit of trouble with my program taking up a whole core of my CPU when idling.
I only need it to be active 100 times a second. My loop looks like this:
long nextTick, nextMeasure;
QueryPerformanceCounter(out start);
nextTick = start + countsPerTick;
nextMeasure = start + performanceFrequency;
long currentCount;
while (true)
{
QueryPerformanceCounter(out currentCount);
if (currentCount >= nextMeasure)
{
Debug.Print("Ticks this second: " + tickCount);
tickCount = 0;
nextMeasure += performanceFrequency;
}
if (currentCount >= nextTick)
{
Calculations();
tickCount++;
nextTick += countsPerTick;
}
}
As you can see, most of the time the program will be waiting to run Calculations() again by running through the while loop constantly. Is there a way to stop this from happening? I don't want to slow the computers my program will be run on down.
System.Thread.Thread.Sleep unfortunately is also pretty "inaccurate", but I would be okay with using it if there is no other solution.
What I am basically asking is this: Is there a way to make an infinite loop less CPU-intensive? Is there any other way of accurately waiting for a specific amount of time?
As I'm sure you're aware, Windows is not a real-time O/S, so there can never be any guarantee that your code will run as often as you want.
Having said that, the most efficient in terms of yielding to other threads is probably to use Thread.Sleep() as the timer. If you want higher accuracy than the default you can issue a timeBeginPeriod with the desired resolution down to a millisecond. The function must be DLLImported from winmm.dll.
timeBeginPeriod(1) together with a normal timer or Thread.Sleep should work decently.
Note that this has a global effect. There are claims that it increases power consumption, since it forces the windows timer to run more often, shortening the CPU sleeping periods. This means you should generally avoid it. But if you need highly accurate timing, it's certainly a better choice than a busy-wait.
I'm developing an app which one scans thousands copies of a struct; ~1 GB RAM. Speed is important.
ParallelScan(_from, _to); //In a new thread
I manually adjust the threads count:
if (myStructs.Count == 0) { threads = 0; }
else if (myStructs.Count < 1 * Number.Thousand) { threads = 1; }
else if (myStructs.Count < 3 * Number.Thousand) { threads = 2; }
else if (myStructs.Count < 5 * Number.Thousand) { threads = 4; }
else if (myStructs.Count < 10 * Number.Thousand) { threads = 8; }
else if (myStructs.Count < 20 * Number.Thousand) { threads = 12; }
else if (myStructs.Count < 30 * Number.Thousand) { threads = 20; }
else if (myStructs.Count < 50 * Number.Thousand) { threads = 30; }
else threads = 40;
I just wrote it from scratch and I need to modify it for another CPU, etc. I think I could write a smarter code which one dynamically starts a new thread if CPU is available at the moment:
If CPU is not %100 start N thread
Measure CPU or thread process time & modify/estimate N
Loop until scan all struct array
Is there anyone think that "I did something similar" or "I have a better idea" ?
UPDATE: The solution
Parallel.For(0, myStructs.Count - 1, (x) =>
{
ParallelScan(x, x); // Will be ParallelScan(x);
});
I did trim tons of code. Thanks people!
UPDATE 2: Results
Scan time for 10K templates
1 Thread: 500 ms
10 Threads: 300 ms
40 Threads: 600 ms
Tasks: 100 ms
The standard answer: Use Tasks (TPL) , not Threads. Tasks require Fx4.
Your ParallelScan could just use Parallel.Foreach( ... ) or PLINQ (.AsParallel()).
The TPL framework includes a scheduler, and ForEach() uses a partitioner, to adapt to CPU cores and load. Your problem is most likely solved with the standard components but you can write custom-schedulers and -partitioners.
Actually, you won't get much benefit from spanning 50 threads, if you CPU only has two cores (even if each of them supports hyperthreading). If will actually run slower due to context switching which will occur every once in a while.
That means you should go for the Task Parallel Library (.NET 4), which takes care that all available cores are used efficiently.
Apart from that, improving the asymptotic duration of your search algorithm might prove more valuable for large quantities of data, regardless of the Moore's law.
[Edit]
If you are unable/unwilling to use .NET 4 TPL, you can start by getting the information about the current number of logical processors in the system (use Environment.ProcessorCount or check this answer for detailed info). Based on that number, you can partition your data and span a fixed number of threads. That is much simpler that checking the CPU utilization, and should prevent creating unnecessary threads which are starved anyway.
OK, sorry to keep going on but first to compile my comments:
Unless you have a very, very, very, good reason to think that scanning these structs will take any more than a handful of microseconds and that really, really, really matters, it's not a good idea to do this kind of optimisation. If you really want to do it, you should have one thread per core. But really - don't. If it's just 50,000 structs and you're doing something simple with them, don't bother.
FYI, starting a new thread takes a good amount of time (a measurable part of a second, several milliseconds).
How long does this operation take? It's very unlikely that it's useful for you to optimize multithreading like this. It will give you the worst improvement. Better improvement will be gained by a better algorithm, or not having to depend on this weird invented multithreading scheme.
I'm confused about your performance fixation partly because you say you're looking through 50,000 structs (a very quick and easy operation) and partly because you're using structs. Without boxing that's a value type and if you're passing them around threads you're copying data rather than references, i.e. using more memory. My point being that that's a lot of data/memory, unless the structs are small, in which case, what kind of processing can you possibly be doing on them that takes so long as to think about 40+ threads in parallel?
If performance is truly incredibly important and your goal, and you're not simply trying to do this as a nice engineering exercise, please share information about what kind of processing you're doing.
I'm a total newbie, but I was writing a little program that worked on strings in C# and I noticed that if I did a few things differently, the code executed significantly faster.
So it had me wondering, how do you go about clocking your code's execution speed? Are there any (free)utilities? Do you go about it the old-fashioned way with a System.Timer and do it yourself?
What you are describing is known as performance profiling. There are many programs you can get to do this such as Jetbrains profiler or Ants profiler, although most will slow down your application whilst in the process of measuring its performance.
To hand-roll your own performance profiling, you can use System.Diagnostics.Stopwatch and a simple Console.WriteLine, like you described.
Also keep in mind that the C# JIT compiler optimizes code depending on the type and frequency it is called, so play around with loops of differing sizes and methods such as recursive calls to get a feel of what works best.
ANTS Profiler from RedGate is a really nice performance profiler. dotTrace Profiler from JetBrains is also great. These tools will allow you to see performance metrics that can be drilled down the each individual line.
Scree shot of ANTS Profiler:
ANTS http://www.red-gate.com/products/ants_profiler/images/app/timeline_calltree3.gif
If you want to ensure that a specific method stays within a specific performance threshold during unit testing, I would use the Stopwatch class to monitor the execution time of a method one ore many times in a loop and calculate the average and then Assert against the result.
Just a reminder - make sure to compile in Relase, not Debug! (I've seen this mistake made by seasoned developers - it's easy to forget).
What are you describing is 'Performance Tuning'. When we talk about performance tuning there are two angle to it. (a) Response time - how long it take to execute a particular request/program. (b) Throughput - How many requests it can execute in a second. When we typically 'optimize' - when we eliminate unnecessary processing both response time as well as throughput improves. However if you have wait events in you code (like Thread.sleep(), I/O wait etc) your response time is affected however throughput is not affected. By adopting parallel processing (spawning multiple threads) we can improve response time but throughput will not be improved. Typically for server side application both response time and throughput are important. For desktop applications (like IDE) throughput is not important only response time is important.
You can measure response time by 'Performance Testing' - you just note down the response time for all key transactions. You can measure the throughput by 'Load Testing' - You need to pump requests continuously from sufficiently large number of threads/clients such that the CPU usage of server machine is 80-90%. When we pump request we need to maintain the ratio between different transactions (called transaction mix) - for eg: in a reservation system there will be 10 booking for every 100 search. there will be one cancellation for every 10 booking etc.
After identifying the transactions require tuning for response time (performance testing) you can identify the hot spots by using a profiler.
You can identify the hot spots for throughput by comparing the response time * fraction of that transaction. Assume in search, booking, cancellation scenario, ratio is 89:10:1.
Response time are 0.1 sec, 10 sec and 15 sec.
load for search - 0.1 * .89 = 0.089
load for booking- 10 * .1 = 1
load for cancell= 15 * .01= 0.15
Here tuning booking will yield maximum impact on throughput.
You can also identify hot spots for throughput by taking thread dumps (in the case of java based applications) repeatedly.
Use a profiler.
Ants (http://www.red-gate.com/Products/ants_profiler/index.htm)
dotTrace (http://www.jetbrains.com/profiler/)
If you need to time one specific method only, the Stopwatch class might be a good choice.
I do the following things:
1) I use ticks (e.g. in VB.Net Now.ticks) for measuring the current time. I subtract the starting ticks from the finished ticks value and divide by TimeSpan.TicksPerSecond to get how many seconds it took.
2) I avoid UI operations (like console.writeline).
3) I run the code over a substantial loop (like 100,000 iterations) to factor out usage / OS variables as best as I can.
You can use the StopWatch class to time methods. Remember the first time is often slow due to code having to be jitted.
There is a native .NET option (Team Edition for Software Developers) that might address some performance analysis needs. From the 2005 .NET IDE menu, select Tools->Performance Tools->Performance Wizard...
[GSS is probably correct that you must have Team Edition]
This is simple example for testing code speed. I hope I helped you
class Program {
static void Main(string[] args) {
const int steps = 10000;
Stopwatch sw = new Stopwatch();
ArrayList list1 = new ArrayList();
sw.Start();
for(int i = 0; i < steps; i++) {
list1.Add(i);
}
sw.Stop();
Console.WriteLine("ArrayList:\tMilliseconds = {0},\tTicks = {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);
MyList list2 = new MyList();
sw.Start();
for(int i = 0; i < steps; i++) {
list2.Add(i);
}
sw.Stop();
Console.WriteLine("MyList: \tMilliseconds = {0},\tTicks = {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);