How to run CPU at a given load (% CPU utilization)? - c#

Is it possible to freeze the CPU usage that is shown in Windows Task Manager? I wish to freeze the load at specific values like 20%, 50%, or 70% from my program.
(This is to analyse how much power the PC is consuming with regard to CPU usage.)
Is this possible?

My first naive attempt would be to spawn twice as many threads as there are cores -- each thread at the highest priority -- and, within each thread, run a busy loop and do some work. (Having more threads than cores is to "steal" all the time I can get from other threads in Windows :-)
Using some kind of API to read the CPU load (perhaps WMI or performance counters?), I would then make each thread 'yield' from the busy loop (sleep for a certain amount of time each iteration) until the approximate target load is reached in the feedback cycle.
This cycle would be self-adjusting: too high a load, sleep more; too low a load, sleep less. It's not an exact science, but I think that with some tweaking a stable load can be obtained.
But, I have no idea, really :-)
Happy coding.
Also, consider power management -- it can sometimes lock a CPU at a maximum percentage. Fully load the CPU and it will max out at that limit. (Windows 7, at least, has a built-in feature for this, depending upon the CPU and chipset -- there are likely many third-party tools as well.)
The situation becomes rather confusing with newer CPUs that are dynamically clocked based on load, temperature, etc.
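For what it's worth, that built-in cap is the power plan's "Maximum processor state" setting, which can also be set from an elevated command prompt (a sketch; the 50 here is just an example percentage):
powercfg /setacvalueindex scheme_current sub_processor PROCTHROTTLEMAX 50
powercfg /setactive scheme_current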
Here is my attempt at the "naive" approach for .NET 3.5. Make sure to include the System.Management reference.
The CPU utilization as reported by the Task Manager hovers within a few percent of the target -- average seems pretty darn close -- on my system. YMMV, but there is some flexibility for adjustment.
Happy coding (again).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Management;
using System.Threading;
using System.Diagnostics;

namespace CPULoad
{
    class Program
    {
        // What to try to get :-)
        static int TargetCpuUtilization = 50;
        // An average window too large results in bad harmonics -- keep it small.
        static int AverageWindow = 5;
        // A somewhat large number gets better results here.
        static int ThreadsPerCore = 8;
        // WMI is *very slow* compared to a PerformanceCounter.
        // It still works, but each cycle is *much* longer and it isn't
        // as good at maintaining a stable load.
        // (It also seems to run a few % higher.)
        static bool UseWMI = false;
        // Not sure if this helps -- but just play about :-)
        static bool UseQuestionableAverage = true;

        static int CoreCount ()
        {
            var sys = new ManagementObject(
                "Win32_ComputerSystem.Name=\"" + Environment.MachineName + "\"");
            return int.Parse("" + sys["NumberOfLogicalProcessors"]);
        }

        static Func<int> GetWmiSampler ()
        {
            var searcher = new ManagementObjectSearcher(
                @"root\CIMV2",
                "SELECT PercentProcessorTime FROM Win32_PerfFormattedData_PerfOS_Processor");
            return () => {
                var allCores = searcher.Get().OfType<ManagementObject>().First();
                return int.Parse("" + allCores["PercentProcessorTime"]);
            };
        }

        static Func<int> GetCounterSampler ()
        {
            var cpuCounter = new PerformanceCounter {
                CategoryName = "Processor",
                CounterName = "% Processor Time",
                InstanceName = "_Total",
            };
            return () => (int)cpuCounter.NextValue();
        }

        static Func<LinkedList<int>, int> StandardAverage ()
        {
            return (samples) => (int)samples.Average();
        }

        // Bias towards the newest samples.
        static Func<LinkedList<int>, int> QuestionableAverage ()
        {
            return (samples) => {
                var weight = 4.0;
                var sum = 0.0;
                var max = 0.0;
                foreach (var sample in samples) {
                    sum += sample * weight;
                    max += weight;
                    weight = Math.Min(4, Math.Max(1, weight * 0.8));
                }
                return (int)(sum / max);
            };
        }

        static void Main (string[] args)
        {
            var threadCount = CoreCount() * ThreadsPerCore;
            var threads = new List<Thread>();
            for (var i = 0; i < threadCount; i++) {
                Console.WriteLine("Starting thread #" + i);
                var thread = new Thread(() => {
                    Loader(
                        UseWMI ? GetWmiSampler() : GetCounterSampler(),
                        UseQuestionableAverage ? QuestionableAverage() : StandardAverage());
                });
                thread.IsBackground = true;
                thread.Priority = ThreadPriority.Highest;
                thread.Start();
                threads.Add(thread);
            }
            Console.ReadKey();
            Console.WriteLine("Fin!");
        }

        static void Loader (Func<int> nextSample, Func<LinkedList<int>, int> average)
        {
            Random r = new Random();
            long cycleCount = 0;
            int cycleLength = 10;
            int sleepDuration = 15;
            int temp = 0;
            var samples = new LinkedList<int>(new[] { 50 });
            long totalSample = 0;
            while (true) {
                cycleCount++;
                var busyLoops = cycleLength * 1000;
                for (int i = 0; i < busyLoops; i++) {
                    // Do some work
                    temp = (int)(temp * Math.PI);
                }
                // Take a break
                Thread.Sleep(sleepDuration);
                {
                    // Add a new sample.
                    // This seems to work best when done *after* the sleep/yield.
                    var sample = nextSample();
                    if (samples.Count >= AverageWindow) {
                        samples.RemoveLast();
                    }
                    samples.AddFirst(sample);
                    totalSample += sample;
                }
                var avg = average(samples);
                // Should converge to 0.
                var conv = Math.Abs(TargetCpuUtilization - (int)(totalSample / cycleCount));
                Console.WriteLine(string.Format("avg:{0:d2} conv:{1:d2} sleep:{2:d2} cycle-length:{3}",
                    avg, conv, sleepDuration, cycleLength));
                // Manipulating both the sleep duration and the work duration seems
                // to have the best effect. We don't change both at the same
                // time, as that skews one with the other.
                // Favor the cycle-length adjustment.
                if (r.NextDouble() < 0.05) {
                    sleepDuration += (avg < TargetCpuUtilization) ? -1 : 1;
                    // Don't let the sleep duration grow unbounded, or it
                    // can cause badly-oscillating behavior.
                    sleepDuration = (int)Math.Min(24, Math.Max(0, sleepDuration));
                } else {
                    cycleLength += (avg < TargetCpuUtilization) ? 1 : -1;
                    cycleLength = (int)Math.Max(5, cycleLength);
                }
            }
        }
    }
}
While Windows is a preemptive operating system, code that runs in kernel mode -- such as drivers -- is preempted far less often. While not doable in C# AFAIK, this should yield a method of stricter load control than the above, but it also has a good bit more complexity (and the ability to crash the entire system :-)
There is also Process.PriorityClass, but setting this to anything but Normal yielded less consistent behavior for me.
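If you do want to experiment with it, the call is a one-liner (a sketch; BelowNormal is just an example value):
using System.Diagnostics;
// ...
// Lower the whole process's scheduling priority.
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;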

I don't know if you can do that, but you can change the thread priority of the executing thread via the Priority property. You would set that by:
Thread.CurrentThread.Priority = ThreadPriority.Lowest;
Also, I don't think you really want to cap it. If the machine is otherwise idle, you'd like it to get on with the task, right? ThreadPriority helps communicate this to the scheduler.
Reference: How to restrict the CPU usage a C# program takes?

Related

current CPU usage percentage Linux in C#

I'm trying to report total CPU usage on a Linux system, every 10 seconds. I have multiple processes running in several Docker containers, so running Process.GetProcesses() got me the error "unable to retrieve the specified information about the process. It may have exited or may be privileged." My machine has 12 logical CPUs with 2 threads per core, meaning 6 physical cores. I need an efficient way to calculate the current CPU usage of the whole machine at a given time.
My attempts have been:
TimeSpan delta = DateTime.Now - _lastTimeBuildReport;
double cpuLoad = 100* (currentExporterDic["node_cpu_seconds_total"].GetSum("idle") - _previousExporterDic["node_cpu_seconds_total"].GetSum("idle")) / delta.TotalSeconds;
This gave me strange values (about 3 times the real usage).
And this, which gave the error above, adapted from ASP.NET CORE LINUX Get CPU USAGE:
private double CalculateToTotalCPUUsage(double totalMsPassed)
{
    try
    {
        double totalCpuUsagePercentage = 0;
        string getProccesorsCoresCMD = "grep 'cpu cores' /proc/cpuinfo | uniq | grep -Eo '[0-9]{1,4}'";
        string[] coresLines = DotNetUtilities.DotNetUtilities.GetOsCmdLineOutput(getProccesorsCoresCMD).Lines;
        int cpuCores = int.Parse(coresLines[0]);
        Process[] allProc = Process.GetProcesses();
        foreach (Process process in allProc)
        {
            // Start watching CPU
            var currentProStartCpuUsage = process.TotalProcessorTime;
            // Delay the function so we can measure the same CPU load
            Task.Delay(Convert.ToInt32(totalMsPassed)).Wait();
            var currentProEndCpuUsage = process.TotalProcessorTime;
            var currentProCpuUsedMs = (currentProEndCpuUsage - currentProStartCpuUsage).TotalMilliseconds;
            var currentProCpuUsageTotal = currentProCpuUsedMs / (cpuCores * totalMsPassed);
            var currentProCpuUsagePercentage = currentProCpuUsageTotal * 100;
            totalCpuUsagePercentage += currentProCpuUsagePercentage;
        }
        return totalCpuUsagePercentage;
    }
    catch (Exception ex)
    {
        Logger.LogError($"Couldn't calculate total CPU usage. Reason: {ex.Message} {ex.StackTrace}");
        return -1;
    }
}
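For what it's worth, the usual machine-wide approach on Linux is to diff the aggregate "cpu" line of /proc/stat between two samples, which avoids per-process enumeration entirely. Below is a minimal sketch (my illustration, not from the thread); note that counters like node_cpu_seconds_total are summed over all logical CPUs, which is one reason the first formula above overshoots.
using System;
using System.IO;
using System.Linq;
using System.Threading;

static class CpuSampler
{
    // Reads the aggregate "cpu" line from /proc/stat and returns (idle, total) jiffies.
    static (long idle, long total) ReadCpu()
    {
        var fields = File.ReadLines("/proc/stat")
            .First(l => l.StartsWith("cpu "))
            .Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Skip(1)
            .Select(long.Parse)
            .ToArray();
        // Field 3 is idle, field 4 is iowait; treat both as "not busy".
        long idle = fields[3] + fields[4];
        return (idle, fields.Sum());
    }

    public static double GetUsagePercent(int sampleMs = 1000)
    {
        var (idle1, total1) = ReadCpu();
        Thread.Sleep(sampleMs);
        var (idle2, total2) = ReadCpu();
        double totalDelta = total2 - total1;
        // 100% means every logical CPU was busy for the whole interval.
        return totalDelta <= 0 ? 0 : 100.0 * (1.0 - (idle2 - idle1) / totalDelta);
    }
}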

Calculate CPU Usage in Percentage UWP Application Windows 10 IOT

I want to calculate CPU usage as a percentage. Currently I am using ProcessDiagnosticInfo to get kernel time and user time. How can I convert this time to a percentage, or can you suggest another method to find it, if there is one?
private TimeSpan GetTotalCpuTime()
{
    var totalKernelTime = new TimeSpan();
    var totalUserTime = new TimeSpan();
    var pdis = ProcessDiagnosticInfo.GetForProcesses();
    foreach (var pdi in pdis)
    {
        var cpuUsage = pdi.CpuUsage;
        var report = cpuUsage.GetReport();
        totalKernelTime += report.KernelTime;
        totalUserTime += report.UserTime;
    }
    return totalKernelTime + totalUserTime;
}
I also know the Windows 10 IoT dashboard API "/api/resourcemanager/systemperf". It returns system statistics which include CPU usage as a percentage, but credentials are required to access it, so I don't want to use it.
Each process spends some time in kernel mode and some time in user mode. It is important to note that we do NOT take into account the idle time.
Please refer to the following code.
public static class CpuUsage
{
    private static readonly Stopwatch Stopwatch = new Stopwatch();
    private static TimeSpan _oldElapsed, _oldKernelTime, _oldUserTime;
    private static int ProcessorCount { get; }
    private static double _carryOver;

    static CpuUsage()
    {
        // Stopwatch will be used to track how much time/usage has elapsed.
        Stopwatch.Start();
        // We'll divide the total used CPU time by the number of processors.
        ProcessorCount = System.Environment.ProcessorCount;
        // Run once to store the initial "old" kernel/user times, so the first
        // read isn't inflated by the application's start-up.
        GetTotalCpuTime();
    }

    /// <summary>
    /// Returns the total CPU time consumed since the last time this call was made.
    /// </summary>
    private static TimeSpan GetTotalCpuTime()
    {
        // Because there could be more than one process running, add all of them up.
        var totalKernelTime = new TimeSpan();
        var totalUserTime = new TimeSpan();
        // Grab the diagnostic info for all existing processes.
        var pdis = ProcessDiagnosticInfo.GetForProcesses();
        foreach (var pdi in pdis)
        {
            var cpuUsage = pdi.CpuUsage;
            var report = cpuUsage.GetReport();
            totalKernelTime += report.KernelTime;
            totalUserTime += report.UserTime;
        }
        // Subtract the amount of "total CPU time" that was previously calculated.
        var elapsedKernelTime = totalKernelTime - _oldKernelTime;
        var elapsedUserTime = totalUserTime - _oldUserTime;
        // Track the "old" variables.
        _oldKernelTime = totalKernelTime;
        _oldUserTime = totalUserTime;
        // The sum of both is all of the CPU time that's been consumed by the application.
        return elapsedKernelTime + elapsedUserTime;
    }

    public static double GetPercentage()
    {
        // Because there's a small amount of time between when "elapsed" is grabbed
        // and when all of the process kernel and user times are tallied, the overall
        // CPU usage will be off by a fraction of a percent, but it's nominal -- in
        // the 0.001% range.
        var elapsed = Stopwatch.Elapsed;
        var elapsedTime = elapsed - _oldElapsed;
        var elapsedCpuTime = GetTotalCpuTime();
        // Divide the result by the amount of time that's elapsed since the last check
        // to get the percentage of CPU time that has been consumed by this application.
        var ret = elapsedCpuTime / elapsedTime / ProcessorCount * 100;
        // Track the "old" variables.
        _oldElapsed = elapsed;
        // This part is completely optional. Because the thread could be interrupted
        // between the time "elapsed" is grabbed and the CPU times are calculated, a
        // "pause" can spike the reported CPU usage over 100%. On the next call the
        // difference would be "lost": if the CPU was at 100% for two calls but this
        // pause happened, one call could report 150% while the next reports 50%. By
        // carrying over the values above 100%, we get a slightly more accurate
        // "average" usage.
        ret += _carryOver;
        if (ret > 100)
        {
            _carryOver = ret - 100;
            ret = 100;
        }
        else
        {
            _carryOver = 0;
        }
        return ret;
    }
}
Update:
You'll need to declare the appDiagnostics and packageQuery capabilities in your manifest.
The appDiagnostics capability allows an app to get diagnostic information.
The packageQuery device capability allows apps to gather information about other apps.
*.appxmanifest:
<Capabilities>
  <Capability Name="internetClient" />
  <rescap:Capability Name="appDiagnostics" />
  <rescap:Capability Name="packageQuery" />
</Capabilities>
Here is a blog about UWP App Diagnostics that I hope is helpful for you. In addition, you can refer to this sample.

How to achieve 100% CPU usage in multithreaded application?

I have ~100 text files 200MB each and I need to parse them. The program below loads files and processes them in parallel. It can create a Thread per file or a Process per file.
The problem: If I use threads it never uses 100% CPU and takes longer to complete.
THREAD PER FILE
total time: 430 sec
CPU usage 15-20%
CPU frequency 1.2 GHz
PROCESS PER FILE
total time 100 sec
CPU usage 100%
CPU frequency 3.75 GHz
I'm using E5-1650 v3 Hexa-Core with HT, therefore I process 12 files at a time.
How can I achieve 100% CPU utilisation with threads?
The code below does not use the result of the processing, since that does not affect the problem.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading;

namespace libsvm2tsv
{
    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            switch (args[0])
            {
                case "-t": LoadAll(args[1], LoadFile); break;
                case "-p": LoadAll(args[1], RunChild); break;
                case "-f": LoadFile(args[1]); return;
            }
            Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
            Console.ReadLine();
        }

        static void LoadAll(string folder, Action<string> algorithm)
        {
            var sem = new SemaphoreSlim(12);
            Directory.EnumerateFiles(folder).ToList().ForEach(f => {
                sem.Wait();
                new Thread(() => { try { algorithm(f); } finally { sem.Release(); } }).Start();
            });
        }

        static void RunChild(string file)
        {
            Process.Start(new ProcessStartInfo
            {
                FileName = Assembly.GetEntryAssembly().Location,
                Arguments = "-f \"" + file + "\"",
                UseShellExecute = false,
                CreateNoWindow = true
            })
            .WaitForExit();
        }

        static void LoadFile(string inFile)
        {
            using (var ins = File.OpenText(inFile))
                while (ins.Peek() >= 0)
                    ParseLine(ins.ReadLine());
        }

        static long[] ParseLine(string line)
        {
            return line
                .Split()
                .Skip(1)
                .Select(r => (long)(double.Parse(r.Split(':')[1]) * 1000))
                .Select(r => r < 0 ? -1 : r)
                .ToArray();
        }
    }
}
Finally, I've found the bottleneck. I was using string.Split to parse numbers from every line of data, so I got billions of short strings. These strings are put on the heap. Since all threads share a single heap, memory allocation is synchronized. Since processes have separate heaps, no synchronization occurs and things work fast. That's the root of the issue. So I rewrote the parsing using IndexOf rather than Split, and the threads started to perform even better than separate processes -- just as I expected.
Since .NET has no built-in tool to parse real numbers out of a given position inside a string, I used this one: https://codereview.stackexchange.com/questions/75791/optimize-custom-double-parse with a small modification.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;

namespace libsvm2tsv
{
    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            switch (args[0])
            {
                case "-t": LoadAll(args[1], LoadFile); break;
                case "-p": LoadAll(args[1], RunChild); break;
                case "-f": LoadFile(args[1]); return;
            }
            Console.WriteLine("ELAPSED: {0} sec.", sw.ElapsedMilliseconds / 1000);
            Console.ReadLine();
        }

        static void LoadAll(string folder, Action<string> algorithm)
        {
            Parallel.ForEach(
                Directory.EnumerateFiles(folder),
                new ParallelOptions { MaxDegreeOfParallelism = 12 },
                f => algorithm(f));
        }

        static void RunChild(string file)
        {
            Process.Start(new ProcessStartInfo
            {
                FileName = Assembly.GetEntryAssembly().Location,
                Arguments = "-f \"" + file + "\"",
                UseShellExecute = false,
                CreateNoWindow = true
            })
            .WaitForExit();
        }

        static void LoadFile(string inFile)
        {
            using (var ins = File.OpenText(inFile))
                while (ins.Peek() >= 0)
                    ParseLine(ins.ReadLine());
        }

        static long[] ParseLine(string line)
        {
            // First, count the number of items.
            var items = 1;
            for (var i = 0; i < line.Length; i++)
                if (line[i] == ' ') items++;
            // Allocate memory and parse the items.
            var all = new long[items];
            var n = 0;
            var index = 0;
            while (index < line.Length)
            {
                var next = line.IndexOf(' ', index);
                if (next < 0) next = line.Length;
                if (next > index)
                {
                    var v = (long)(parseDouble(line, line.IndexOf(':', index) + 1, next - 1) * 1000);
                    if (v < 0) v = -1;
                    all[n++] = v;
                }
                index = next + 1;
            }
            return all;
        }

        private readonly static double[] pow10Cache;

        static Program()
        {
            pow10Cache = new double[309];
            double p = 1.0;
            for (int i = 0; i < 309; i++)
            {
                pow10Cache[i] = p;
                p /= 10;
            }
        }

        static double parseDouble(string input, int from, int to)
        {
            long inputLength = to - from + 1;
            long digitValue = long.MaxValue;
            long output1 = 0;
            long output2 = 0;
            long sign = 1;
            double multiBy = 0.0;
            int k;
            // Integer part.
            for (k = 0; k < inputLength; ++k)
            {
                digitValue = input[k + from] - 48; // '0'
                if (digitValue >= 0 && digitValue <= 9)
                {
                    output1 = digitValue + (output1 * 10);
                }
                else if (k == 0 && digitValue == -3 /* '-' */)
                {
                    sign = -1;
                }
                else if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
                {
                    break;
                }
                else
                {
                    return double.NaN;
                }
            }
            // Decimal part.
            if (digitValue == -2 /* '.' */ || digitValue == -4 /* ',' */)
            {
                multiBy = pow10Cache[inputLength - (++k)];
                for (; k < inputLength; ++k)
                {
                    digitValue = input[k + from] - 48; // '0'
                    if (digitValue >= 0 && digitValue <= 9)
                    {
                        output2 = digitValue + (output2 * 10);
                    }
                    else
                    {
                        return double.NaN;
                    }
                }
                multiBy *= output2;
            }
            return sign * (output1 + multiBy);
        }
    }
}
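As an aside (my note, not part of the original answer): on .NET Core 2.1 and later, much of this benefit is available without a hand-rolled parser, because double.Parse accepts a ReadOnlySpan<char> and therefore needs no substring allocations:
using System;

// Parses the value after ':' within line[start..end) without allocating substrings.
static double ParseValue(string line, int start, int end)
{
    ReadOnlySpan<char> token = line.AsSpan(start, end - start);
    int colon = token.IndexOf(':');
    return double.Parse(token.Slice(colon + 1));
}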
I have ~100 text files 200MB each and I need to parse them.
The fastest way to read or write data from/to a spinning disk is sequentially in order to minimize the time the disk heads need to seek to find data or write it to the specified location. So doing parallel IO to a single disk is going to slow IO rates down - and depending on the actual IO pattern it can slow rates down dramatically. A disk that can handle 100 MB/sec sequentially might only be able to move 20 or 30 kilobytes per second doing parallel reads/writes of small blocks of data.
Were I optimizing such a process, I wouldn't worry about CPU utilization first; I'd optimize IO throughput first. You are IO bound unless you're doing some really CPU-intensive parsing. Once your IO throughput is optimized, if you're getting 100% CPU utilization then you're CPU bound. If your design scales nicely, then you can add CPUs and probably run faster.
To speed up your IO, you first need to minimize disk seeks, especially if you're using consumer-grade, cheap SATA drives. There are multiple ways to do this.
First, the easiest - eliminate the disk heads. Put your data on SSDs. Problem solved without having to write complex, bug-prone optimized code. How much time will it take for you to make this run faster using software? You have to design something, test it, tune it, debug it, and importantly, keep it running and running well. None of that is free. One important cost is the opportunity cost of spending time making things go faster - when you're doing that, you're not solving any other problems. Faster hardware has none of those costs. In this case, buy the SSDs, plug them in, and you're faster.
But if you really want to spend several weeks or longer optimizing your processing software, here's how I'd go about it:
Spread the data over multiple disks. You can't do parallel IO to physical disks quickly as the disk head seeks will kill performance. So do as much of the reading and writing to different disks as possible.
For each disk, have a single reader or writer thread or process that feeds data to a worker pool or writes data provided by that worker pool.
A tunable number of worker threads/processes to do the actual parsing.
That way, you can read the files and write output data sequentially, without contention on each disk from other IO processes (a sketch of this layout follows).
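Here is a sketch of that pipeline (my illustration of the answer's design, using BlockingCollection for the hand-off; the names and the ParseLine call from the question are assumptions):
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static void ProcessDisk(string[] filesOnOneDisk, int workerCount)
{
    var lines = new BlockingCollection<string>(boundedCapacity: 10000);

    // One reader per physical disk keeps that disk's IO sequential.
    var reader = Task.Run(() =>
    {
        foreach (var file in filesOnOneDisk)
            foreach (var line in System.IO.File.ReadLines(file))
                lines.Add(line);
        lines.CompleteAdding();
    });

    // A tunable pool of workers does the CPU-bound parsing.
    var workers = Enumerable.Range(0, workerCount)
        .Select(_ => Task.Run(() =>
        {
            foreach (var line in lines.GetConsumingEnumerable())
                ParseLine(line); // the parsing routine from the question
        }))
        .ToArray();

    Task.WaitAll(workers);
    reader.Wait();
}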
I would consider replacing ForEach with Parallel.ForEach and removing your explicit use of Threads. Use https://stackoverflow.com/a/5512363/34092 to limit the number of threads if needed.
static void LoadAll(string folder, Action<string> algorithm)
{
    Parallel.ForEach(Directory.EnumerateFiles(folder), algorithm);
}
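If you also want to cap the parallelism, as the linked answer describes, pass ParallelOptions (the 12 here mirrors the semaphore limit in the question):
static void LoadAll(string folder, Action<string> algorithm)
{
    Parallel.ForEach(
        Directory.EnumerateFiles(folder),
        new ParallelOptions { MaxDegreeOfParallelism = 12 }, // cap concurrent workers
        f => algorithm(f));
}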
As others have stated, IO will probably be the bottleneck in the end, and getting 100% CPU usage is really irrelevant. I feel they are missing something, though: you do get higher throughput with processes than with threads, and that means IO is not the only bottleneck. The reason is that the CPU runs at a higher frequency with processes, and you want it to run at peak speed when it is not waiting for IO! So, how can you do that?
The easiest way is to set the power profile from the power options manually: edit the power options and set both the minimum and maximum processor state to 100%. That should do the job.
If you want to do it from your program, have a look at How to Disable Dynamic Frequency Scaling?. There is probably a similar API for .NET without using native code, but I couldn't find it just now.
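For example, setting both states to 100% from an elevated prompt (a sketch using the standard power-setting aliases):
powercfg /setacvalueindex scheme_current sub_processor PROCTHROTTLEMIN 100
powercfg /setacvalueindex scheme_current sub_processor PROCTHROTTLEMAX 100
powercfg /setactive scheme_current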

C# - Performance Counter lower than Taskmanager % [duplicate]

This question already has answers here: PerformanceCounter reporting higher CPU usage than what's observed (2 answers).
Closed 2 years ago.
I'm trying to make a class which will fetch different usage statistics from the PC. The problem I have at the moment is that the CPU usage it reports is about 10% below what Task Manager displays.
Can you please have a look and point me in the right direction? Please, no answers without explanation -- I want to learn!
Here is what I have at the moment:
using System.Diagnostics;
using System.Net.NetworkInformation;

namespace ConsoleApplication2
{
    class UsageFetcher
    {
        ulong totalRAM;
        PerformanceCounter cpuUsage;
        PerformanceCounter ramUsage;
        PerformanceCounter diskUsage;
        NetworkInterface[] networkUsage;

        public UsageFetcher()
        {
            // Fetching the total amount of RAM to be able to determine the used percentage
            //totalRAM = new Microsoft.VisualBasic.Devices.ComputerInfo().TotalPhysicalMemory;
            totalRAM = this.getTotalRam();
            // Creating a new Performance Counter that will be used to get the CPU usage
            cpuUsage = new PerformanceCounter();
            // Setting it up to fetch CPU usage
            cpuUsage.CategoryName = "Processor";
            cpuUsage.CounterName = "% Processor Time";
            cpuUsage.InstanceName = "_Total";
            /*
             * Fetching the first two reads.
             * The first read is always 0, so we must eliminate it.
             */
            cpuUsage.NextValue();
            cpuUsage.NextValue();
            // Creating a new Performance Counter that will be used to get the memory usage
            ramUsage = new PerformanceCounter();
            // Setting it up to fetch memory usage
            ramUsage.CategoryName = "Memory";
            ramUsage.CounterName = "Available Bytes";
            // Fetching the first two reads !! Same reason as above !!
            ramUsage.NextValue();
            ramUsage.NextValue();
        }

        public string getCPUUsage()
        {
            /*
             * Requesting the usage of the CPU.
             * It is returned as a float, thus I need to call ToString().
             */
            return cpuUsage.NextValue().ToString();
        }

        public string getMemUsage()
        {
            // Requesting memory usage and calculating how much is free
            return (100 - ramUsage.NextValue() / totalRAM * 100).ToString();
        }

        public ulong getTotalRam()
        {
            return new Microsoft.VisualBasic.Devices.ComputerInfo().TotalPhysicalMemory;
        }
    }
}
According to this SO post here: Why the cpu performance counter kept reporting 0% cpu usage?
You need to sleep for at least a second for the NextValue() method to return a decent result.
Try adding a call to Sleep between your calls to NextValue and see what you get.
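A minimal sketch of that pattern (my illustration, not the answerer's code):
using System;
using System.Diagnostics;
using System.Threading;

class CpuProbe
{
    static void Main()
    {
        var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        cpu.NextValue();        // the first read always returns 0 -- discard it
        Thread.Sleep(1000);     // give the counter a full sampling interval
        Console.WriteLine("CPU: " + cpu.NextValue() + "%");
    }
}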

.net code slower on AMD Opteron CPU

I have encountered a situation where a simple .NET Fibonacci routine runs slower on a particular set of servers, and the only thing that is obviously different is the CPU.
AMD Opteron Processor 6276 - 11 secs
Intel Xeon CPU E7-4850 - 7 secs
Code is compiled for x86 and uses .NET Framework 4.0.
- Clock speeds of both are similar, and in fact PassMark benchmarks give higher scores for the AMD.
- I have tried this on other AMD servers in the farm and the times are consistently slower.
- Even my local i7 machine runs the code faster.
Fibonacci code:
class Program
{
    static void Main(string[] args)
    {
        const int ITERATIONS = 10000;
        const int FIBONACCI = 100000;
        var watch = new Stopwatch();
        watch.Start();
        DoFibonnacci(ITERATIONS, FIBONACCI);
        watch.Stop();
        Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
        Console.ReadLine();
    }

    private static void DoFibonnacci(int ITERATIONS, int FIBONACCI)
    {
        for (int i = 0; i < ITERATIONS; i++)
        {
            Fibonacci(FIBONACCI);
        }
    }

    private static int Fibonacci(int x)
    {
        var previousValue = -1;
        var currentResult = 1;
        for (var i = 0; i <= x; ++i)
        {
            var sum = currentResult + previousValue;
            previousValue = currentResult;
            currentResult = sum;
        }
        return currentResult;
    }
}
Any ideas on what may be going on?
As we've established in the comments, you can work around this performance hit by pinning the process to a specific processor on the AMD Opteron machines.
Kindled by this not-really-on-topic question, I decided to have a look at possible scenarios where single-core pinning would make such a difference (going from 11 to 7 seconds seems a bit extreme).
The most plausible answer is not that revolutionary:
The AMD Opteron series employs HyperTransport in a so-called NUMA architecture, instead of a traditional FSB as you would find on Intel's SMP CPUs (the Xeon 4850 included).
My guess is that this symptom stems from the fact that individual nodes in a NUMA architecture have individual caches, as opposed to the Intel CPUs, where the processor cache is shared.
In other words, when consecutive computations shift between nodes on the Opteron, the caches are flushed, whereas balancing between processors in an SMP architecture like the Xeon 4850's has no such impact, since the cache is shared.
Setting affinity in .NET is pretty easy, just pick a processor (let's just take the first one for simplicity):
static void Main(string[] args)
{
    Console.WriteLine(Environment.ProcessorCount);
    Console.Read();
    // An affinity mask of 0x0001 will make sure the process is always pinned to processor 0.
    Process thisProcess = Process.GetCurrentProcess();
    thisProcess.ProcessorAffinity = (IntPtr)0x0001;
    const int ITERATIONS = 10000;
    const int FIBONACCI = 100000;
    var watch = new Stopwatch();
    watch.Start();
    DoFibonnacci(ITERATIONS, FIBONACCI);
    watch.Stop();
    Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
    Console.ReadLine();
}
Although I'm pretty sure this is not very smart in a NUMA environment.
Windows 2008 R2 has some cool native NUMA functionality, and I found a promising CodePlex project with a .NET wrapper for this as well: http://multiproc.codeplex.com/
I'm in no way near qualified to teach you how to utilize this technology, but this should point you in the right direction.
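If you'd rather query the NUMA layout directly, the relevant kernel32 functions P/Invoke cleanly (a sketch; error handling omitted):
using System;
using System.Runtime.InteropServices;

static class Numa
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetNumaHighestNodeNumber(out uint highestNode);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetNumaNodeProcessorMask(byte node, out ulong processorMask);

    static void Main()
    {
        GetNumaHighestNodeNumber(out uint highest);
        for (byte node = 0; node <= highest; node++)
        {
            // Each node's mask shows which logical processors belong to it,
            // which is what you'd feed into ProcessorAffinity.
            GetNumaNodeProcessorMask(node, out ulong mask);
            Console.WriteLine($"Node {node}: processor mask 0x{mask:X}");
        }
    }
}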
