How could I optimize this algorithm for approximating pi? - c#

I am quite inexperienced in coding, but I have managed to write this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PiApprox
{
    public class PiApprox
    {
        //side of the square
        public static int s;
        //radius of the circle
        public static int r;
        //indexers of the coordinates, points
        public static int ix;
        public static int iy;
        //current y
        public static decimal cury;
        //without rounding
        public static decimal wcury;
        //amounts of points relative to the circle
        public static decimal inAmount;
        public static decimal onAmount;
        public static decimal outAmount;
        //amount of all points
        public static decimal allAmount;
        //short for inAmount and onAmount,
        //used to make the calculations clearer in the final part
        public static decimal inanon;
        //final result, crude approximation of pi
        public static decimal piApprox;

        public static void Main()
        {
            while (true)
            {
                Calculate();
            }
        }

        public static void Calculate()
        {
            s = Convert.ToInt32(Console.ReadLine());
            //calculate the radius of the circle
            r = s / 2;
            //calculate the total amount of points in the grid
            //square area
            allAmount = (decimal)Math.Pow(s, 2);
            //reset values
            inAmount = 0;
            onAmount = 0;
            outAmount = 0;
            //main loop
            //iterate over x, from -r to 0
            for (ix = -r; ix <= 0; ix++)
            {
                wcury = (decimal)Math.Sqrt(Math.Pow(r, 2) - Math.Pow(ix, 2));
                cury = Math.Floor(wcury);
                outAmount += r - (int)cury;
                if (wcury == cury)
                {
                    onAmount++;
                }
                if (wcury == cury)
                {
                    inAmount += (int)cury;
                }
                else
                {
                    inAmount += (int)cury + 1;
                }
                Result();
            }
            Result();
        }

        public static void Result()
        {
            //total amount of points
            inanon = 4 * (onAmount + inAmount - (r + 1)) + 1;
            //proportion
            piApprox = 4 * (inanon / allAmount);
            Console.SetCursorPosition(1, 0);
            Console.WriteLine(piApprox);
        }
    }
}
The principle is simple: I calculate the y values of the curve f(x) = sqrt(r^2 - x^2), which represents the first quarter of a circle. I then count the points that fall within the circle and output the result at the end.
The multiplication on the line piApprox = 4 * (inanon / allAmount);
comes from the ratio of the areas of the circle and the square:
(pi * r^2) / ((2r)^2) = (pi * r^2) / (4 * r^2) = pi / 4
Is there something I could do to speed up the computation?

I assume you're new to C#, so I'll just give you a couple of hints here.
Several things have potential for improvement:
decimal is slow: it uses software arithmetic, whereas calculations on int, double and similar types are implemented in hardware. Use int here; you don't use the fractional part anyway.
Math.Pow is slow. Don't use it for squaring: replace Math.Pow(x, 2) with x * x.
Math.Sqrt is slow. Instead of comparing Math.Sqrt(x) to y, compare x to y * y. Or just call it once at the end.
Math.Floor is slow :)
You could use parallelism to take advantage of multicore CPUs.
You should use local variables, as they're easier for the compiler to optimize.
Bear in mind that when I say slow, it's relative. All of these operations are extremely fast in an absolute sense; I just mean there's an even faster alternative.
But there's one thing which is painfully slow (slow enough to be noticeable to a human): Console. It got much better on Windows 10, but it's still slow, and you're using the console in the hot path of your code. Get rid of these intermediate results.
One more thing: if you divide an int by an int, you'll get an int in C#. You need to cast one operand to, say, a double before dividing if you want to keep the fractional part (as in (double)x / y).
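Putting several of these hints together, here is a minimal sketch (my own, not a drop-in replacement for the code above) that counts one quadrant of lattice points with int/long arithmetic, uses x * x instead of Math.Pow, avoids Math.Sqrt and Math.Floor inside the loop, and writes to the console only once, outside the hot path. The class and variable names are illustrative:

    using System;

    public static class PiApproxSketch
    {
        public static double Approximate(int s)
        {
            int r = s / 2;
            long rSquared = (long)r * r;
            long insideQuadrant = 0;

            // Largest y still inside the circle for the current column; it only
            // ever decreases as x grows, so the whole loop is O(r).
            int y = r;
            for (int x = 0; x <= r; x++)
            {
                long xSquared = (long)x * x;
                while (y > 0 && xSquared + (long)y * y > rSquared)
                    y--;
                insideQuadrant += y + 1; // lattice points with 0 <= y' <= y in this column
            }

            // Mirror one quadrant (axes included) to the full circle, correcting
            // for the axes being counted twice and the origin four times.
            long insideCircle = 4 * insideQuadrant - 4L * (r + 1) + 1;

            // Same normalisation idea as the original: circle points / square points ~ pi / 4.
            return 4.0 * insideCircle / ((double)s * s);
        }

        public static void Main()
        {
            // One-off run instead of the original while (true) loop; write the
            // result once at the end.
            int s = Convert.ToInt32(Console.ReadLine());
            Console.WriteLine(Approximate(s));
        }
    }

Because y never increases, the inner while does constant amortized work per column, so the whole approximation costs O(r) integer operations and no per-iteration square roots.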

Related

Something is wrong with the accuracy of calculation between variables

I have a problem with my code where I think the accuracy is a bit off. I'll strip it down to just the relevant variable declarations so the code is as small as possible:
int a = Int32.Parse(tb_weight.Text);
double b = 0;
b = (a * 1.03) / 1000;
double g = 0;
g = (1.09 + (0.41 * (Math.Sqrt(50 / b))));
lbl_vertforce.Content = Math.Round((b * g * 9.81), 2);
So, tb_weight is a textbox where the input is made. Let's say the input is 5000: the label lbl_vertforce shows 119.61, but according to my calculator it should show 119.74. What is wrong here?
Doubles are not 100% precise and can vary in the least significant digits. If you want exact precision, you need to use the decimal type, which has a bigger memory footprint but was designed to be very precise. Unfortunately Math.Sqrt is not overloaded for decimal and only works on doubles. I have provided code I found in another posting discussing the subject of decimal square roots: Performing Math operations on decimal datatype in C#?
public void YourCodeModifiedForDecimal()
{
    int a = Int32.Parse(tb_weight.Text);
    decimal b = 0;
    b = (a * 1.03m) / 1000m;
    decimal g = 0;
    g = (1.09m + (0.41m * (Sqrt(50m / b))));
    lbl_vertforce.Content = Math.Round((b * g * 9.81m), 2);
}

public static decimal Sqrt(decimal x, decimal? guess = null)
{
    var ourGuess = guess.GetValueOrDefault(x / 2m);
    var result = x / ourGuess;
    var average = (ourGuess + result) / 2m;
    if (average == ourGuess) // This checks for the maximum precision possible with a decimal.
        return average;
    else
        return Sqrt(x, average);
}
You need to round g to 2 decimal places to get 119.74 in the final calculation.
g = Math.Round(1.09 + (0.41 * (Math.Sqrt(50 / b))), 2);
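To illustrate that point, here is a quick sketch of my own using the question's example input of 5000: the 119.61 versus 119.74 gap comes from rounding the intermediate value g, not from double imprecision.

    using System;

    class RoundingCheck
    {
        static void Main()
        {
            double a = 5000;
            double b = (a * 1.03) / 1000;               // 5.15
            double g = 1.09 + 0.41 * Math.Sqrt(50 / b); // ~2.3675
            // Unrounded intermediate value, as in the original code:
            Console.WriteLine(Math.Round(b * g * 9.81, 2));                // 119.61
            // Intermediate value rounded to 2 decimals, as a hand calculator shows it:
            Console.WriteLine(Math.Round(b * Math.Round(g, 2) * 9.81, 2)); // 119.74
        }
    }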

Why is my implementation of the parking lot test for random number generators producing bad results?

I'm trying to write an implementation of the parking lot test for random number generators. Here are the sources that I'm getting my information about the test from: Intel math library documentation and Page 4 of this paper along with the phi function for probability density listed here.
I wrote an implementation of the test in C#. It uses a 100x100 grid whose values are initially set to null. I then use the random number generator to generate random integers for x and y. If that index of the grid and its neighbors are empty, that index gets set to 1. Otherwise, nothing happens because there was a "crash".
I ran it using the C# System.Random generator. I don't believe the results are correct, because I always get very nearly 3079 points parked, which is about 500 short of the average I'm supposed to get. It also yields a p-value of 2.21829146215425E-90.
My code is below. Does anyone have any experience with this or can anyone see something that I might be doing incorrectly in my implementation? Any help would be greatly appreciated.
private void RunParkingLotTest()
{
    points = new int?[100, 100];
    int parked = 0;
    for (int i = 0; i < 12000; i++)
    {
        int x = random.Next(100);
        int y = random.Next(100);
        if (IsSafeToPark(x, y))
        {
            points[x, y] = 1;
            parked++;
        }
    }
    Console.WriteLine("Parked: " + parked + "\nP value: " + PhiFunction((parked - 3523) / 21.9));
}

private bool IsSafeToPark(int x, int y)
{
    return PointIsEmpty(x, y)
        && LeftOfPointIsEmpty(x, y)
        && RightOfPointIsEmpty(x, y)
        && BelowPointIsEmpty(x, y)
        && AbovePointIsEmpty(x, y);
}

private bool AbovePointIsEmpty(int x, int y)
{
    if (y == 99)
    {
        return true;
    }
    else
        return points[x, y + 1] == null;
}

private bool BelowPointIsEmpty(int x, int y)
{
    if (y == 0)
    {
        return true;
    }
    else
        return points[x, y - 1] == null;
}

private bool RightOfPointIsEmpty(int x, int y)
{
    if (x == 99)
    {
        return true;
    }
    else
        return points[x + 1, y] == null;
}

private bool LeftOfPointIsEmpty(int x, int y)
{
    if (x == 0)
    {
        return true;
    }
    else
        return points[x - 1, y] == null;
}

private bool PointIsEmpty(int x, int y)
{
    return points[x, y] == null;
}

private double PhiFunction(double x)
{
    // phi(x) = (2*pi)^(-1/2) * e^(-x^2/2)
    return ((1 / Math.Sqrt(2 * Math.PI)) * Math.Exp(-(Math.Pow(x, 2)) / 2));
}
edit - The problems with my original implementation were:
I was plotting squares instead of disks.
I only plotted points at integer values; I should have used fractional values instead.
As a result of the above two, I needed to change my distance check.
Thanks to Chris Sinclair and mike z for help in figuring this out. The final code is posted below.
I'm going to take a stab at this, and admittedly, I have not attempted any such test, so forgive me if I'm way off. In general though, the .NET Random implementation is pretty good and I've never had issues with it, so I wouldn't suspect that initially especially since you're properly reusing the same instance instead of creating new ones.
Reading the parking.pdf and the Intel documentation, it seems that they're using discs and computing the distance from their centre points. Your implementation is using squares (a grid with a distance of 1 between spots) and thus ignores diagonals.
From the pdf:
If disks were being used, the distance between the particles r = sqrt((x(i) − z)^2 + (y(i) − z)^2) would need to be less than or equal to one.
Does it matter whether one uses disks or squares? An indication of the
importance of which geometric figure is parked can be obtained by
comparing the area occupied by a square of side 1.0 to the area of a
disk of diameter 1.0. The ratio of the areas, disk to square, is π/4.
Therefore, it would be anticipated that more disks could be placed in
a box than squares in the same number of tries.
And the Intel doc:
The test assumes a next random point (x, y) successfully ”parked”, if
it is far enough from every previous successfully ”parked” point. The
sufficient distance between the points (x1, y1) and (x2, y2) is
min(|x1 - x2|,|y1 - y2|) > 1.
I'm guessing that the π/4 disk-to-square ratio and the difference between how many discs can fit versus squares might be why you're seeing a different number. (Although right now I fail to see a direct relationship between 3523, 3070 and π/4: 3523 * π/4 = 2767, which is close, but if there's a relationship I'm sure it's slightly more complex than simple multiplication.)
Not a great answer, but my best guess.
EDIT: Interestingly enough, I did a quick implementation using discs with 1 unit diameter and got results around 4000 parked. So maybe there's a bit more to this than my untrained self can grasp (or maybe .NET's Random doesn't pass the test?). Anyway, here's my disc implementation:
List<Point> parkedCars = new List<Point>();
Random random = new Random();

void Main()
{
    int parked = 0;
    for (int i = 0; i < 12000; i++)
    {
        double x = random.NextDouble() * 100;
        double y = random.NextDouble() * 100;
        Point pointToPark = new Point(x, y);
        if (IsSafeToPark(pointToPark))
        {
            parkedCars.Add(pointToPark);
            parked++;
        }
    }
    Console.WriteLine("Parked: " + parked);
}

private bool IsSafeToPark(Point pointToPark)
{
    //make sure it's "inside" the box
    if (pointToPark.X < 0.5 || pointToPark.X > 99.5
        || pointToPark.Y < 0.5 || pointToPark.Y > 99.5)
        return false;
    if (parkedCars.Any(p => Distance(pointToPark, p) <= 1))
        return false;
    return true;
}

private double Distance(Point p1, Point p2)
{
    return Math.Sqrt((p1.X - p2.X) * (p1.X - p2.X) + (p1.Y - p2.Y) * (p1.Y - p2.Y));
}
Using my likely too simple application of the π/4 ratio yields about 3142. A bit closer, but it seems very incorrect.
EDIT: As @mike z pointed out, my test using the distance directly is incorrect. The parameters of the test, which I had forgotten about, just check that the X and Y distances are greater than 1. Changing my Distance check to:
Math.Max(Math.Abs(p1.X - p2.X), Math.Abs(p1.Y - p2.Y))
yields a much closer result, around 3450. If I take out my "make sure it's inside the box" check, averaging over 10 tries gives 3531!
So my final, "working" code is:
public struct Point
{
    public double X, Y;

    public Point(double x, double y)
    {
        this.X = x;
        this.Y = y;
    }
}

List<Point> parkedCars = new List<Point>();
Random random = new Random();

void Main()
{
    int parked = 0;
    for (int i = 0; i < 12000; i++)
    {
        double x = random.NextDouble() * 100;
        double y = random.NextDouble() * 100;
        Point pointToPark = new Point(x, y);
        if (IsSafeToPark(pointToPark))
        {
            parkedCars.Add(pointToPark);
            parked++;
        }
    }
    Console.WriteLine("Parked: " + parked);
}

private bool IsSafeToPark(Point pointToPark)
{
    if (parkedCars.Any(p => Distance(pointToPark, p) <= 1))
        return false;
    return true;
}

private double Distance(Point p1, Point p2)
{
    return Math.Max(Math.Abs(p1.X - p2.X), Math.Abs(p1.Y - p2.Y));
}
EDIT: I ran the test 100 times twice, and the results averaged 3521.29 and 3526.74 respectively. Not sure if this means there's still something slightly more to this, but perhaps it's just indicative of rounding or floating point precision differences between .NET and Fortran.

Oscillate or "ping pong" between two values?

I have a path that is evaluated at time 't' and returns an orientation and position based on the path type.
The value for time is affected by the path type:
switch (type)
{
    case PathType.Closed:
        time = ToolBox.Wrap(time, StartTime, EndTime);
        break; // Wrap time around the path time to loop
    case PathType.Open:
        time = ToolBox.Min(time, EndTime);
        break; // Clamp the time value to the max path time range
    case PathType.Oscillating:
        break;
}
The missing link is oscillating.
My question is what is a good, efficient way for oscillating between two values?
For example (2, 7). If time reaches 7 it reverses and decrements towards to 2 and once it reaches 2 it reverses and increases towards 7.
The algorithm should know whether to increase or decrease the value based on the original value, so if the value is 9 it knows the answer is 7 - Abs(7 - 9) = 5. If the value is 14, the value has wrapped around, so it will result in an increase of 1.
Higher values will also increase or decrease the value depending on the number of times it wraps around the original range.
I hope that makes sense as I'm finding it difficult to explain.
EDIT:
Doesn't oscillate with floating point values:
for (float i = 0; i < 100; i += 0.1f)
{
    Console.WriteLine("{0} {1}", i, Oscillate(2.5f, 7.5f, i));
}

private float Oscillate(float min, float max, float value)
{
    float range = max - min;
    float multiple = value / range;
    bool ascending = multiple % 2 == 0;
    float modulus = value % range;
    return ascending ? modulus + min : max - modulus;
}
Here is what I came up with:
public static int Oscillate(int input, int min, int max)
{
    int range = max - min;
    return min + Math.Abs(((input + range) % (range * 2)) - range);
}
I'm assuming input will be a counter starting at 0.
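For example (a quick check of my own, not from the original answer), feeding a counter starting at 0 into the formula with min = 2 and max = 7 produces the expected triangle wave:

    for (int t = 0; t <= 12; t++)
        Console.Write(Oscillate(t, 2, 7) + " ");
    // prints: 2 3 4 5 6 7 6 5 4 3 2 3 4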
Ideally, you should abstract this functionality into some kind of class and not be concerned with how the implementation actually works when you're using it. Here's an initial take on what that would look like in C++ (my C# is a little rusty). I think you can work it into C# with little difficulty.
#include <cmath>

class oscillator
{
private:
    float min;
    float max;

    static float mod(float num, float div)
    {
        float ratio = num / div;
        return div * (ratio - std::floor(ratio));
    }

public:
    oscillator(float a, float b)
        : min(a < b ? a : b), max(a > b ? a : b) {}

    float range() { return max - min; }
    float cycle_length() { return 2 * range(); }

    float normalize(float val)
    {
        float state = mod(val - min, cycle_length());
        if (state > range())
            state = cycle_length() - state;
        return state + min;
    }
};
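Since the question is about C#, here is a rough port of that sketch (my own translation, not part of the original answer); the Mod helper keeps the result non-negative, which C#'s % operator does not guarantee for negative inputs:

    using System;

    public class Oscillator
    {
        private readonly float min;
        private readonly float max;

        public Oscillator(float a, float b)
        {
            min = Math.Min(a, b);
            max = Math.Max(a, b);
        }

        // Floored modulo: result is always in [0, div) even for negative num.
        private static float Mod(float num, float div)
        {
            float ratio = num / div;
            return div * (ratio - (float)Math.Floor(ratio));
        }

        public float Range { get { return max - min; } }
        public float CycleLength { get { return 2 * Range; } }

        public float Normalize(float val)
        {
            float state = Mod(val - min, CycleLength);
            if (state > Range)
                state = CycleLength - state;
            return state + min;
        }
    }

With this, new Oscillator(2f, 7f).Normalize(t) returns a value that ping-pongs between 2 and 7 as t grows.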
This will oscillate your numbers between 2 and 7; in this example, time is an int:
bool isIncreasing = time <= 7;
for (int i = 0; i < 20; i++) //some random loop
{
    time = time + (isIncreasing ? 1 : -1);
    if (time >= 7 || time <= 2) isIncreasing = !isIncreasing;
}
New answer to account for float values:
// Note: Increase FACTOR depending on how many decimal places of accuracy you need.
private const float FACTOR = 10;

public void Test()
{
    for (float i = 0; i < 1000; i += 0.1F)
    {
        Console.WriteLine("{0} {1}", i, Oscillate(2.5F, 7.5F, i));
    }
}

private float Oscillate(float min, float max, float time)
{
    return (float)(Oscillate_Aux(Upscale(min), Upscale(max), Upscale(time))) / FACTOR;
}

private int Upscale(float value)
{
    return (int)(value * FACTOR);
}

private int Oscillate_Aux(int min, int max, int time)
{
    int range = max - min;
    int multiple = time / range;
    bool ascending = multiple % 2 == 0;
    int modulus = time % range;
    return ascending ? modulus + min : max - modulus;
}
What you're describing sounds a lot like periodic linear interpolation between two values. Consider using XNA's MathHelper.Lerp function as the basis for your oscillation.
Note that it uses a percentage to control the oscillation as its third parameter. You'll have to figure out how to translate your time t value into a percentage, but you could start with, e.g., sin(t) to see how things work.
If you're reluctant to import XNA into your project, the core equation is very simple:
value1 + (value2 - value1) * amount
Edit: If your question, at its heart, really is "What is a good, efficient way for oscillating between two values?", then Math.Sin(t) (or Cos) can provide you with regular oscillation between -1 and 1, which is easy to remap to whatever range you need.
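For instance, a small sketch of my own combining the lerp formula above with a sine wave remapped from [-1, 1] to [0, 1]:

    using System;

    class OscillateDemo
    {
        static float Lerp(float value1, float value2, float amount)
        {
            return value1 + (value2 - value1) * amount;
        }

        static float Oscillate(float min, float max, float t)
        {
            float amount = (float)(Math.Sin(t) + 1.0) / 2.0f; // remap -1..1 to 0..1
            return Lerp(min, max, amount);
        }

        static void Main()
        {
            for (float t = 0; t < 10; t += 0.5f)
                Console.WriteLine("{0:0.0} {1:0.00}", t, Oscillate(2f, 7f, t));
        }
    }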

How can I convert this divide and conquer code to compare one point to a list of points?

I found this code on the website http://rosettacode.org/wiki/Closest-pair_problem and adapted the C# version of the divide and conquer method of finding the closest pair of points. What I am trying to do is adapt it to find only the closest point to one specific point. I have googled quite a bit and searched this website for examples, but none are quite like this. I am not entirely sure what to change so that it checks the list against one point rather than finding the two closest in the list. I'd like my program to run as fast as possible, because it could be searching a list of several thousand Points to find the one closest to my current coordinate Point.
public class Segment
{
    public Segment(PointF p1, PointF p2)
    {
        P1 = p1;
        P2 = p2;
    }

    public readonly PointF P1;
    public readonly PointF P2;

    public float Length()
    {
        return (float)Math.Sqrt(LengthSquared());
    }

    public float LengthSquared()
    {
        return (P1.X - P2.X) * (P1.X - P2.X)
             + (P1.Y - P2.Y) * (P1.Y - P2.Y);
    }
}

public static Segment Closest_BruteForce(List<PointF> points)
{
    int n = points.Count;
    var result = Enumerable.Range(0, n - 1)
        .SelectMany(i => Enumerable.Range(i + 1, n - (i + 1))
            .Select(j => new Segment(points[i], points[j])))
        .OrderBy(seg => seg.LengthSquared())
        .First();
    return result;
}
public static Segment MyClosestDivide(List<PointF> points)
{
    return MyClosestRec(points.OrderBy(p => p.X).ToList());
}

private static Segment MyClosestRec(List<PointF> pointsByX)
{
    int count = pointsByX.Count;
    if (count <= 4)
        return Closest_BruteForce(pointsByX);

    // left and right lists sorted by X, as order retained from full list
    var leftByX = pointsByX.Take(count / 2).ToList();
    var leftResult = MyClosestRec(leftByX);
    var rightByX = pointsByX.Skip(count / 2).ToList();
    var rightResult = MyClosestRec(rightByX);
    var result = rightResult.Length() < leftResult.Length() ? rightResult : leftResult;

    // There may be a shorter distance that crosses the divider
    // Thus, extract all the points within result.Length either side
    var midX = leftByX.Last().X;
    var bandWidth = result.Length();
    var inBandByX = pointsByX.Where(p => Math.Abs(midX - p.X) <= bandWidth);

    // Sort by Y, so we can efficiently check for closer pairs
    var inBandByY = inBandByX.OrderBy(p => p.Y).ToArray();
    int iLast = inBandByY.Length - 1;
    for (int i = 0; i < iLast; i++)
    {
        var pLower = inBandByY[i];
        for (int j = i + 1; j <= iLast; j++)
        {
            var pUpper = inBandByY[j];
            // Comparing each point to successively increasing Y values
            // Thus, can terminate as soon as deltaY is greater than best result
            if ((pUpper.Y - pLower.Y) >= result.Length())
                break;
            Segment segment = new Segment(pLower, pUpper);
            if (segment.Length() < result.Length())
                result = segment;
        }
    }
    return result;
}
I used this code in my program to see the actual difference in speed, and divide and conquer easily wins.
var randomizer = new Random(10);
var points = Enumerable.Range(0, 10000).Select(i => new PointF((float)randomizer.NextDouble(), (float)randomizer.NextDouble())).ToList();
Stopwatch sw = Stopwatch.StartNew();
var r1 = Closest_BruteForce(points);
sw.Stop();
//Debugger.Log(1, "", string.Format("Time used (Brute force) (float): {0} ms", sw.Elapsed.TotalMilliseconds));
richTextBox.AppendText(string.Format("Time used (Brute force) (float): {0} ms", sw.Elapsed.TotalMilliseconds));
Stopwatch sw2 = Stopwatch.StartNew();
var result2 = MyClosestDivide(points);
sw2.Stop();
//Debugger.Log(1, "", string.Format("Time used (Divide & Conquer): {0} ms", sw2.Elapsed.TotalMilliseconds));
richTextBox.AppendText(string.Format("Time used (Divide & Conquer): {0} ms", sw2.Elapsed.TotalMilliseconds));
//Assert.Equal(r1.Length(), result2.Length());
You can store the points in a better data structure that takes advantage of their position. Something like a quadtree.
The divide and conquer algorithm that you are trying to use doesn't really apply to this problem.
Don't use this algorithm at all; just go through the list one at a time, comparing each point's distance to your reference point, and at the end return the point that was closest. This will be O(n).
You can probably add some extra speed ups but this should be good enough.
I can write some example code if you want.
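For reference, a minimal sketch of that linear scan (my own illustration, assuming the same System.Drawing.PointF type as the question's code); it compares squared distances so no square root is needed:

    public static PointF FindClosest(PointF reference, List<PointF> points)
    {
        PointF best = points[0];
        float bestDistSq = float.MaxValue;
        foreach (PointF p in points)
        {
            float dx = p.X - reference.X;
            float dy = p.Y - reference.Y;
            float distSq = dx * dx + dy * dy;
            if (distSq < bestDistSq)
            {
                bestDistSq = distSq;
                best = p;
            }
        }
        return best;
    }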
You're mixing up two different problems. The only reason divide and conquer for the closest pair problem is faster than brute force is that it avoids comparing every point to every other point, so that it gets O(n log n) instead of O(n * n). But finding the closest point to just one point is just O(n). How can you find the closest point in a list of n points, while examining less than n points? What you're trying to do doesn't even make sense.
I can't say why your divide and conquer runs in less time than your brute force; maybe the LINQ implementation runs slower. But I think you'll find two things: 1) Even if, in absolute terms, your implementation of divide and conquer for 1 point runs in less time than your implementation of brute force for 1 point, they are still both O(n). 2) If you just try a simple foreach loop and record the lowest distance squared, you'll get even better absolute time than your divide and conquer, and it will still be O(n).
public static float LengthSquared(PointF P1, PointF P2)
{
    return (P1.X - P2.X) * (P1.X - P2.X)
         + (P1.Y - P2.Y) * (P1.Y - P2.Y);
}
If, as your question states, you want to compare 1 (known) point to a list of points to find the closest then use this code.
public static Segment Closest_BruteForce(PointF P1, List<PointF> points)
{
    PointF? closest = null; // PointF is a struct, so use a nullable to track "not found yet"
    float minDist = float.MaxValue;
    foreach (PointF P2 in points)
    {
        if (P1 != P2)
        {
            float temp = LengthSquared(P1, P2);
            if (temp < minDist)
            {
                minDist = temp;
                closest = P2;
            }
        }
    }
    return new Segment(P1, closest.Value);
}
However, if as your example shows, you want to find the closest 2 points from a list of points try the below.
public static Segment Closest_BruteForce(List<PointF> points)
{
    PointF closest1 = default(PointF);
    PointF closest2 = default(PointF);
    float minDist = float.MaxValue;
    for (int x = 0; x < points.Count; x++)
    {
        PointF P1 = points[x];
        for (int y = x + 1; y < points.Count; y++)
        {
            PointF P2 = points[y];
            float temp = LengthSquared(P1, P2);
            if (temp < minDist)
            {
                minDist = temp;
                closest1 = P1;
                closest2 = P2;
            }
        }
    }
    return new Segment(closest1, closest2);
}
note the code above was written in the browser and may have some syntax errors.
EDIT Odd... is this an acceptable answer or not? Down-votes without explanation, oh well.

How do I determine the standard deviation (stddev) of a set of values?

I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc.
While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers: you may end up with a negative variance.
Also, don't ever compute a^2 as pow(a, 2); a * a is almost certainly faster.
By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like:
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 1;
    foreach (double value in valueList)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    return Math.Sqrt(S / (k - 2));
}
If you have the whole population (as opposed to a sample population), then use return Math.Sqrt(S / (k-1));.
EDIT: I've updated the code according to Jason's remarks...
EDIT: I've also updated the code according to Alex's remarks...
A solution about 10 times faster than Jaime's, but be aware that, as Jaime pointed out:
"While the sum of squares algorithm works fine most of the time, it
can cause big trouble if you are dealing with very large numbers. You
basically may end up with a negative variance"
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods; if the results are equal, you know for sure that you can use "my" method for your case.
public static double StandardDeviation(double[] data)
{
    double stdDev = 0;
    double sumAll = 0;
    double sumAllQ = 0;

    //Sum of x and sum of x²
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        sumAll += x;
        sumAllQ += x * x;
    }

    //Mean (not used here)
    //double mean = 0;
    //mean = sumAll / (double)data.Length;

    //Standard deviation
    stdDev = System.Math.Sqrt(
        (sumAllQ -
        (sumAll * sumAll) / data.Length) *
        (1.0d / (data.Length - 1))
        );

    return stdDev;
}
The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1").
Better yet, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList)
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k - 1));
}
The Math.NET library provides this for you out of the box.
PM> Install-Package MathNet.Numerics
using MathNet.Numerics.Statistics;

var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See PopulationStandardDeviation for more information.
Code snippet:
public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;

    double sumOfSquares = 0.0;
    double average = valueList.Average(); //.NET 3.0
    foreach (double value in valueList)
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
You can avoid making two passes over the data by accumulating the mean and mean-square
cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt
and forming
sigma = sqrt(meansqr - mean^2)
A factor of cnt/(cnt-1) is often appropriate as well.
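In C#, that single-pass accumulation might look like the sketch below (my own code, with the cnt/(cnt-1) correction applied for a sample); be aware it has the same negative-variance risk Jaime mentions for very large values:

    using System;
    using System.Collections.Generic;

    public static class SinglePassStats
    {
        public static double StandardDeviation(IEnumerable<double> data, bool sample = true)
        {
            long cnt = 0;
            double mean = 0.0;
            double meansqr = 0.0;
            foreach (double value in data)
            {
                cnt++;
                mean += value;
                meansqr += value * value;
            }
            if (cnt < 2)
                return 0.0; // not enough data for a meaningful deviation

            mean /= cnt;
            meansqr /= cnt;

            double variance = meansqr - mean * mean;
            if (sample)
                variance *= (double)cnt / (cnt - 1); // the cnt/(cnt-1) factor mentioned above
            return Math.Sqrt(variance);
        }
    }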
BTW, the first pass over the data in Demi's and McWafflestix's answers is hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceeds the size of the cache, or even the working set, it gets to be a big deal.
I found that Rob's helpful answer didn't quite match what I was seeing using Excel. To match Excel, I passed the average of valueList into the StandardDeviation calculation.
Here are my two cents... clearly you could calculate the moving average (ma) from valueList inside the function, but I happen to have it already before needing the standard deviation.
public double StandardDeviation(List<double> valueList, double ma)
{
    double xMinusMovAvg = 0.0;
    double Sigma = 0.0;
    int k = valueList.Count;
    foreach (double value in valueList)
    {
        xMinusMovAvg = value - ma;
        Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
    }
    return Math.Sqrt(Sigma / (k - 1));
}
With extension methods:
using System;
using System.Collections.Generic;

namespace SampleApp
{
    internal class Program
    {
        private static void Main()
        {
            List<double> data = new List<double> { 1, 2, 3, 4, 5, 6 };

            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();

            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions
    {
        public static double Mean(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        public static double Mean(this List<double> values, int start, int end)
        {
            double s = 0;
            for (int i = start; i < end; i++)
            {
                s += values[i];
            }
            return s / (end - start);
        }

        public static double Variance(this List<double> values)
        {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean)
        {
            return values.Variance(mean, 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean, int start, int end)
        {
            double variance = 0;
            for (int i = start; i < end; i++)
            {
                variance += Math.Pow((values[i] - mean), 2);
            }
            int n = end - start;
            if (start > 0) n -= 1;
            return variance / (n);
        }

        public static double StandardDeviation(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        public static double StandardDeviation(this List<double> values, int start, int end)
        {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);
            return Math.Sqrt(variance);
        }
    }
}
/// <summary>
/// Calculates standard deviation, same as the MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
    //validation
    if (values == null)
        throw new ArgumentNullException();

    int length = values.Count();

    //saves from division by 0
    if (length == 0 || length == 1)
        return 0;

    double sum = 0.0, sum2 = 0.0;
    for (int i = 0; i < length; i++)
    {
        double item = values.ElementAt(i);
        sum += item;
        sum2 += item * item;
    }

    return Math.Sqrt((sum2 - sum * sum / length) / (length - 1));
}
The trouble with all the other answers is that they assume you have your
data in a big array. If your data is coming in on the fly, this would be
a better approach. This class works regardless of how or if you store your data. It also gives you the choice of Welford's method or the sum-of-squares method. Both methods work using a single pass.
public final class StatMeasure {
    private StatMeasure() {}

    public interface Stats1D {
        /** Add a value to the population */
        void addValue(double value);

        /** Get the mean of all the added values */
        double getMean();

        /** Get the standard deviation from a sample of the population. */
        double getStDevSample();

        /** Gets the standard deviation for the entire population. */
        double getStDevPopulation();
    }

    private static class WelfordPopulation implements Stats1D {
        private double mean = 0.0;
        private double sSum = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            double tmpMean = mean;
            double delta = value - tmpMean;
            mean += delta / ++count;
            sSum += delta * (value - mean);
        }

        @Override
        public double getMean() { return mean; }

        @Override
        public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }

        @Override
        public double getStDevPopulation() { return Math.sqrt(sSum / (count)); }
    }

    private static class StandardPopulation implements Stats1D {
        private double sum = 0.0;
        private double sumOfSquares = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            sum += value;
            sumOfSquares += value * value;
            count++;
        }

        @Override
        public double getMean() { return sum / count; }

        @Override
        public double getStDevSample() {
            return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
        }

        @Override
        public double getStDevPopulation() {
            return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
        }
    }

    /**
     * Returns a way to measure a population of data using Welford's method.
     * This method is better if your population or values are so large that
     * the sum of x-squared may overflow. It's also probably faster if you
     * need to recalculate the mean and standard deviation continuously,
     * for example, if you are continually updating a graphic of the data as
     * it flows in.
     *
     * @return A Stats1D object that uses Welford's method.
     */
    public static Stats1D getWelfordStats() { return new WelfordPopulation(); }

    /**
     * Return a way to measure the population of data using the sum-of-squares
     * method. This is probably faster than Welford's method, but runs the
     * risk of data overflow.
     *
     * @return A Stats1D object that uses the sum-of-squares method
     */
    public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
We may be able to use the statistics module in Python. It has stdev() and pstdev() functions to calculate the standard deviation of a sample and of a population, respectively.
Details here: https://www.geeksforgeeks.org/python-statistics-stdev/
import statistics as st
print(st.pstdev(dataframe['column name']))
This is the population standard deviation.
private double calculateStdDev(List<double> values)
{
    double average = values.Average();
    return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
}
For Sample standard deviation, just change [values.Count] to [values.Count -1] in above code.
Make sure you don't have only 1 data point in your set.
