Mean and Standard Deviation wrong c# - c#

I've been doing an app since few days ago but it's wrong and I do not know why.
I've done the same operation in various ways. I've searched here on the blog, but I still get the incorrect result.
I hope you can help me:
I'm calculating the ** Mean and Standard Deviation**. The Mean is OK. The Standard Deviation is wrong. This is my code:
LinkedList<Double> lista = new LinkedList<Double>();
int contador = 0;
private void btnAgregar_Click(object sender, EventArgs e)
{
lista.AddLast(Convert.ToDouble(txtNum.Text));
MessageBox.Show("Se agregó el número: " + txtNum.Text);
contador++;
txtNum.Text = "";
txtNum.Focus();
}
Double media;
Double desviacionE = 0;
Double suma = 0;
private void btnCalcular_Click(object sender, EventArgs e)
{
media = 0;
calculaMedia();
calculaDesviacionE();
}
public void calculaMedia()
{
foreach (var item in lista)
{
String valorItem = item.ToString();
suma = suma + Convert.ToDouble(valorItem);
}
media = suma / contador;
txtMedia.Text = "" + media;
}
public void calculaDesviacionE()
{
Double average = lista.Average();
Double sum = 0;
foreach (var item in lista)
{
sum += ((Convert.ToDouble(item.ToString()))*(Convert.ToDouble(item.ToString())));
}
Double sumProm = sum / lista.Count();
Double desvE = Math.Sqrt(sumProm-(average*average));
txtDesv.Text = "" + desvE;
}
I hope You can help me!
Thank You

Following the rules for standard deviation found at http://en.wikipedia.org/wiki/Standard_deviation
LinkedList<Double> list = new LinkedList<Double>();
double sumOfSquares = 0;
double deviation;
double delta;
list.AddLast(2);
list.AddLast(4);
list.AddLast(4);
list.AddLast(4);
list.AddLast(5);
list.AddLast(5);
list.AddLast(7);
list.AddLast(9);
double average = list.Average();
Console.WriteLine("Average: " + average);
foreach (double item in list)
{
delta = Math.Abs(item - average);
sumOfSquares += (delta * delta);
}
Console.WriteLine("Sum of Squares: " + sumOfSquares);
deviation = Math.Sqrt(sumOfSquares / list.Count());
Console.WriteLine("Standard Deviation: " + deviation); //Final result is 2.0

You need to subtract the average before squaring.
// Population Standard Deviation
public void populationStandardDev()
{
Double average = lista.Average();
Double sum = 0;
foreach (var item in lista)
{
Double difference = item - average;
sum += difference*difference;
}
Double sumProd = sum / lista.Count(); // divide by n
Double desvE = Math.Sqrt(sumProd);
}
// Standard deviation
public void standardDev()
{
Double average = lista.Average();
Double sum = 0;
foreach (var item in lista)
{
Double difference = item - average;
sum += difference*difference;
}
Double sumProd = sum / (lista.Count()-1); // divide by n-1
Double desvE = Math.Sqrt(sumProd);
}

The formula depends on the set of data you have.
Is it the entire population? Then you should use the Population Standard Deviation (divisor: n).
Is the data a sample of a set? Then you should use the Sample Standard Deviation (divisor: n - 1)
You may find an easier-to-understand guide here: Laerd Statistics - Standard Deviation, which also has a handy Calculator for both solutions.
So, it is as #Greg answered, though I would first check if the list holds any values to avoid division by zero.
double stdDeviation = 0;
if (lista.Any())
{
var avg = lista.Average();
var sumOfSquares = lista.Sum(item => Math.Pow(item - avg, 2));
stdDeviation = Math.Sqrt(sumOfSquares / [divisor goes here]);
}
return stdDeviation;
Where the divisor be lista.Count() for population or (lista.Count() - 1) for samples.

Related

Trying to calculate weighted moving average but getting zero as result always

I am trying to get out the "Weighted moving average" of a array of double values.
I have tried to get all peaces from some internet examples together but i always getting zero as result.
Problem is the calulation of "weight", its being zero but it should not be zero, example 1 / 107 = 0,0093457943925234 but the weight double values getting zero, i have tried change to long and decimal and getting the same problem.
Any ideas?
public static double WeighteedMovingAverage(double[] data)
{
double aggregate = 0;
double weight;
int item = 1;
int count = data.Count();
foreach (var d in data)
{
weight = item / count;
aggregate += d * weight;
count++;
}
return (double)(aggregate / count);
}
weight = (double)item / (double)count;
need to be double to avoid casting before operation
public static double WeighteedMovingAverage(double[] data)
{
double aggregate = 0;
double weight;
int item = 1;
int count = data.Count();
foreach (var d in data)
{
//replace with line below weight = item / count;
weight = (double)item / (double)count;
aggregate += d * weight;
count++;
}
//replace with line below return (double)(aggregate / count);
return (double)(aggregate / (double)count);
}

How to efficiently calculate a moving Standard Deviation

Below you can see my C# method to calculate Bollinger Bands for each point (moving average, up band, down band).
As you can see this method uses 2 for loops to calculate the moving standard deviation using the moving average. It used to contain an additional loop to calculate the moving average over the last n periods. This one I could remove by adding the new point value to total_average at the beginning of the loop and removing the i - n point value at the end of the loop.
My question now is basically: Can I remove the remaining inner loop in a similar way I managed with the moving average?
public static void AddBollingerBands(SortedList<DateTime, Dictionary<string, double>> data, int period, int factor)
{
double total_average = 0;
for (int i = 0; i < data.Count(); i++)
{
total_average += data.Values[i]["close"];
if (i >= period - 1)
{
double total_bollinger = 0;
double average = total_average / period;
for (int x = i; x > (i - period); x--)
{
total_bollinger += Math.Pow(data.Values[x]["close"] - average, 2);
}
double stdev = Math.Sqrt(total_bollinger / period);
data.Values[i]["bollinger_average"] = average;
data.Values[i]["bollinger_top"] = average + factor * stdev;
data.Values[i]["bollinger_bottom"] = average - factor * stdev;
total_average -= data.Values[i - period + 1]["close"];
}
}
}
The problem with approaches that calculate the sum of squares is that it and the square of sums can get quite large, and the calculation of their difference may introduce a very large error, so let's think of something better. For why this is needed, see the Wikipedia article on Algorithms for computing variance and John Cook on Theoretical explanation for numerical results)
First, instead of calculating the stddev let's focus on the variance. Once we have the variance, stddev is just the square root of the variance.
Suppose the data are in an array called x; rolling an n-sized window by one can be thought of as removing the value of x[0] and adding the value of x[n]. Let's denote the averages of x[0]..x[n-1] and x[1]..x[n] by µ and µ’ respectively. The difference between the variances of x[0]..x[n-1] and x[1]..x[n] is, after canceling out some terms and applying (a²-b²) = (a+b)(a-b):
Var[x[1],..,x[n]] - Var[x[0],..,x[n-1]]
= (\sum_1^n x[i]² - n µ’²)/(n-1) - (\sum_0^{n-1} x[i]² - n µ²)/(n-1)
= (x[n]² - x[0]² - n(µ’² - µ²))/(n-1)
= (x[n]-µ’ + x[0]-µ)(x[n]-x[0])/(n-1)
Therefore the variance is perturbed by something that doesn't require you to maintain the sum of squares, which is better for numerical accuracy.
You can calculate the mean and variance once in the beginning with a proper algorithm (Welford's method). After that, every time you have to replace a value in the window x[0] by another x[n] you update the average and variance like this:
new_Avg = Avg + (x[n]-x[0])/n
new_Var = Var + (x[n]-new_Avg + x[0]-Avg)(x[n] - x[0])/(n-1)
new_StdDev = sqrt(new_Var)
The answer is yes, you can. In the mid-80's I developed just such an algorithm (probably not original) in FORTRAN for a process monitoring and control application. Unfortunately, that was over 25 years ago and I do not remember the exact formulas, but the technique was an extension of the one for moving averages, with second order calculations instead of just linear ones.
After looking at your code some, I am think that I can suss out how I did it back then. Notice how your inner loop is making a Sum of Squares?:
for (int x = i; x > (i - period); x--)
{
total_bollinger += Math.Pow(data.Values[x]["close"] - average, 2);
}
in much the same way that your average must have originally had a Sum of Values? The only two differences are the order (its power 2 instead of 1) and that you are subtracting the average each value before you square it. Now that might look inseparable, but in fact they can be separated:
SUM(i=1; n){ (v[i] - k)^2 }
is
SUM(i=1..n){v[i]^2 -2*v[i]*k + k^2}
which becomes
SUM(i=1..n){v[i]^2 -2*v[i]*k} + k^2*n
which is
SUM(i=1..n){v[i]^2} + SUM(i=1..n){-2*v[i]*k} + k^2*n
which is also
SUM(i=1..n){v[i]^2} + SUM(i=1..n){-2*v[i]}*k + k^2*n
Now the first term is just a Sum of Squares, you handle that in the same way that you do the sum of Values for the average. The last term (k^2*n) is just the average squared times the period. Since you divide the result by the period anyway, you can just add the new average squared without the extra loop.
Finally, in the second term (SUM(-2*v[i]) * k), since SUM(v[i]) = total = k*n you can then change it into this:
-2 * k * k * n
or just -2*k^2*n, which is -2 times the average squared, once the period (n) is divided out again. So the final combined formula is:
SUM(i=1..n){v[i]^2} - n*k^2
or
SUM(i=1..n){values[i]^2} - period*(average^2)
(be sure to check the validity of this, since I am deriving it off the top of my head)
And incorporating into your code should look something like this:
public static void AddBollingerBands(ref SortedList<DateTime, Dictionary<string, double>> data, int period, int factor)
{
double total_average = 0;
double total_squares = 0;
for (int i = 0; i < data.Count(); i++)
{
total_average += data.Values[i]["close"];
total_squares += Math.Pow(data.Values[i]["close"], 2);
if (i >= period - 1)
{
double total_bollinger = 0;
double average = total_average / period;
double stdev = Math.Sqrt((total_squares - Math.Pow(total_average,2)/period) / period);
data.Values[i]["bollinger_average"] = average;
data.Values[i]["bollinger_top"] = average + factor * stdev;
data.Values[i]["bollinger_bottom"] = average - factor * stdev;
total_average -= data.Values[i - period + 1]["close"];
total_squares -= Math.Pow(data.Values[i - period + 1]["close"], 2);
}
}
}
I've used commons-math (and contributed to that library!) for something very similar to this. It's open-source, porting to C# should be easy as store-bought pie (have you tried making a pie from scratch!?). Check it out: http://commons.apache.org/math/api-3.1.1/index.html. They have a StandardDeviation class. Go to town!
Most important information has already been given above --- but maybe this is still of general interest.
A tiny Java library to calculate moving average and standard deviation is available here:
https://github.com/tools4j/meanvar
The implementation is based on a variant of Welford's method mentioned above. Methods to remove and replace values have been derived that can be used for moving value windows.
Disclaimer: I am the author of the said library.
I just did it with Data From Binance Future API
Hope this helps:
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using static System.Windows.Forms.VisualStyles.VisualStyleElement;
namespace Trading_Bot_1
{
public class BOLL
{
private BollingerBandData graphdata = new BollingerBandData();
private List<TickerData> data = new List<TickerData>();
public BOLL(string url)
{
string js = getJsonFromUrl(url);
//dynamic data = JObject.Parse(js);
object[][] arrays = JsonConvert.DeserializeObject<object[][]>(js);
data = new List<TickerData>();
for (int i = 1; i < 400; i++)
{
data.Add(new TickerData
{
Date = DateTime.Now,
Open = Convert.ToDouble(arrays[arrays.Length - i][1]),
High = Convert.ToDouble(arrays[arrays.Length - i][2]),
Low = Convert.ToDouble(arrays[arrays.Length - i][3]),
Close = Convert.ToDouble(arrays[arrays.Length - i][4]),
Volume = Math.Round(Convert.ToDouble(arrays[arrays.Length - i][4]), 0),
AdjClose = Convert.ToDouble(arrays[arrays.Length - i][6])
});
}
graphdata.LowerBand.Add(1);
graphdata.LowerBand.Add(2);
graphdata.LowerBand.Add(3);
graphdata.LowerBand.Add(1);
graphdata.UpperBand.Add(1);
graphdata.UpperBand.Add(2);
graphdata.UpperBand.Add(3);
graphdata.UpperBand.Add(4);
graphdata.MovingAverageWindow.Add(10);
graphdata.MovingAverageWindow.Add(20);
graphdata.MovingAverageWindow.Add(40);
graphdata.MovingAverageWindow.Add(50);
graphdata.Length.Add(10);
graphdata.Length.Add(30);
graphdata.Length.Add(50);
graphdata.Length.Add(100);
// DataContext = graphdata;
}
public static string getJsonFromUrl(string url1)
{
var uri = String.Format(url1);
WebClient client = new WebClient();
client.UseDefaultCredentials = true;
var data = client.DownloadString(uri);
return data;
}
List<double> UpperBands = new List<double>();
List<double> LowerBands = new List<double>();
public List<List<double>> GetBOLLDATA(int decPlaces)
{
int datalength = graphdata.SelectedMovingAverage + graphdata.SelectedLength;
string bands = "";
for (int i = graphdata.SelectedLength - 1; i >= 0; i--)
{
List<double> price = new List<double>();
for (int j = 0; j < graphdata.SelectedMovingAverage; j++)
{
price.Add(data[i + j].Close);
}
double sma = CalculateAverage(price.ToArray());
double sigma = CalculateSTDV(price.ToArray());
double lower = sma - (graphdata.SelectedLowerBand * sigma);
double upper = sma + (graphdata.SelectedUpperBand * sigma);
UpperBands.Add(Math.Round( upper,decPlaces));
LowerBands.Add(Math.Round(lower, decPlaces));
bands += (Math.Round(upper, decPlaces) + " / " + Math.Round(lower, decPlaces)) + Environment.NewLine;
// graphdata.ChartData.Add(new ChartData() { SMA = sma, LowerBandData = lower, UpperBandData = upper });
}
//MessageBox.Show(bands);
return new List<List<double>> { UpperBands, LowerBands };
}
public double[] GetBOLLDATA(int decPlaces, string a)
{
List<double> price = new List<double>();
for (int j = 0; j < graphdata.SelectedMovingAverage; j++)
{
price.Add(data[j].Close);
}
double sma = CalculateAverage(price.ToArray());
double sigma = CalculateSTDV(price.ToArray());
double lower = sma - (graphdata.SelectedLowerBand * sigma);
double upper = sma + (graphdata.SelectedUpperBand * sigma);
return new double[] { Math.Round(upper, decPlaces), Math.Round(lower, decPlaces) };
}
private double CalculateAverage(double[] data)
{
int count = data.Length;
double sum = 0;
for (int i = 0; i < count; i++)
{
sum += data[i];
}
return sum / count;
}
private double CalculateVariance(double[] data)
{
int count = data.Length;
double sum = 0;
double avg = CalculateAverage(data);
for (int i = 0; i < count; i++)
{
sum += (data[i] - avg) * (data[i] - avg);
}
return sum / (count - 1);
}
private double CalculateSTDV(double[] data)
{
double var = CalculateVariance(data);
return Math.Sqrt(var);
}
}
public class ChartData
{
public double UpperBandData
{ get; set; }
public double LowerBandData
{ get; set; }
public double SMA
{ get; set; }
}
public class BollingerBandData : INotifyPropertyChanged
{
private ObservableCollection<int> _lowerBand;
private ObservableCollection<int> _upperBand;
private ObservableCollection<int> _movingAvg;
private ObservableCollection<int> _length;
private ObservableCollection<ChartData> _chartData;
private int _selectedLowerBand;
private int _selectedUpperBand;
private int _selectedMovingAvg;
private int _selectedLength;
public BollingerBandData()
{
_lowerBand = new ObservableCollection<int>();
_upperBand = new ObservableCollection<int>();
_movingAvg = new ObservableCollection<int>();
_length = new ObservableCollection<int>();
_chartData = new ObservableCollection<ChartData>();
SelectedLowerBand = 2;
SelectedUpperBand = 2;
SelectedMovingAverage = 20;
SelectedLength = 5;
}
public ObservableCollection<ChartData> ChartData
{
get
{
return _chartData;
}
set
{
_chartData = value;
RaisePropertyChanged("ChartData");
}
}
public ObservableCollection<int> LowerBand
{
get
{
return _lowerBand;
}
set
{
_lowerBand = value;
RaisePropertyChanged("LowerBand");
}
}
public ObservableCollection<int> UpperBand
{
get
{
return _upperBand;
}
set
{
_upperBand = value;
RaisePropertyChanged("UpperBand");
}
}
public ObservableCollection<int> MovingAverageWindow
{
get
{
return _movingAvg;
}
set
{
_movingAvg = value;
RaisePropertyChanged("MovingAverageWindow");
}
}
public ObservableCollection<int> Length
{
get
{
return _length;
}
set
{
_length = value;
RaisePropertyChanged("Length");
}
}
public int SelectedLowerBand
{
get
{
return _selectedLowerBand;
}
set
{
_selectedLowerBand = value;
RaisePropertyChanged("SelectedLowerBand");
}
}
public int SelectedUpperBand
{
get
{
return _selectedUpperBand;
}
set
{
_selectedUpperBand = value;
RaisePropertyChanged("SelectedUpperBand");
}
}
public int SelectedMovingAverage
{
get
{
return _selectedMovingAvg;
}
set
{
_selectedMovingAvg = value;
RaisePropertyChanged("SelectedMovingAverage");
}
}
public int SelectedLength
{
get
{
return _selectedLength;
}
set
{
_selectedLength = value;
RaisePropertyChanged("SelectedLength");
}
}
public event PropertyChangedEventHandler PropertyChanged;
private void RaisePropertyChanged(string propertyName)
{
PropertyChangedEventHandler handler = this.PropertyChanged;
if (handler != null)
{
handler(this, new PropertyChangedEventArgs(propertyName));
}
}
}
public class TickerData
{
public DateTime Date
{ get; set; }
public double Open
{ get; set; }
public double High
{ get; set; }
public double Low
{ get; set; }
public double Close
{ get; set; }
public double Volume
{ get; set; }
public double AdjClose
{ get; set; }
}
}

Infinity in interest calculation?

Thanks to Laurence Burke in my other question for the isMaybeMoney function, I am able to determine whether an input is money or not.
What I'm doing now is trying to calculate the total after interest but I keep getting Infinity written to the screen. What in the world is wrong with my interestsaccrued function? It's supposed to be $3,522.55 when I use $1,234 as the starting balance with 3.5% interest.
Can someone please help me out?
static float money;
static void Main()
{
string[] myMaybeBalances = Accounts.GetStartingBalances();
myIsMaybeMoneyValidator Miimv = new myIsMaybeMoneyValidator();
ArrayList interests = Miimv.interestsAccrued(myMaybeBalances);
foreach (object interest in interests)
{
Console.WriteLine(interest);
}
Console.ReadLine();
}
public ArrayList interestsAccrued(string[] myMaybeBalances)
{
ArrayList interests = new ArrayList();
foreach (string myMaybeBalance in myMaybeBalances)
{
bool myResult = isMaybeMoney(myMaybeBalance);
if (myResult == true)
{
decimal[] rates = Accounts.GetRates();
for (int i = 0; i < rates.Length; i++)
{
decimal rate = rates[i];
float total = 1;
int n_X_t = 360;
while (n_X_t != 0)
{
rate = (1 + rates[i] / 12);
float myRate;
float.TryParse(rate.ToString(), out myRate);
total = total * myRate;
total = total * money;
n_X_t = n_X_t - 1;
}
interests.Add(total);
}
}
}
return interests;
}
public bool isMaybeMoney(object theirMaybeMoney)
{
string myMaybeMoney = theirMaybeMoney.ToString();
float num;
bool isValid = float.TryParse(myMaybeMoney,
NumberStyles.Currency,
CultureInfo.GetCultureInfo("en-US"), // cached
out num);
money = num;
return isValid;
}
You are multiplying total by the rate each step through the while loop, which seems reasonable enough, but you also multiply total by the value of the variable "money", which as far as I can tell is the starting balance.
So you multiply by the starting balance 360 times. If only my savings accounts worked like that! I'm not sure if the rest of the logic is correct, but for a start, try moving the
total = total * money;
to be under the line
float total = 1;
(or better yet just change from
float total = 1;
to
float total = money;
and get rid of the line
total = total * money;
altogether)
The codes you have is not evaluate. NO BENEFIT of construct loop for interest calculate!
This is not needful yet introduce much of risk for high complication
Here is codes you want for use of FUNCTIONARY encapsulate :
static void Main()
{
var interests = new List<decimal>();
foreach (string possibleBalance in Accounts.GetStartingBalances())
foreach (decimal rate in Accounts.GetRates())
{
decimal balance;
if (!decimal.TryParse(possibleBalance, NumberStyles.Currency, CultureInfo.CurrentCulture, out balance))
continue;
decimal interest = CalculateInterestAccrued(balance, rate, 12, 30);
interests.Add(interest);
}
foreach (decimal interest in interests)
Console.WriteLine(interest);
Console.ReadKey();
}
static decimal CalculateInterestAccrued(decimal principal, decimal rate, int compoundsPerYear, int years)
{
return principal * (decimal)Math.Pow((double)(1 + rate / compoundsPerYear), compoundsPerYear * years);
}
thanks,
PRASHANT :)

How do I determine the standard deviation (stddev) of a set of values?

I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc..
While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...
Plus, don't never, ever, ever, compute a^2 as pow(a,2), a * a is almost certainly faster.
By far the best way of computing a standard deviation is Welford's method. My C is very rusty, but it could look something like:
public static double StandardDeviation(List<double> valueList)
{
double M = 0.0;
double S = 0.0;
int k = 1;
foreach (double value in valueList)
{
double tmpM = M;
M += (value - tmpM) / k;
S += (value - tmpM) * (value - M);
k++;
}
return Math.Sqrt(S / (k-2));
}
If you have the whole population (as opposed to a sample population), then use return Math.Sqrt(S / (k-1));.
EDIT: I've updated the code according to Jason's remarks...
EDIT: I've also updated the code according to Alex's remarks...
10 times faster solution than Jaime's, but be aware that,
as Jaime pointed out:
"While the sum of squares algorithm works fine most of the time, it
can cause big trouble if you are dealing with very large numbers. You
basically may end up with a negative variance"
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods, if the results are equal, you know for sure that you can use "my" method for your case.
public static double StandardDeviation(double[] data)
{
double stdDev = 0;
double sumAll = 0;
double sumAllQ = 0;
//Sum of x and sum of x²
for (int i = 0; i < data.Length; i++)
{
double x = data[i];
sumAll += x;
sumAllQ += x * x;
}
//Mean (not used here)
//double mean = 0;
//mean = sumAll / (double)data.Length;
//Standard deviation
stdDev = System.Math.Sqrt(
(sumAllQ -
(sumAll * sumAll) / data.Length) *
(1.0d / (data.Length - 1))
);
return stdDev;
}
The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1").
Better yet, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
double M = 0.0;
double S = 0.0;
int k = 0;
foreach (double value in valueList)
{
k++;
double tmpM = M;
M += (value - tmpM) / k;
S += (value - tmpM) * (value - M);
}
return Math.Sqrt(S / (k-1));
}
The Math.NET library provides this for you to of the box.
PM> Install-Package MathNet.Numerics
var populationStdDev = new List<double>(1d, 2d, 3d, 4d, 5d).PopulationStandardDeviation();
var sampleStdDev = new List<double>(2d, 3d, 4d).StandardDeviation();
See PopulationStandardDeviation for more information.
Code snippet:
public static double StandardDeviation(List<double> valueList)
{
if (valueList.Count < 2) return 0.0;
double sumOfSquares = 0.0;
double average = valueList.Average(); //.NET 3.0
foreach (double value in valueList)
{
sumOfSquares += Math.Pow((value - average), 2);
}
return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
You can avoid making two passes over the data by accumulating the mean and mean-square
cnt = 0
mean = 0
meansqr = 0
loop over array
cnt++
mean += value
meansqr += value*value
mean /= cnt
meansqr /= cnt
and forming
sigma = sqrt(meansqr - mean^2)
A factor of cnt/(cnt-1) is often appropriate as well.
BTW-- The first pass over the data in Demi and McWafflestix answers are hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceed the size of the cache, or even the working set, this gets to be a bid deal.
I found that Rob's helpful answer didn't quite match what I was seeing using excel. To match excel, I passed the Average for valueList in to the StandardDeviation calculation.
Here is my two cents... and clearly you could calculate the moving average (ma) from valueList inside the function - but I happen to have already before needing the standardDeviation.
public double StandardDeviation(List<double> valueList, double ma)
{
double xMinusMovAvg = 0.0;
double Sigma = 0.0;
int k = valueList.Count;
foreach (double value in valueList){
xMinusMovAvg = value - ma;
Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
}
return Math.Sqrt(Sigma / (k - 1));
}
With Extension methods.
using System;
using System.Collections.Generic;
namespace SampleApp
{
internal class Program
{
private static void Main()
{
List<double> data = new List<double> {1, 2, 3, 4, 5, 6};
double mean = data.Mean();
double variance = data.Variance();
double sd = data.StandardDeviation();
Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}
}
public static class MyListExtensions
{
public static double Mean(this List<double> values)
{
return values.Count == 0 ? 0 : values.Mean(0, values.Count);
}
public static double Mean(this List<double> values, int start, int end)
{
double s = 0;
for (int i = start; i < end; i++)
{
s += values[i];
}
return s / (end - start);
}
public static double Variance(this List<double> values)
{
return values.Variance(values.Mean(), 0, values.Count);
}
public static double Variance(this List<double> values, double mean)
{
return values.Variance(mean, 0, values.Count);
}
public static double Variance(this List<double> values, double mean, int start, int end)
{
double variance = 0;
for (int i = start; i < end; i++)
{
variance += Math.Pow((values[i] - mean), 2);
}
int n = end - start;
if (start > 0) n -= 1;
return variance / (n);
}
public static double StandardDeviation(this List<double> values)
{
return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
}
public static double StandardDeviation(this List<double> values, int start, int end)
{
double mean = values.Mean(start, end);
double variance = values.Variance(mean, start, end);
return Math.Sqrt(variance);
}
}
}
/// <summary>
/// Calculates standard deviation, same as MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
//validation
if (values == null)
throw new ArgumentNullException();
int lenght = values.Count();
//saves from devision by 0
if (lenght == 0 || lenght == 1)
return 0;
double sum = 0.0, sum2 = 0.0;
for (int i = 0; i < lenght; i++)
{
double item = values.ElementAt(i);
sum += item;
sum2 += item * item;
}
return Math.Sqrt((sum2 - sum * sum / lenght) / (lenght - 1));
}
The trouble with all the other answers is that they assume you have your
data in a big array. If your data is coming in on the fly, this would be
a better approach. This class works regardless of how or if you store your data. It also gives you the choice of the Waldorf method or the sum-of-squares method. Both methods work using a single pass.
public final class StatMeasure {
private StatMeasure() {}
public interface Stats1D {
/** Add a value to the population */
void addValue(double value);
/** Get the mean of all the added values */
double getMean();
/** Get the standard deviation from a sample of the population. */
double getStDevSample();
/** Gets the standard deviation for the entire population. */
double getStDevPopulation();
}
private static class WaldorfPopulation implements Stats1D {
private double mean = 0.0;
private double sSum = 0.0;
private int count = 0;
#Override
public void addValue(double value) {
double tmpMean = mean;
double delta = value - tmpMean;
mean += delta / ++count;
sSum += delta * (value - mean);
}
#Override
public double getMean() { return mean; }
#Override
public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }
#Override
public double getStDevPopulation() { return Math.sqrt(sSum / (count)); }
}
private static class StandardPopulation implements Stats1D {
private double sum = 0.0;
private double sumOfSquares = 0.0;
private int count = 0;
#Override
public void addValue(double value) {
sum += value;
sumOfSquares += value * value;
count++;
}
#Override
public double getMean() { return sum / count; }
#Override
public double getStDevSample() {
return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
}
#Override
public double getStDevPopulation() {
return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
}
}
/**
* Returns a way to measure a population of data using Waldorf's method.
* This method is better if your population or values are so large that
* the sum of x-squared may overflow. It's also probably faster if you
* need to recalculate the mean and standard deviation continuously,
* for example, if you are continually updating a graphic of the data as
* it flows in.
*
* #return A Stats1D object that uses Waldorf's method.
*/
public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }
/**
* Return a way to measure the population of data using the sum-of-squares
* method. This is probably faster than Waldorf's method, but runs the
* risk of data overflow.
*
* #return A Stats1D object that uses the sum-of-squares method
*/
public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
We may be able to use statistics module in Python. It has stedev() and pstdev() commands to calculate standard deviation of sample and population respectively.
details here: https://www.geeksforgeeks.org/python-statistics-stdev/
import statistics as st
print(st.ptdev(dataframe['column name']))
This is Population standard deviation
private double calculateStdDev(List<double> values)
{
double average = values.Average();
return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
}
For Sample standard deviation, just change [values.Count] to [values.Count -1] in above code.
Make sure you don't have only 1 data point in your set.

How do I calculate a trendline for a graph?

Google is not being my friend - it's been a long time since my stats class in college...I need to calculate the start and end points for a trendline on a graph - is there an easy way to do this? (working in C# but whatever language works for you)
Thanks to all for your help - I was off this issue for a couple of days and just came back to it - was able to cobble this together - not the most elegant code, but it works for my purposes - thought I'd share if anyone else encounters this issue:
public class Statistics
{
public Trendline CalculateLinearRegression(int[] values)
{
var yAxisValues = new List<int>();
var xAxisValues = new List<int>();
for (int i = 0; i < values.Length; i++)
{
yAxisValues.Add(values[i]);
xAxisValues.Add(i + 1);
}
return new Trendline(yAxisValues, xAxisValues);
}
}
public class Trendline
{
private readonly IList<int> xAxisValues;
private readonly IList<int> yAxisValues;
private int count;
private int xAxisValuesSum;
private int xxSum;
private int xySum;
private int yAxisValuesSum;
public Trendline(IList<int> yAxisValues, IList<int> xAxisValues)
{
this.yAxisValues = yAxisValues;
this.xAxisValues = xAxisValues;
this.Initialize();
}
public int Slope { get; private set; }
public int Intercept { get; private set; }
public int Start { get; private set; }
public int End { get; private set; }
private void Initialize()
{
this.count = this.yAxisValues.Count;
this.yAxisValuesSum = this.yAxisValues.Sum();
this.xAxisValuesSum = this.xAxisValues.Sum();
this.xxSum = 0;
this.xySum = 0;
for (int i = 0; i < this.count; i++)
{
this.xySum += (this.xAxisValues[i]*this.yAxisValues[i]);
this.xxSum += (this.xAxisValues[i]*this.xAxisValues[i]);
}
this.Slope = this.CalculateSlope();
this.Intercept = this.CalculateIntercept();
this.Start = this.CalculateStart();
this.End = this.CalculateEnd();
}
private int CalculateSlope()
{
try
{
return ((this.count*this.xySum) - (this.xAxisValuesSum*this.yAxisValuesSum))/((this.count*this.xxSum) - (this.xAxisValuesSum*this.xAxisValuesSum));
}
catch (DivideByZeroException)
{
return 0;
}
}
private int CalculateIntercept()
{
return (this.yAxisValuesSum - (this.Slope*this.xAxisValuesSum))/this.count;
}
private int CalculateStart()
{
return (this.Slope*this.xAxisValues.First()) + this.Intercept;
}
private int CalculateEnd()
{
return (this.Slope*this.xAxisValues.Last()) + this.Intercept;
}
}
OK, here's my best pseudo math:
The equation for your line is:
Y = a + bX
Where:
b = (sum(x*y) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
a = sum(y)/n - b(sum(x)/n)
Where sum(xy) is the sum of all x*y etc. Not particularly clear I concede, but it's the best I can do without a sigma symbol :)
... and now with added Sigma
b = (Σ(xy) - (ΣxΣy)/n) / (Σ(x^2) - (Σx)^2/n)
a = (Σy)/n - b((Σx)/n)
Where Σ(xy) is the sum of all x*y etc. and n is the number of points
Given that the trendline is straight, find the slope by choosing any two points and calculating:
(A) slope = (y1-y2)/(x1-x2)
Then you need to find the offset for the line. The line is specified by the equation:
(B) y = offset + slope*x
So you need to solve for offset. Pick any point on the line, and solve for offset:
(C) offset = y - (slope*x)
Now you can plug slope and offset into the line equation (B) and have the equation that defines your line. If your line has noise you'll have to decide on an averaging algorithm, or use curve fitting of some sort.
If your line isn't straight then you'll need to look into Curve fitting, or Least Squares Fitting - non trivial, but do-able. You'll see the various types of curve fitting at the bottom of the least squares fitting webpage (exponential, polynomial, etc) if you know what kind of fit you'd like.
Also, if this is a one-off, use Excel.
Here is a very quick (and semi-dirty) implementation of Bedwyr Humphreys's answer. The interface should be compatible with #matt's answer as well, but uses decimal instead of int and uses more IEnumerable concepts to hopefully make it easier to use and read.
Slope is b, Intercept is a
public class Trendline
{
public Trendline(IList<decimal> yAxisValues, IList<decimal> xAxisValues)
: this(yAxisValues.Select((t, i) => new Tuple<decimal, decimal>(xAxisValues[i], t)))
{ }
public Trendline(IEnumerable<Tuple<Decimal, Decimal>> data)
{
var cachedData = data.ToList();
var n = cachedData.Count;
var sumX = cachedData.Sum(x => x.Item1);
var sumX2 = cachedData.Sum(x => x.Item1 * x.Item1);
var sumY = cachedData.Sum(x => x.Item2);
var sumXY = cachedData.Sum(x => x.Item1 * x.Item2);
//b = (sum(x*y) - sum(x)sum(y)/n)
// / (sum(x^2) - sum(x)^2/n)
Slope = (sumXY - ((sumX * sumY) / n))
/ (sumX2 - (sumX * sumX / n));
//a = sum(y)/n - b(sum(x)/n)
Intercept = (sumY / n) - (Slope * (sumX / n));
Start = GetYValue(cachedData.Min(a => a.Item1));
End = GetYValue(cachedData.Max(a => a.Item1));
}
public decimal Slope { get; private set; }
public decimal Intercept { get; private set; }
public decimal Start { get; private set; }
public decimal End { get; private set; }
public decimal GetYValue(decimal xValue)
{
return Intercept + Slope * xValue;
}
}
Regarding a previous answer
if (B) y = offset + slope*x
then (C) offset = y/(slope*x) is wrong
(C) should be:
offset = y-(slope*x)
See:
http://zedgraph.org/wiki/index.php?title=Trend
If you have access to Excel, look in the "Statistical Functions" section of the Function Reference within Help. For straight-line best-fit, you need SLOPE and INTERCEPT and the equations are right there.
Oh, hang on, they're also defined online here: http://office.microsoft.com/en-us/excel/HP052092641033.aspx for SLOPE, and there's a link to INTERCEPT. OF course, that assumes MS don't move the page, in which case try Googling for something like "SLOPE INTERCEPT EQUATION Excel site:microsoft.com" - the link given turned out third just now.
I converted Matt's code to Java so I could use it in Android with the MPAndroidChart library. Also used double values instead of integer values:
ArrayList<Entry> yValues2 = new ArrayList<>();
ArrayList<Double > xAxisValues = new ArrayList<Double>();
ArrayList<Double> yAxisValues = new ArrayList<Double>();
for (int i = 0; i < readings.size(); i++)
{
r = readings.get(i);
yAxisValues.add(r.value);
xAxisValues.add((double)i + 1);
}
TrendLine tl = new TrendLine(yAxisValues, xAxisValues);
//Create the y values for the trend line
double currY = tl.Start;
for (int i = 0; i < readings.size(); ++ i) {
yValues2.add(new Entry(i, (float) currY));
currY = currY + tl.Slope;
}
...
public class TrendLine
{
private ArrayList<Double> xAxisValues = new ArrayList<Double>();
private ArrayList<Double> yAxisValues = new ArrayList<Double>();
private int count;
private double xAxisValuesSum;
private double xxSum;
private double xySum;
private double yAxisValuesSum;
public TrendLine(ArrayList<Double> yAxisValues, ArrayList<Double> xAxisValues)
{
this.yAxisValues = yAxisValues;
this.xAxisValues = xAxisValues;
this.Initialize();
}
public double Slope;
public double Intercept;
public double Start;
public double End;
private double getArraySum(ArrayList<Double> arr) {
double sum = 0;
for (int i = 0; i < arr.size(); ++i) {
sum = sum + arr.get(i);
}
return sum;
}
private void Initialize()
{
this.count = this.yAxisValues.size();
this.yAxisValuesSum = getArraySum(this.yAxisValues);
this.xAxisValuesSum = getArraySum(this.xAxisValues);
this.xxSum = 0;
this.xySum = 0;
for (int i = 0; i < this.count; i++)
{
this.xySum += (this.xAxisValues.get(i)*this.yAxisValues.get(i));
this.xxSum += (this.xAxisValues.get(i)*this.xAxisValues.get(i));
}
this.Slope = this.CalculateSlope();
this.Intercept = this.CalculateIntercept();
this.Start = this.CalculateStart();
this.End = this.CalculateEnd();
}
private double CalculateSlope()
{
try
{
return ((this.count*this.xySum) - (this.xAxisValuesSum*this.yAxisValuesSum))/((this.count*this.xxSum) - (this.xAxisValuesSum*this.xAxisValuesSum));
}
catch (Exception e)
{
return 0;
}
}
private double CalculateIntercept()
{
return (this.yAxisValuesSum - (this.Slope*this.xAxisValuesSum))/this.count;
}
private double CalculateStart()
{
return (this.Slope*this.xAxisValues.get(0)) + this.Intercept;
}
private double CalculateEnd()
{
return (this.Slope*this.xAxisValues.get(this.xAxisValues.size()-1)) + this.Intercept;
}
}
This is the way i calculated the slope:
Source: http://classroom.synonym.com/calculate-trendline-2709.html
class Program
{
public double CalculateTrendlineSlope(List<Point> graph)
{
int n = graph.Count;
double a = 0;
double b = 0;
double bx = 0;
double by = 0;
double c = 0;
double d = 0;
double slope = 0;
foreach (Point point in graph)
{
a += point.x * point.y;
bx = point.x;
by = point.y;
c += Math.Pow(point.x, 2);
d += point.x;
}
a *= n;
b = bx * by;
c *= n;
d = Math.Pow(d, 2);
slope = (a - b) / (c - d);
return slope;
}
}
class Point
{
public double x;
public double y;
}
Here's what I ended up using.
public class DataPoint<T1,T2>
{
public DataPoint(T1 x, T2 y)
{
X = x;
Y = y;
}
[JsonProperty("x")]
public T1 X { get; }
[JsonProperty("y")]
public T2 Y { get; }
}
public class Trendline
{
public Trendline(IEnumerable<DataPoint<long, decimal>> dataPoints)
{
int count = 0;
long sumX = 0;
long sumX2 = 0;
decimal sumY = 0;
decimal sumXY = 0;
foreach (var dataPoint in dataPoints)
{
count++;
sumX += dataPoint.X;
sumX2 += dataPoint.X * dataPoint.X;
sumY += dataPoint.Y;
sumXY += dataPoint.X * dataPoint.Y;
}
Slope = (sumXY - ((sumX * sumY) / count)) / (sumX2 - ((sumX * sumX) / count));
Intercept = (sumY / count) - (Slope * (sumX / count));
}
public decimal Slope { get; private set; }
public decimal Intercept { get; private set; }
public decimal Start { get; private set; }
public decimal End { get; private set; }
public decimal GetYValue(decimal xValue)
{
return Slope * xValue + Intercept;
}
}
My data set is using a Unix timestamp for the x-axis and a decimal for the y. Change those datatypes to fit your need. I do all the sum calculations in one iteration for the best possible performance.
Thank You so much for the solution, I was scratching my head.
Here's how I applied the solution in Excel.
I successfully used the two functions given by MUHD in Excel:
a = (sum(x*y) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
b = sum(y)/n - b(sum(x)/n)
(careful my a and b are the b and a in MUHD's solution).
- Made 4 columns, for example:
NB: my values y values are in B3:B17, so I have n=15;
my x values are 1,2,3,4...15.
1. Column B: Known x's
2. Column C: Known y's
3. Column D: The computed trend line
4. Column E: B values * C values (E3=B3*C3, E4=B4*C4, ..., E17=B17*C17)
5. Column F: x squared values
I then sum the columns B,C and E, the sums go in line 18 for me, so I have B18 as sum of Xs, C18 as sum of Ys, E18 as sum of X*Y, and F18 as sum of squares.
To compute a, enter the followin formula in any cell (F35 for me):
F35=(E18-(B18*C18)/15)/(F18-(B18*B18)/15)
To compute b (in F36 for me):
F36=C18/15-F35*(B18/15)
Column D values, computing the trend line according to the y = ax + b:
D3=$F$35*B3+$F$36, D4=$F$35*B4+$F$36 and so on (until D17 for me).
Select the column datas (C2:D17) to make the graph.
HTH.
If anyone needs the JS code for calculating the trendline of many points on a graph, here's what worked for us in the end:
/**#typedef {{
* x: Number;
* y:Number;
* }} Point
* #param {Point[]} data
* #returns {Function} */
function _getTrendlineEq(data) {
const xySum = data.reduce((acc, item) => {
const xy = item.x * item.y
acc += xy
return acc
}, 0)
const xSum = data.reduce((acc, item) => {
acc += item.x
return acc
}, 0)
const ySum = data.reduce((acc, item) => {
acc += item.y
return acc
}, 0)
const aTop = (data.length * xySum) - (xSum * ySum)
const xSquaredSum = data.reduce((acc, item) => {
const xSquared = item.x * item.x
acc += xSquared
return acc
}, 0)
const aBottom = (data.length * xSquaredSum) - (xSum * xSum)
const a = aTop / aBottom
const bTop = ySum - (a * xSum)
const b = bTop / data.length
return function trendline(x) {
return a * x + b
}
}
It takes an array of (x,y) points and returns the function of a y given a certain x
Have fun :)
Here's a working example in golang. I searched around and found this page and converted this over to what I needed. Hope someone else can find it useful.
// https://classroom.synonym.com/calculate-trendline-2709.html
package main
import (
"fmt"
"math"
)
func main() {
graph := [][]float64{
{1, 3},
{2, 5},
{3, 6.5},
}
n := len(graph)
// get the slope
var a float64
var b float64
var bx float64
var by float64
var c float64
var d float64
var slope float64
for _, point := range graph {
a += point[0] * point[1]
bx += point[0]
by += point[1]
c += math.Pow(point[0], 2)
d += point[0]
}
a *= float64(n) // 97.5
b = bx * by // 87
c *= float64(n) // 42
d = math.Pow(d, 2) // 36
slope = (a - b) / (c - d) // 1.75
// calculating the y-intercept (b) of the Trendline
var e float64
var f float64
e = by // 14.5
f = slope * bx // 10.5
intercept := (e - f) / float64(n) // (14.5 - 10.5) / 3 = 1.3
// output
fmt.Println(slope)
fmt.Println(intercept)
}

Categories