does anyone know a way to calculate Beta (beta coefficient) for a portfolio or stock vs. a benchmark, such as an index like S&P in c#?
I already have 2 arrays of type double that would be required for such a calculation but I can't find any sleek way to do this.
StatisticFormula.BetaFunction Method (Double, Double) exists but this accepts one value for each param, not an array - which statistically makes no sense.
thanks in advance
I'm not aware of any good C# Finance/Statistics packages, so I wrote the method directly and borrowed from this stats package: https://www.codeproject.com/Articles/42492/Using-LINQ-to-Calculate-Basic-Statistics
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
double[] closingPriceStock = { 39.32, 39.45, 39.27, 38.73, 37.99, 38.38, 39.53, 40.55, 40.78, 41.3, 41.35, 41.25, 41.1, 41.26, 41.48, 41.68, 41.77, 41.92, 42.12, 41.85, 41.54 };
double[] closingPriceMarket = { 1972.18, 1988.87, 1987.66, 1940.51, 1867.61, 1893.21, 1970.89, 2035.73, 2079.61, 2096.92, 2102.44, 2091.54, 2083.39, 2086.05, 2084.07, 2104.18, 2077.57, 2083.56, 2099.84, 2093.32, 2098.04 };
double[] closingPriceStockDailyChange = new double[closingPriceStock.Length - 1];
double[] closingPriceMarketDailyChange = new double[closingPriceMarket.Length - 1];
for (int i = 0; i < closingPriceStockDailyChange.Length; i++)
{
closingPriceStockDailyChange[i] = (closingPriceStock[i + 1] - closingPriceStock[i]) * 100 / closingPriceStock[i];
closingPriceMarketDailyChange[i] = (closingPriceMarket[i + 1] - closingPriceMarket[i]) * 100 / closingPriceMarket[i];
}
double beta = Covariance(closingPriceStockDailyChange, closingPriceMarketDailyChange) / Variance(closingPriceMarketDailyChange);
Console.WriteLine(beta);
Console.Read();
}
public static double Variance(this IEnumerable<double> source)
{
int n = 0;
double mean = 0;
double M2 = 0;
foreach (double x in source)
{
n = n + 1;
double delta = x - mean;
mean = mean + delta / n;
M2 += delta * (x - mean);
}
return M2 / (n - 1);
}
public static double Covariance(this IEnumerable<double> source, IEnumerable<double> other)
{
int len = source.Count();
double avgSource = source.Average();
double avgOther = other.Average();
double covariance = 0;
for (int i = 0; i < len; i++)
covariance += (source.ElementAt(i) - avgSource) * (other.ElementAt(i) - avgOther);
return covariance / len;
}
}
}
This would have to be refactored to calculate beta in a function, you can import the linked package to avoid the static methods I included, etc., but this is just a toy example.
Related
I am trying to create a function to find the nth root of a number without using libraries like Math etc. I can only use the +-*/ operators.
So far I have been trying to recreate the Math.Pow(double num, double root) function but with no luck as I cannot come up with a solution that takes two doubles.
I have tried using this:
double pow(double a, int n) {
double result = 1;
for(int i = 0; i < n; i++) {
result *= a
}
return result;
But this does not really work for me because it takes an int for the root instead of a double.
I need the source code or a recreation of Math.Pow(double a, double b)
Let's adapt code from Rosetta Code (https://rosettacode.org/wiki/Nth_root):
private static double DPow(double a, int n)
{
var result = 1.0;
for (; n > 0; n--) result *= a;
return result;
}
private static double DAbs(double a)
{
return (a > 0.0) ? a : -a;
}
public static double NthRoot(double a, int n, double p)
{
var _n = (double)n;
var x0 = a;
var x1 = a / _n;
while (DAbs(x0 - x1) > p)
{
x1 = x0;
x0 = (1.0 / _n) * (((_n - 1.0) * x1) + (a / DPow(x1, _n - 1.0)));
}
return x0;
}
public static double NthRoot(double a, int n)
{
return NthRoot(a, n, .0001);
}
We implement a simple pow function, and a very simple abs function, and use both to implement the algorithm given by the Rosetta Code site. Hope this fulfils your requirements!
how can I generate a "random constant colour" for a given string at runtime?
So a given string value will always have the same colour but different strings will have different colours.
Like how gmail assigns colours to the sender names.
Thanks
Responses to comments:
Thinking to generate the colour from a hashcode.
The colours won't be stored but generated from a hashcode.
I don't know any dedicated method for this, but here is a simple method generating Hexadecimal values with MD5 based on a given string:
using System.Security.Cryptography;
using System.Text;
static string GetColor(string raw)
{
using (MD5 md5Hash = MD5.Create())
{
byte[] data = md5Hash.ComputeHash(Encoding.UTF8.GetBytes(raw));
return BitConverter.ToString(data).Replace("-", string.Empty).Substring(0, 6);
}
}
Examples:
example#example.com
-> 23463B
info#google.com
-> 3C9015
stack#exchange.com
-> 7CA5E8
Edit:
I didn't tested it enough, so you may want to tweak it a little bit to get more different and unique values.
Edit2:
If you want transparency, check out this question/answer. By setting the Substring to Substring(0,8) , you should return a string with the alpha channel.
Similar to what the other answers are suggesting (hash the string in some form then use that hash to pick the color), but instead of using the hash to directly calculate the color use it as the index to an array of "Acceptable" colors.
class ColorPicker
{
public ColorPicker(int colorCount)
{
//The ".Skip(2)" makes it skip pure white and pure black.
// If you want those two, take out the +2 and the skip.
_colors = ColorGenerator.Generate(colorCount + 2).Skip(2).ToArray();
}
private readonly Color[] _colors;
public Color StringToColor(string message)
{
int someHash = CalculateHashOfStringSomehow(message);
return _colors[someHash % _colors.Length];
}
private int CalculateHashOfStringSomehow(string message)
{
//TODO: I would not use "message.GetHashCode()" as you are not
// guaranteed the same value between runs of the program.
// Make up your own algorithom or use a existing one that has a fixed
// output for a given input, like MD5.
}
}
This prevents issues like getting a white color when you plan on showing the text with a white background and other similar problems.
To populate your Color[] see this answer for the ColorGenerator class or just make your own pre-defined list of colors that look good on whatever background they will be used on.
Appendix:
In case the link goes down, here is a copy of the ColorGenerator class
public static class ColorGenerator
{
// RYB color space
private static class RYB
{
private static readonly double[] White = { 1, 1, 1 };
private static readonly double[] Red = { 1, 0, 0 };
private static readonly double[] Yellow = { 1, 1, 0 };
private static readonly double[] Blue = { 0.163, 0.373, 0.6 };
private static readonly double[] Violet = { 0.5, 0, 0.5 };
private static readonly double[] Green = { 0, 0.66, 0.2 };
private static readonly double[] Orange = { 1, 0.5, 0 };
private static readonly double[] Black = { 0.2, 0.094, 0.0 };
public static double[] ToRgb(double r, double y, double b)
{
var rgb = new double[3];
for (int i = 0; i < 3; i++)
{
rgb[i] = White[i] * (1.0 - r) * (1.0 - b) * (1.0 - y) +
Red[i] * r * (1.0 - b) * (1.0 - y) +
Blue[i] * (1.0 - r) * b * (1.0 - y) +
Violet[i] * r * b * (1.0 - y) +
Yellow[i] * (1.0 - r) * (1.0 - b) * y +
Orange[i] * r * (1.0 - b) * y +
Green[i] * (1.0 - r) * b * y +
Black[i] * r * b * y;
}
return rgb;
}
}
private class Points : IEnumerable<double[]>
{
private readonly int pointsCount;
private double[] picked;
private int pickedCount;
private readonly List<double[]> points = new List<double[]>();
public Points(int count)
{
pointsCount = count;
}
private void Generate()
{
points.Clear();
var numBase = (int)Math.Ceiling(Math.Pow(pointsCount, 1.0 / 3.0));
var ceil = (int)Math.Pow(numBase, 3.0);
for (int i = 0; i < ceil; i++)
{
points.Add(new[]
{
Math.Floor(i/(double)(numBase*numBase))/ (numBase - 1.0),
Math.Floor((i/(double)numBase) % numBase)/ (numBase - 1.0),
Math.Floor((double)(i % numBase))/ (numBase - 1.0),
});
}
}
private double Distance(double[] p1)
{
double distance = 0;
for (int i = 0; i < 3; i++)
{
distance += Math.Pow(p1[i] - picked[i], 2.0);
}
return distance;
}
private double[] Pick()
{
if (picked == null)
{
picked = points[0];
points.RemoveAt(0);
pickedCount = 1;
return picked;
}
var d1 = Distance(points[0]);
int i1 = 0, i2 = 0;
foreach (var point in points)
{
var d2 = Distance(point);
if (d1 < d2)
{
i1 = i2;
d1 = d2;
}
i2 += 1;
}
var pick = points[i1];
points.RemoveAt(i1);
for (int i = 0; i < 3; i++)
{
picked[i] = (pickedCount * picked[i] + pick[i]) / (pickedCount + 1.0);
}
pickedCount += 1;
return pick;
}
public IEnumerator<double[]> GetEnumerator()
{
Generate();
for (int i = 0; i < pointsCount; i++)
{
yield return Pick();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<Color> Generate(int numOfColors)
{
var points = new Points(numOfColors);
foreach (var point in points)
{
var rgb = RYB.ToRgb(point[0], point[1], point[2]);
yield return Color.FromArgb(
(int)Math.Floor(255 * rgb[0]),
(int)Math.Floor(255 * rgb[1]),
(int)Math.Floor(255 * rgb[2]));
}
}
}
3 integer variables, r,g and b.
Loop through each character in the string in steps of 3 and add the character code.
r += n + 0
g += n + 1
b += n + 2
after the loop take r,g, and b modulo 255 and create a color using Color.FromARGB.
No guarantees the color will be pretty though, and some strings may happen to have colors very close to each other.
I see some pretty good answeers but though it whould contribute with a little fun solutuion to generate colors from string, the Hash version looks like the best way to go but if this gives any one some inspiration to bould off, have at it
ConsoleKeyInfo ch = new ConsoleKeyInfo();
while (ch.KeyChar != 'e')
{
Console.WriteLine("type string to seed color");
string s = Console.ReadLine(); // gets text from input, in this case the command line
double d=0;
foreach(char cha in s.ToCharArray())
{
d=+ (int)cha; // get the value and adds it
}
d= (255/(Math.Pow(0.2,-0.002 *d))); // Generates a seed like value from i where 255 is the maximum. basicly 255/0.2^(-0.002*d)
int i = Convert.ToInt32(d); //then convets and get rid of the decimels
Color c = Color.FromArgb(i, i, i);// add a bit more calculation for varieng colers.
Console.WriteLine(c.Name);
Console.WriteLine("To Exit press e");
ch = Console.ReadKey()
}
Edit1: It definantly needs some tweeking, since the longer the string the ligther the color, but i think something can come from it with a little work :)
Assume that I want to get sum of all squares from M to N. I googled a bit and found this formula:
(1^2 + 2^2 + 3^2 + ... + N^2) = (N * (N + 1) * (2N + 1)) / 6
so I write this code:
static void Main(string[] args)
{
const int from = 10;
const int to = 50000;
Console.WriteLine(SumSquares(from, to));
Console.WriteLine(SumSquares2(from, to));
}
static long SumSquares(int m, int n)
{
checked
{
long x = m - 1;
long y = n;
return (((y*(y + 1)*(2*y + 1)) - (x*(x + 1)*(2*x + 1)))/6);
}
}
static long SumSquares2(int m, int n)
{
long sum = 0;
for (int i = m; i <= n; ++i)
{
sum += i * i;
}
return sum;
}
it works fine until 40k, but when N becomes 50k it fails. Output for 50k:
41667916674715
25948336371355
Press any key to continue . . .
I think it's an overflow or something, so I added checked keyword and tried to change long to double, but I got the same result. How can it be explained? How to get correct result without loops?
Your second method is overflowing because you are using an int in the loop. Change it to a long as follows (and also add checked):
static long SumSquares2(int m, int n)
{
checked
{
long sum = 0;
for (long i = m; i <= n; ++i)
{
sum += i*i;
}
return sum;
}
}
What was going wrong is that i*i was being calculated internally as an int data type even though the result was being cast to a long data type (i.e. the variable sum), and so it overflowed.
While you are using long for the result, you are still using int for the operators. I would define M and N as long or even BigInteger, and the same for the result. If you do not, you are probably doing int arithmetic still, even though your result is of type long.
I tried your code, and got the results you got. But then I changed every int to long and got the two numbers to match, up to an N of 1600000.
Using BigInteger, I am up to 160000000 and still working ok (result for m=10 and n=160000000 is 13653333461333333359999715, both ways).
To use BigInteger, you will need to add a reference to the System.Numerics dll to your project, and you will need to have a statement at the top of your code including that library.
using System.Numerics;
namespace ConsoleFiddle
{
class Program
{
static void Main(string[] args)
{
BigInteger from = 10;
BigInteger to = 160000000;
Console.WriteLine(SumSquares(from, to));
Console.WriteLine(SumSquares2(from, to));
Console.ReadKey();
}
static BigInteger SumSquares(BigInteger m, BigInteger n)
{
checked
{
BigInteger x = m - 1;
BigInteger y = n;
return (((y * (y + 1) * (2 * y + 1)) - (x * (x + 1) * (2 * x + 1))) / 6);
}
}
static BigInteger SumSquares2(BigInteger m, BigInteger n)
{
checked
{
BigInteger sum = 0;
for (BigInteger i = m; i <= n; ++i)
{
sum += i * i;
}
return sum;
}
}
For an M of 4000000000000000000 (4 x 10^18), and an N of 4000000000100000000. This code still works and gives an immediate result with the first method (1600000016040000000400333333338333333350000000). With the second method it takes it a little while (100 million loop iterations) but gives the same result.
Most probably you are experiencing integer overflow, as the range of long is limited. Probably you have disabled exceptions for integer overflow, so no exception is thrown. The exceptions for integer overflow can be disabled and enabled in the project properties in Visual Studio, if I'm not mistaken.
I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc..
While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...
Plus, don't never, ever, ever, compute a^2 as pow(a,2), a * a is almost certainly faster.
By far the best way of computing a standard deviation is Welford's method. My C is very rusty, but it could look something like:
public static double StandardDeviation(List<double> valueList)
{
double M = 0.0;
double S = 0.0;
int k = 1;
foreach (double value in valueList)
{
double tmpM = M;
M += (value - tmpM) / k;
S += (value - tmpM) * (value - M);
k++;
}
return Math.Sqrt(S / (k-2));
}
If you have the whole population (as opposed to a sample population), then use return Math.Sqrt(S / (k-1));.
EDIT: I've updated the code according to Jason's remarks...
EDIT: I've also updated the code according to Alex's remarks...
10 times faster solution than Jaime's, but be aware that,
as Jaime pointed out:
"While the sum of squares algorithm works fine most of the time, it
can cause big trouble if you are dealing with very large numbers. You
basically may end up with a negative variance"
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods, if the results are equal, you know for sure that you can use "my" method for your case.
public static double StandardDeviation(double[] data)
{
double stdDev = 0;
double sumAll = 0;
double sumAllQ = 0;
//Sum of x and sum of x²
for (int i = 0; i < data.Length; i++)
{
double x = data[i];
sumAll += x;
sumAllQ += x * x;
}
//Mean (not used here)
//double mean = 0;
//mean = sumAll / (double)data.Length;
//Standard deviation
stdDev = System.Math.Sqrt(
(sumAllQ -
(sumAll * sumAll) / data.Length) *
(1.0d / (data.Length - 1))
);
return stdDev;
}
The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1").
Better yet, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
double M = 0.0;
double S = 0.0;
int k = 0;
foreach (double value in valueList)
{
k++;
double tmpM = M;
M += (value - tmpM) / k;
S += (value - tmpM) * (value - M);
}
return Math.Sqrt(S / (k-1));
}
The Math.NET library provides this for you to of the box.
PM> Install-Package MathNet.Numerics
var populationStdDev = new List<double>(1d, 2d, 3d, 4d, 5d).PopulationStandardDeviation();
var sampleStdDev = new List<double>(2d, 3d, 4d).StandardDeviation();
See PopulationStandardDeviation for more information.
Code snippet:
public static double StandardDeviation(List<double> valueList)
{
if (valueList.Count < 2) return 0.0;
double sumOfSquares = 0.0;
double average = valueList.Average(); //.NET 3.0
foreach (double value in valueList)
{
sumOfSquares += Math.Pow((value - average), 2);
}
return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
You can avoid making two passes over the data by accumulating the mean and mean-square
cnt = 0
mean = 0
meansqr = 0
loop over array
cnt++
mean += value
meansqr += value*value
mean /= cnt
meansqr /= cnt
and forming
sigma = sqrt(meansqr - mean^2)
A factor of cnt/(cnt-1) is often appropriate as well.
BTW-- The first pass over the data in Demi and McWafflestix answers are hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceed the size of the cache, or even the working set, this gets to be a bid deal.
I found that Rob's helpful answer didn't quite match what I was seeing using excel. To match excel, I passed the Average for valueList in to the StandardDeviation calculation.
Here is my two cents... and clearly you could calculate the moving average (ma) from valueList inside the function - but I happen to have already before needing the standardDeviation.
public double StandardDeviation(List<double> valueList, double ma)
{
double xMinusMovAvg = 0.0;
double Sigma = 0.0;
int k = valueList.Count;
foreach (double value in valueList){
xMinusMovAvg = value - ma;
Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
}
return Math.Sqrt(Sigma / (k - 1));
}
With Extension methods.
using System;
using System.Collections.Generic;
namespace SampleApp
{
internal class Program
{
private static void Main()
{
List<double> data = new List<double> {1, 2, 3, 4, 5, 6};
double mean = data.Mean();
double variance = data.Variance();
double sd = data.StandardDeviation();
Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}
}
public static class MyListExtensions
{
public static double Mean(this List<double> values)
{
return values.Count == 0 ? 0 : values.Mean(0, values.Count);
}
public static double Mean(this List<double> values, int start, int end)
{
double s = 0;
for (int i = start; i < end; i++)
{
s += values[i];
}
return s / (end - start);
}
public static double Variance(this List<double> values)
{
return values.Variance(values.Mean(), 0, values.Count);
}
public static double Variance(this List<double> values, double mean)
{
return values.Variance(mean, 0, values.Count);
}
public static double Variance(this List<double> values, double mean, int start, int end)
{
double variance = 0;
for (int i = start; i < end; i++)
{
variance += Math.Pow((values[i] - mean), 2);
}
int n = end - start;
if (start > 0) n -= 1;
return variance / (n);
}
public static double StandardDeviation(this List<double> values)
{
return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
}
public static double StandardDeviation(this List<double> values, int start, int end)
{
double mean = values.Mean(start, end);
double variance = values.Variance(mean, start, end);
return Math.Sqrt(variance);
}
}
}
/// <summary>
/// Calculates standard deviation, same as MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
//validation
if (values == null)
throw new ArgumentNullException();
int lenght = values.Count();
//saves from devision by 0
if (lenght == 0 || lenght == 1)
return 0;
double sum = 0.0, sum2 = 0.0;
for (int i = 0; i < lenght; i++)
{
double item = values.ElementAt(i);
sum += item;
sum2 += item * item;
}
return Math.Sqrt((sum2 - sum * sum / lenght) / (lenght - 1));
}
The trouble with all the other answers is that they assume you have your
data in a big array. If your data is coming in on the fly, this would be
a better approach. This class works regardless of how or if you store your data. It also gives you the choice of the Waldorf method or the sum-of-squares method. Both methods work using a single pass.
public final class StatMeasure {
private StatMeasure() {}
public interface Stats1D {
/** Add a value to the population */
void addValue(double value);
/** Get the mean of all the added values */
double getMean();
/** Get the standard deviation from a sample of the population. */
double getStDevSample();
/** Gets the standard deviation for the entire population. */
double getStDevPopulation();
}
private static class WaldorfPopulation implements Stats1D {
private double mean = 0.0;
private double sSum = 0.0;
private int count = 0;
#Override
public void addValue(double value) {
double tmpMean = mean;
double delta = value - tmpMean;
mean += delta / ++count;
sSum += delta * (value - mean);
}
#Override
public double getMean() { return mean; }
#Override
public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }
#Override
public double getStDevPopulation() { return Math.sqrt(sSum / (count)); }
}
private static class StandardPopulation implements Stats1D {
private double sum = 0.0;
private double sumOfSquares = 0.0;
private int count = 0;
#Override
public void addValue(double value) {
sum += value;
sumOfSquares += value * value;
count++;
}
#Override
public double getMean() { return sum / count; }
#Override
public double getStDevSample() {
return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
}
#Override
public double getStDevPopulation() {
return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
}
}
/**
* Returns a way to measure a population of data using Waldorf's method.
* This method is better if your population or values are so large that
* the sum of x-squared may overflow. It's also probably faster if you
* need to recalculate the mean and standard deviation continuously,
* for example, if you are continually updating a graphic of the data as
* it flows in.
*
* #return A Stats1D object that uses Waldorf's method.
*/
public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }
/**
* Return a way to measure the population of data using the sum-of-squares
* method. This is probably faster than Waldorf's method, but runs the
* risk of data overflow.
*
* #return A Stats1D object that uses the sum-of-squares method
*/
public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
We may be able to use statistics module in Python. It has stedev() and pstdev() commands to calculate standard deviation of sample and population respectively.
details here: https://www.geeksforgeeks.org/python-statistics-stdev/
import statistics as st
print(st.ptdev(dataframe['column name']))
This is Population standard deviation
private double calculateStdDev(List<double> values)
{
double average = values.Average();
return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
}
For Sample standard deviation, just change [values.Count] to [values.Count -1] in above code.
Make sure you don't have only 1 data point in your set.
Google is not being my friend - it's been a long time since my stats class in college...I need to calculate the start and end points for a trendline on a graph - is there an easy way to do this? (working in C# but whatever language works for you)
Thanks to all for your help - I was off this issue for a couple of days and just came back to it - was able to cobble this together - not the most elegant code, but it works for my purposes - thought I'd share if anyone else encounters this issue:
public class Statistics
{
public Trendline CalculateLinearRegression(int[] values)
{
var yAxisValues = new List<int>();
var xAxisValues = new List<int>();
for (int i = 0; i < values.Length; i++)
{
yAxisValues.Add(values[i]);
xAxisValues.Add(i + 1);
}
return new Trendline(yAxisValues, xAxisValues);
}
}
public class Trendline
{
private readonly IList<int> xAxisValues;
private readonly IList<int> yAxisValues;
private int count;
private int xAxisValuesSum;
private int xxSum;
private int xySum;
private int yAxisValuesSum;
public Trendline(IList<int> yAxisValues, IList<int> xAxisValues)
{
this.yAxisValues = yAxisValues;
this.xAxisValues = xAxisValues;
this.Initialize();
}
public int Slope { get; private set; }
public int Intercept { get; private set; }
public int Start { get; private set; }
public int End { get; private set; }
private void Initialize()
{
this.count = this.yAxisValues.Count;
this.yAxisValuesSum = this.yAxisValues.Sum();
this.xAxisValuesSum = this.xAxisValues.Sum();
this.xxSum = 0;
this.xySum = 0;
for (int i = 0; i < this.count; i++)
{
this.xySum += (this.xAxisValues[i]*this.yAxisValues[i]);
this.xxSum += (this.xAxisValues[i]*this.xAxisValues[i]);
}
this.Slope = this.CalculateSlope();
this.Intercept = this.CalculateIntercept();
this.Start = this.CalculateStart();
this.End = this.CalculateEnd();
}
private int CalculateSlope()
{
try
{
return ((this.count*this.xySum) - (this.xAxisValuesSum*this.yAxisValuesSum))/((this.count*this.xxSum) - (this.xAxisValuesSum*this.xAxisValuesSum));
}
catch (DivideByZeroException)
{
return 0;
}
}
private int CalculateIntercept()
{
return (this.yAxisValuesSum - (this.Slope*this.xAxisValuesSum))/this.count;
}
private int CalculateStart()
{
return (this.Slope*this.xAxisValues.First()) + this.Intercept;
}
private int CalculateEnd()
{
return (this.Slope*this.xAxisValues.Last()) + this.Intercept;
}
}
OK, here's my best pseudo math:
The equation for your line is:
Y = a + bX
Where:
b = (sum(x*y) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
a = sum(y)/n - b(sum(x)/n)
Where sum(xy) is the sum of all x*y etc. Not particularly clear I concede, but it's the best I can do without a sigma symbol :)
... and now with added Sigma
b = (Σ(xy) - (ΣxΣy)/n) / (Σ(x^2) - (Σx)^2/n)
a = (Σy)/n - b((Σx)/n)
Where Σ(xy) is the sum of all x*y etc. and n is the number of points
Given that the trendline is straight, find the slope by choosing any two points and calculating:
(A) slope = (y1-y2)/(x1-x2)
Then you need to find the offset for the line. The line is specified by the equation:
(B) y = offset + slope*x
So you need to solve for offset. Pick any point on the line, and solve for offset:
(C) offset = y - (slope*x)
Now you can plug slope and offset into the line equation (B) and have the equation that defines your line. If your line has noise you'll have to decide on an averaging algorithm, or use curve fitting of some sort.
If your line isn't straight then you'll need to look into Curve fitting, or Least Squares Fitting - non trivial, but do-able. You'll see the various types of curve fitting at the bottom of the least squares fitting webpage (exponential, polynomial, etc) if you know what kind of fit you'd like.
Also, if this is a one-off, use Excel.
Here is a very quick (and semi-dirty) implementation of Bedwyr Humphreys's answer. The interface should be compatible with #matt's answer as well, but uses decimal instead of int and uses more IEnumerable concepts to hopefully make it easier to use and read.
Slope is b, Intercept is a
public class Trendline
{
public Trendline(IList<decimal> yAxisValues, IList<decimal> xAxisValues)
: this(yAxisValues.Select((t, i) => new Tuple<decimal, decimal>(xAxisValues[i], t)))
{ }
public Trendline(IEnumerable<Tuple<Decimal, Decimal>> data)
{
var cachedData = data.ToList();
var n = cachedData.Count;
var sumX = cachedData.Sum(x => x.Item1);
var sumX2 = cachedData.Sum(x => x.Item1 * x.Item1);
var sumY = cachedData.Sum(x => x.Item2);
var sumXY = cachedData.Sum(x => x.Item1 * x.Item2);
//b = (sum(x*y) - sum(x)sum(y)/n)
// / (sum(x^2) - sum(x)^2/n)
Slope = (sumXY - ((sumX * sumY) / n))
/ (sumX2 - (sumX * sumX / n));
//a = sum(y)/n - b(sum(x)/n)
Intercept = (sumY / n) - (Slope * (sumX / n));
Start = GetYValue(cachedData.Min(a => a.Item1));
End = GetYValue(cachedData.Max(a => a.Item1));
}
public decimal Slope { get; private set; }
public decimal Intercept { get; private set; }
public decimal Start { get; private set; }
public decimal End { get; private set; }
public decimal GetYValue(decimal xValue)
{
return Intercept + Slope * xValue;
}
}
Regarding a previous answer
if (B) y = offset + slope*x
then (C) offset = y/(slope*x) is wrong
(C) should be:
offset = y-(slope*x)
See:
http://zedgraph.org/wiki/index.php?title=Trend
If you have access to Excel, look in the "Statistical Functions" section of the Function Reference within Help. For straight-line best-fit, you need SLOPE and INTERCEPT and the equations are right there.
Oh, hang on, they're also defined online here: http://office.microsoft.com/en-us/excel/HP052092641033.aspx for SLOPE, and there's a link to INTERCEPT. OF course, that assumes MS don't move the page, in which case try Googling for something like "SLOPE INTERCEPT EQUATION Excel site:microsoft.com" - the link given turned out third just now.
I converted Matt's code to Java so I could use it in Android with the MPAndroidChart library. Also used double values instead of integer values:
ArrayList<Entry> yValues2 = new ArrayList<>();
ArrayList<Double > xAxisValues = new ArrayList<Double>();
ArrayList<Double> yAxisValues = new ArrayList<Double>();
for (int i = 0; i < readings.size(); i++)
{
r = readings.get(i);
yAxisValues.add(r.value);
xAxisValues.add((double)i + 1);
}
TrendLine tl = new TrendLine(yAxisValues, xAxisValues);
//Create the y values for the trend line
double currY = tl.Start;
for (int i = 0; i < readings.size(); ++ i) {
yValues2.add(new Entry(i, (float) currY));
currY = currY + tl.Slope;
}
...
public class TrendLine
{
private ArrayList<Double> xAxisValues = new ArrayList<Double>();
private ArrayList<Double> yAxisValues = new ArrayList<Double>();
private int count;
private double xAxisValuesSum;
private double xxSum;
private double xySum;
private double yAxisValuesSum;
public TrendLine(ArrayList<Double> yAxisValues, ArrayList<Double> xAxisValues)
{
this.yAxisValues = yAxisValues;
this.xAxisValues = xAxisValues;
this.Initialize();
}
public double Slope;
public double Intercept;
public double Start;
public double End;
private double getArraySum(ArrayList<Double> arr) {
double sum = 0;
for (int i = 0; i < arr.size(); ++i) {
sum = sum + arr.get(i);
}
return sum;
}
private void Initialize()
{
this.count = this.yAxisValues.size();
this.yAxisValuesSum = getArraySum(this.yAxisValues);
this.xAxisValuesSum = getArraySum(this.xAxisValues);
this.xxSum = 0;
this.xySum = 0;
for (int i = 0; i < this.count; i++)
{
this.xySum += (this.xAxisValues.get(i)*this.yAxisValues.get(i));
this.xxSum += (this.xAxisValues.get(i)*this.xAxisValues.get(i));
}
this.Slope = this.CalculateSlope();
this.Intercept = this.CalculateIntercept();
this.Start = this.CalculateStart();
this.End = this.CalculateEnd();
}
private double CalculateSlope()
{
try
{
return ((this.count*this.xySum) - (this.xAxisValuesSum*this.yAxisValuesSum))/((this.count*this.xxSum) - (this.xAxisValuesSum*this.xAxisValuesSum));
}
catch (Exception e)
{
return 0;
}
}
private double CalculateIntercept()
{
return (this.yAxisValuesSum - (this.Slope*this.xAxisValuesSum))/this.count;
}
private double CalculateStart()
{
return (this.Slope*this.xAxisValues.get(0)) + this.Intercept;
}
private double CalculateEnd()
{
return (this.Slope*this.xAxisValues.get(this.xAxisValues.size()-1)) + this.Intercept;
}
}
This is the way i calculated the slope:
Source: http://classroom.synonym.com/calculate-trendline-2709.html
class Program
{
public double CalculateTrendlineSlope(List<Point> graph)
{
int n = graph.Count;
double a = 0;
double b = 0;
double bx = 0;
double by = 0;
double c = 0;
double d = 0;
double slope = 0;
foreach (Point point in graph)
{
a += point.x * point.y;
bx = point.x;
by = point.y;
c += Math.Pow(point.x, 2);
d += point.x;
}
a *= n;
b = bx * by;
c *= n;
d = Math.Pow(d, 2);
slope = (a - b) / (c - d);
return slope;
}
}
class Point
{
public double x;
public double y;
}
Here's what I ended up using.
public class DataPoint<T1,T2>
{
public DataPoint(T1 x, T2 y)
{
X = x;
Y = y;
}
[JsonProperty("x")]
public T1 X { get; }
[JsonProperty("y")]
public T2 Y { get; }
}
public class Trendline
{
public Trendline(IEnumerable<DataPoint<long, decimal>> dataPoints)
{
int count = 0;
long sumX = 0;
long sumX2 = 0;
decimal sumY = 0;
decimal sumXY = 0;
foreach (var dataPoint in dataPoints)
{
count++;
sumX += dataPoint.X;
sumX2 += dataPoint.X * dataPoint.X;
sumY += dataPoint.Y;
sumXY += dataPoint.X * dataPoint.Y;
}
Slope = (sumXY - ((sumX * sumY) / count)) / (sumX2 - ((sumX * sumX) / count));
Intercept = (sumY / count) - (Slope * (sumX / count));
}
public decimal Slope { get; private set; }
public decimal Intercept { get; private set; }
public decimal Start { get; private set; }
public decimal End { get; private set; }
public decimal GetYValue(decimal xValue)
{
return Slope * xValue + Intercept;
}
}
My data set is using a Unix timestamp for the x-axis and a decimal for the y. Change those datatypes to fit your need. I do all the sum calculations in one iteration for the best possible performance.
Thank You so much for the solution, I was scratching my head.
Here's how I applied the solution in Excel.
I successfully used the two functions given by MUHD in Excel:
a = (sum(x*y) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
b = sum(y)/n - b(sum(x)/n)
(careful my a and b are the b and a in MUHD's solution).
- Made 4 columns, for example:
NB: my values y values are in B3:B17, so I have n=15;
my x values are 1,2,3,4...15.
1. Column B: Known x's
2. Column C: Known y's
3. Column D: The computed trend line
4. Column E: B values * C values (E3=B3*C3, E4=B4*C4, ..., E17=B17*C17)
5. Column F: x squared values
I then sum the columns B,C and E, the sums go in line 18 for me, so I have B18 as sum of Xs, C18 as sum of Ys, E18 as sum of X*Y, and F18 as sum of squares.
To compute a, enter the followin formula in any cell (F35 for me):
F35=(E18-(B18*C18)/15)/(F18-(B18*B18)/15)
To compute b (in F36 for me):
F36=C18/15-F35*(B18/15)
Column D values, computing the trend line according to the y = ax + b:
D3=$F$35*B3+$F$36, D4=$F$35*B4+$F$36 and so on (until D17 for me).
Select the column datas (C2:D17) to make the graph.
HTH.
If anyone needs the JS code for calculating the trendline of many points on a graph, here's what worked for us in the end:
/**#typedef {{
* x: Number;
* y:Number;
* }} Point
* #param {Point[]} data
* #returns {Function} */
function _getTrendlineEq(data) {
const xySum = data.reduce((acc, item) => {
const xy = item.x * item.y
acc += xy
return acc
}, 0)
const xSum = data.reduce((acc, item) => {
acc += item.x
return acc
}, 0)
const ySum = data.reduce((acc, item) => {
acc += item.y
return acc
}, 0)
const aTop = (data.length * xySum) - (xSum * ySum)
const xSquaredSum = data.reduce((acc, item) => {
const xSquared = item.x * item.x
acc += xSquared
return acc
}, 0)
const aBottom = (data.length * xSquaredSum) - (xSum * xSum)
const a = aTop / aBottom
const bTop = ySum - (a * xSum)
const b = bTop / data.length
return function trendline(x) {
return a * x + b
}
}
It takes an array of (x,y) points and returns the function of a y given a certain x
Have fun :)
Here's a working example in golang. I searched around and found this page and converted this over to what I needed. Hope someone else can find it useful.
// https://classroom.synonym.com/calculate-trendline-2709.html
package main
import (
"fmt"
"math"
)
func main() {
graph := [][]float64{
{1, 3},
{2, 5},
{3, 6.5},
}
n := len(graph)
// get the slope
var a float64
var b float64
var bx float64
var by float64
var c float64
var d float64
var slope float64
for _, point := range graph {
a += point[0] * point[1]
bx += point[0]
by += point[1]
c += math.Pow(point[0], 2)
d += point[0]
}
a *= float64(n) // 97.5
b = bx * by // 87
c *= float64(n) // 42
d = math.Pow(d, 2) // 36
slope = (a - b) / (c - d) // 1.75
// calculating the y-intercept (b) of the Trendline
var e float64
var f float64
e = by // 14.5
f = slope * bx // 10.5
intercept := (e - f) / float64(n) // (14.5 - 10.5) / 3 = 1.3
// output
fmt.Println(slope)
fmt.Println(intercept)
}