I'm writing a small application in C# using MSChart control to do Scatter Plots of sets of X and Y data points. Some of these can be rather large (hundreds of data points).
I wanted to ask whether there is a 'standard' algorithm for plotting a best-fit line through the points. I'm thinking of dividing the X data points into a predefined number of sets, say 10 or 20, and for each set taking the average of the corresponding Y values and the middle X value, then joining these to create the line. Is this a correct approach?
I've searched existing threads but they all seem to be about achieving the same using existing applications like Matlab.
Thanks,
Using a linear least squares algorithm:
using System;
using System.Collections.Generic;
using System.Linq;
public class XYPoint
{
public int X;
public double Y;
}
class Program
{
public static List<XYPoint> GenerateLinearBestFit(List<XYPoint> points, out double a, out double b)
{
int numPoints = points.Count;
double meanX = points.Average(point => point.X);
double meanY = points.Average(point => point.Y);
double sumXSquared = points.Sum(point => point.X * point.X);
double sumXY = points.Sum(point => point.X * point.Y);
a = (sumXY / numPoints - meanX * meanY) / (sumXSquared / numPoints - meanX * meanX);
// Note: b holds the negated intercept, so the fitted line is y = a * x - b.
b = (a * meanX - meanY);
double a1 = a;
double b1 = b;
return points.Select(point => new XYPoint() { X = point.X, Y = a1 * point.X - b1 }).ToList();
}
static void Main(string[] args)
{
List<XYPoint> points = new List<XYPoint>()
{
new XYPoint() {X = 1, Y = 12},
new XYPoint() {X = 2, Y = 16},
new XYPoint() {X = 3, Y = 34},
new XYPoint() {X = 4, Y = 45},
new XYPoint() {X = 5, Y = 47}
};
double a, b;
List<XYPoint> bestFit = GenerateLinearBestFit(points, out a, out b);
Console.WriteLine("y = {0:#.####}x {1:+#.####;-#.####}", a, -b);
for(int index = 0; index < points.Count; index++)
{
Console.WriteLine("X = {0}, Y = {1}, Fit = {2:#.###}", points[index].X, points[index].Y, bestFit[index].Y);
}
}
}
Yes. You will want to use Linear Regression, specifically Simple Linear Regression.
The algorithm is essentially:
assume there exists a line of best fit, y = ax + b
for each of your points, you want to minimise their distance from this line
calculate the distance for each point from the line, and sum the distances (normally we use the square of the distance to more heavily penalise points further from the line)
find the values of a and b that minimise the resulting equation using basic calculus (there should be only one minimum)
The Wikipedia page on simple linear regression will give you everything you need.
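For reference, the minimisation described in the steps above has a closed-form solution. The sketch below assumes the "distance" is the vertical distance from each point to the line, and uses n for the number of points and \bar{x}, \bar{y} for the means; this notation is introduced here and is not part of the answer.

```latex
\begin{align}
S(a, b) &= \sum_{i=1}^{n} \left( y_i - a x_i - b \right)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0, \qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0 \\
a &= \frac{\tfrac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\tfrac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2}, \qquad
b = \bar{y} - a\,\bar{x}
\end{align}
```

The expression for a is the same slope that the C# listing earlier in this thread computes. That listing stores a * meanX - meanY in b, which is the negative of the intercept written here.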
I have a histogram bar chart with the data below.
Count, HistogramBin
0, -1615.25
0, -1056.42
0, -497.48
1, 61.25
1, 620.05
1, 1178.92
0, 1737.76
0, 2296.59
I need to draw a Gaussian curve based on the values above. Could anyone guide me on how to achieve this?
I have written a function based on the Wikipedia article: https://en.wikipedia.org/wiki/Gaussian_function
Our average is: 340.67
Standard deviation (SD): 488.98001098632812
private DataTable GenerateGaussTable1(DataTable histogramDataTable,
HistogramValueItem histogramValueDBItem)
{
double amplitude = (Average + 3 * Sigma) / 2;
double mean = Average;
double sd = Sigma;
DataTable dt = new DataTable();
dt.Columns.Add("x", typeof(float));
dt.Columns.Add("Y", typeof(float));
foreach (DataRow row in histogramDataTable.Rows)// top provided data
{
double x = Convert.ToDouble(row[1]) / 2;
double var1 = 1 / (sd * Math.Sqrt(2 * Math.PI)); // normalisation factor 1 / (sd * sqrt(2 * pi))
double var2 = -0.5 * Math.Pow((x - mean)/sd, 2);
double var4= Math.Exp(var2);
double var5 = var1 * var4;
// Y = Amplitude * exp(-0.5 * ((X - Mean) / SD) ^ 2)
double y = var5;
dt.Rows.Add((float)x, (float)y);
}
return dt;
}
Here is my code:
double gauss(double x, double a, double b, double c)
{
var v1 = ( x - b) / (2d * c * c);
var v2 = -v1 * v1 / 2d;
var v3 = a * Math.Exp(v2);
return v3;
}
and:
private void button_Click(object sender, EventArgs e)
{
Series s1 = chart2.Series[0];
s1.ChartType = SeriesChartType.Line;
s1.Name = "Line";
Series s2 = chart2.Series.Add("Spline");
s2.ChartType = SeriesChartType.Spline;
double avg = 1.8;
double amp = 3;
double sd = 0.53;
List<double> xes = new List<double>
{ 0, 0, 0.05, 0.1, 0.4, 0.9, 1.3, 1.6, 2, 2.4, 2.8, 3.2, 4 };
foreach (var x in xes)
{
s1.Points.AddXY(x, gauss(x, amp, avg, sd));
s2.Points.AddXY(x, gauss(x, amp, avg, sd));
}
}
The math was taken from Wikipedia.
I think your SD is way too large to create a bell curve; try dividing it by 10 to 100. Of course, your SD actually is very large, so you really won't get a meaningful bell curve for those data.
I've tried your function, but it gives the wrong curves.
The gauss function is wrong; why do you divide by "2d * c * c" inside v1?
Here is the function: f(x) = a * exp(-(x - b)² / (2c²))
So first, v1 = (x - b). Then v2 = (x - b)² / (2c²).
And finally v3 = a * exp(-v2).
double gauss(double x, double a, double b, double c)
{
var v1 = (x - b);
var v2 = (v1 * v1) / (2 * (c*c));
var v3 = a * Math.Exp(-v2);
return v3;
}
After this fix, the curves are much better.
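A quick way to sanity-check the corrected gauss() above is to test two properties of the Gaussian function: the value at x = b is the amplitude a, and the value one c away from b is a * exp(-0.5). The sketch below is only an illustration; the class and method names (GaussianCheck, OneSigmaRatio) are made up for this example.

```csharp
using System;

public static class GaussianCheck
{
    // Gaussian function a * exp(-(x - b)^2 / (2 c^2)), same formula as the corrected gauss().
    public static double Gauss(double x, double a, double b, double c)
    {
        double d = x - b;
        return a * Math.Exp(-(d * d) / (2 * c * c));
    }

    // Ratio Gauss(b + c) / Gauss(b); for a correct implementation this is exp(-0.5),
    // independent of the amplitude a (a != 0) and the width c (c > 0).
    public static double OneSigmaRatio(double a, double b, double c)
    {
        return Gauss(b + c, a, b, c) / Gauss(b, a, b, c);
    }
}
```

With the values from the button_Click example (amp = 3, avg = 1.8, sd = 0.53), Gauss(1.8, 3, 1.8, 0.53) should return the amplitude 3, and OneSigmaRatio should be close to 0.6065.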
I have code for drawing Bezier curves. Is it possible to modify this code to draw B-spline curves?
Here is my code using DeCasteljau algorithm:
private Point getPoint(int r, int i, double t)
{
if (r == 0) return points[i];
Point p1 = getPoint(r - 1, i, t);
Point p2 = getPoint(r - 1, i + 1, t);
return new Point((int)((1 - t) * p1.X + t * p2.X), (int)((1 - t) * p1.Y + t * p2.Y));
}
I found this code for B-spline curves. It looks similar to my code, but I have XY points and this code only works with plain numbers. I don't know how to modify my code. I tried something, but it doesn't work.
private double BasisFunction(int k, int i, ParameterCollection u, double t){
if(k==0)
{
if((u[i]<=t) && (t<=u[i+1]))
return 1;
else
return 0;
}
else
{
double memb1, memb2;
if(u[i+k]==u[i])
memb1 = 0;
else
memb1 = ((t-u[i])/(u[i+k]-u[i]))*BasisFunction(k-1, i, u, t);
if(u[i+k+1]==u[i+1])
memb2 = 0;
else
memb2 = ((u[i+k+1]-t)/(u[i+k+1]-u[i+1]))*BasisFunction(k-1, i+1, u, t);
return memb1+memb2;
}
}
Please help.
The function BasisFunction() computes the value of the B-spline basis function N(n,i)(t), where n is the degree and i ranges from 0 to (m-1), with m being the number of control points. So, to use this function, you need to define the following for your B-spline:
the degree.
m control points, denoted P[i][2] with i = 0..(m-1).
a knot sequence. This is the "ParameterCollection" input to BasisFunction. You need (m+degree+1) knots in the knot sequence, and the knot values must be monotonically non-decreasing. An example knot sequence for a degree 3 B-spline with 5 control points is [0,0,0,0,u0,1,1,1,1], where u0 is any value in [0,1].
Then you can evaluate any point on the B-spline curve at parameter t by something like:
double[] point = new double[2]; // point on the B-spline curve, starts at (0, 0)
for (int ii=0; ii < m; ii++) // loop thru all control points
{
double basisVal = BasisFunction(degree, ii, knotSequence, t);
point[0] += P[ii][0]*basisVal;
point[1] += P[ii][1]*basisVal;
}
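The evaluation loop above can be put together into a small self-contained sketch. This is not the asker's code: it assumes the knots are stored in a plain double[] instead of a ParameterCollection and the control points in a jagged double[][], and the names BSplineSketch, Basis and Evaluate are made up for the example. It also uses a half-open span test in the degree-0 case (closing only the last non-empty span), which differs from the closed test in the BasisFunction shown in the question and avoids counting a point twice when t falls exactly on an interior knot.

```csharp
using System;

public static class BSplineSketch
{
    // Cox-de Boor recursion: value of the i-th basis function of degree k at parameter t.
    public static double Basis(int k, int i, double[] u, double t)
    {
        if (k == 0)
        {
            // Half-open span [u[i], u[i+1]) so only one span claims an interior knot.
            bool inSpan = u[i] <= t && t < u[i + 1];
            // Close the last non-empty span so the curve is defined at the final knot.
            bool atEnd = t == u[u.Length - 1] && u[i] < u[i + 1] && u[i + 1] == t;
            return (inSpan || atEnd) ? 1.0 : 0.0;
        }

        double left = 0.0, right = 0.0;
        if (u[i + k] != u[i])
            left = (t - u[i]) / (u[i + k] - u[i]) * Basis(k - 1, i, u, t);
        if (u[i + k + 1] != u[i + 1])
            right = (u[i + k + 1] - t) / (u[i + k + 1] - u[i + 1]) * Basis(k - 1, i + 1, u, t);
        return left + right;
    }

    // Point on the curve at t: sum of control points weighted by their basis values.
    public static double[] Evaluate(double[][] controlPoints, int degree, double[] knots, double t)
    {
        if (knots.Length != controlPoints.Length + degree + 1)
            throw new ArgumentException("Expected (m + degree + 1) knots.");

        var point = new double[2];
        for (int i = 0; i < controlPoints.Length; i++)
        {
            double w = Basis(degree, i, knots, t);
            point[0] += controlPoints[i][0] * w;
            point[1] += controlPoints[i][1] * w;
        }
        return point;
    }
}
```

With a clamped knot sequence such as [0,0,0,0,0.5,1,1,1,1] for degree 3 and 5 control points, Evaluate at t = 0 returns the first control point and at t = 1 the last one. To draw the curve, one would evaluate at a series of t values in [0,1] and connect the resulting points.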
I have an IEnumerable<double> data sample. I want to compute the 90% confidence interval for the signal/data. I have the MathNET library at my disposal, but I am confused as to how to work with it correctly. Given my data, the idea is to return two additional data arrays that contain the original signal's confidence intervals.
using MathNet.Numerics.Statistics;
using MathNet.Numerics.Distributions;
public static List<double[]> ConfidenceIntervals(IEnumerable<double> sample, double interval)
{
Contract.Requires(interval > 0 && interval < 1.0);
int sampleSize = sample.Count();
double alpha = 1.0 - interval;
double mean = sample.Mean();
double sd = sample.StandardDeviation();
double t, mu;
double[] upper = new double[sampleSize];
double[] lower = new double[sampleSize];
StudentT studentT = new StudentT(mean, alpha, sampleSize - 1);
int index = 0;
foreach (double d in sample)
{
t = studentT.CumulativeDistribution(d);
double tmp = t * (sd / Math.Sqrt(sampleSize));
mu = mean - tmp;
upper[index] = d + mu;
lower[index] = d - mu;
}
return new List<double[]>() { upper, lower };
}
This really is not complex in terms of mathematics, I am just confused as to how to correctly use the functions/methods available to me in the MathNET library.
I'm not entirely sure I understand how the confidence interval of the signal is supposed to be applied to each sample of the signal, but we can compute the confidence interval of the sample set as follows:
public static Tuple<double, double> A(double[] samples, double interval)
{
double theta = (interval + 1.0)/2;
double mean = samples.Mean();
double sd = samples.StandardDeviation();
double T = StudentT.InvCDF(0,1,samples.Length-1,theta);
double t = T * (sd / Math.Sqrt(samples.Length));
return Tuple.Create(mean-t, mean+t);
}
Except that the line where we compute T does not compile because unfortunately there is no StudentT.InvCDF in current Math.NET Numerics yet. But we can still evaluate it numerically as a workaround in the meantime:
var student = new StudentT(0,1,samples.Length-1);
double T = FindRoots.OfFunction(x => student.CumulativeDistribution(x)-theta,-800,800);
For example, with 16 samples and alpha 0.05 we get 2.131 as expected. If there are more than ~60-100 samples, this can also be approximated with the normal distribution:
double T = Normal.InvCDF(0,1,theta);
So all in all:
public static Tuple<double, double> B(double[] samples, double interval)
{
double theta = (interval + 1.0)/2;
double T = FindRoots.OfFunction(x => StudentT.CDF(0,1,samples.Length-1,x)-theta,-800,800);
double mean = samples.Mean();
double sd = samples.StandardDeviation();
double t = T * (sd / Math.Sqrt(samples.Length));
return Tuple.Create(mean-t, mean+t);
}
This is not the full answer yet as I understand you wanted to somehow apply the confidence interval to each sample, but hopefully it helps on the way to get there.
PS: Using Math.NET Numerics v3.0.0-alpha7
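Written as a formula, the interval returned by the functions above is the usual two-sided Student t interval for the mean. The symbols below (\gamma for the interval argument, n for the number of samples, \bar{x} for the mean, s for the sample standard deviation) are notation assumed here, not names from the library.

```latex
\begin{align}
\theta &= \frac{1 + \gamma}{2} \\
\bar{x} \;\pm\; t_{\theta,\,n-1}\,\frac{s}{\sqrt{n}}
  &\quad\text{where } t_{\theta,\,n-1} \text{ satisfies } F_{n-1}\!\left(t_{\theta,\,n-1}\right) = \theta \\
\text{e.g. } n = 16,\ \gamma = 0.95:&\quad \theta = 0.975,\quad t_{0.975,\,15} \approx 2.131,\quad
  \bar{x} \pm 2.131\,\frac{s}{4} \approx \bar{x} \pm 0.533\,s
\end{align}
```

Here F_{n-1} is the CDF of the standard Student t distribution with n - 1 degrees of freedom, which is the function the root finder inverts numerically in the workaround.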
I noticed that you didn't increment the index value in the foreach loop. As a result, the value at index 0 is overwritten by each subsequent calculation (when you set the upper[index] and lower[index] values).
So I guess this is why you got incorrect results.
If so, your code should be:
using MathNet.Numerics.Statistics;
using MathNet.Numerics.Distributions;
public static List<double[]> ConfidenceIntervals(IEnumerable<double> sample, double interval)
{
Contract.Requires(interval > 0 && interval < 1.0);
int sampleSize = sample.Count();
double alpha = 1.0 - interval;
double mean = sample.Mean();
double sd = sample.StandardDeviation();
double t, mu;
double[] upper = new double[sampleSize];
double[] lower = new double[sampleSize];
StudentT studentT = new StudentT(mean, alpha, sampleSize - 1);
int index = 0;
foreach (double d in sample)
{
t = studentT.CumulativeDistribution(d);
double tmp = t * (sd / Math.Sqrt(sampleSize));
mu = mean - tmp;
upper[index] = d + mu;
lower[index] = d - mu;
index++;
}
return new List<double[]>() { upper, lower };
}
I'm trying to calculate the slope of two data lists. You can easily calculate this in Excel using the SLOPE function: =SLOPE(A1:A100, B1:B100). I'm trying to mimic this function in a C# WinForms application. Here is my code; it calculates something, but not the number that the Excel function returns. Please help me find the error. Thanks so much!
private double Getslope(List<double> ProductGrossExcessReturnOverRFR, List<double> primaryIndexExcessReturnOverRFR, int months, int go_back = 0)
{
double slope = 0;
double sumx = 0, sumy = 0, sumxy = 0, sumx2 = 0;
for (int i = ProductGrossExcessReturnOverRFR.Count - 1 - go_back; i > ProductGrossExcessReturnOverRFR.Count - (1 + months + go_back); i--)
{
sumxy += ProductGrossExcessReturnOverRFR[i] * primaryIndexExcessReturnOverRFR[i];
sumx += ProductGrossExcessReturnOverRFR[i];
sumy += primaryIndexExcessReturnOverRFR[i];
sumx2 += ProductGrossExcessReturnOverRFR[i] * ProductGrossExcessReturnOverRFR[i];
}
return slope = 1 / (((sumxy - sumx * sumy / months) / (sumx2 - sumx * sumx / months)));
}
Test data:
{1.085231224, 2.335034309, 0.346667278} and
{3.185231224,3.705034309 , -0.883332722} should have slope of 0.3373 if you calculate in Excel using =SLOPE function. But my code produces 0.47 somehow...
I think your formula is wrong.
According to the Excel documentation, the formula for SLOPE is b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², where x̄ and ȳ are the sample means of the x and y values.
Note also that the first argument to the function is the y values.
It's unclear how go_back and months apply, but it looks like this might work:
private double Getslope(List<double> ProductGrossExcessReturnOverRFR,
List<double> primaryIndexExcessReturnOverRFR,
int months,
int go_back = 0)
{
// calc # of items to skip
int skip = ProductGrossExcessReturnOverRFR.Count - go_back - months;
// get list of x's and y's
var ys = ProductGrossExcessReturnOverRFR.Skip(skip).Take(months);
var xs = primaryIndexExcessReturnOverRFR.Skip(skip).Take(months);
// "zip" xs and ys to make the sum of products easier
var xys = Enumerable.Zip(xs,ys, (x, y) => new {x = x, y = y});
double xbar = xs.Average();
double ybar = ys.Average();
double slope = xys.Sum(xy => (xy.x - xbar) * (xy.y - ybar)) / xs.Sum(x => (x - xbar)*(x - xbar));
return slope;
}
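To check the formula against the test data from the question, here is a minimal standalone sketch without the months and go_back windowing. The class and method names (SlopeSketch, Slope) are made up for the example; the argument order follows Excel's SLOPE, with the y values first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SlopeSketch
{
    // Least-squares slope of y on x: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2).
    // Argument order mirrors Excel's SLOPE(known_y's, known_x's).
    public static double Slope(IReadOnlyList<double> ys, IReadOnlyList<double> xs)
    {
        if (ys.Count != xs.Count || xs.Count < 2)
            throw new ArgumentException("Need two lists of equal length with at least 2 items.");

        double xbar = xs.Average();
        double ybar = ys.Average();

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < xs.Count; i++)
        {
            sxy += (xs[i] - xbar) * (ys[i] - ybar);
            sxx += (xs[i] - xbar) * (xs[i] - xbar);
        }
        return sxy / sxx;
    }
}
```

Calling Slope with ys = {1.085231224, 2.335034309, 0.346667278} and xs = {3.185231224, 3.705034309, -0.883332722} gives approximately 0.3374, in line with the Excel result quoted in the question. Swapping the two arguments gives roughly 2.108, whose reciprocal is about 0.474; that appears to be where the 0.47 from the original code comes from, since it treats the first list as x and then inverts the result.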
I want to use a random number generator that creates random numbers in a Gaussian range where I can define the median myself. I already asked a similar question here, and now I'm using this code:
class RandomGaussian
{
private static Random random = new Random();
private static bool haveNextNextGaussian;
private static double nextNextGaussian;
public static double gaussianInRange(double from, double mean, double to)
{
if (!(from < mean && mean < to))
throw new ArgumentOutOfRangeException();
int p = Convert.ToInt32(random.NextDouble() * 100);
double retval;
if (p < (mean * Math.Abs(from - to)))
{
double interval1 = (NextGaussian() * (mean - from));
retval = from + (float)(interval1);
}
else
{
double interval2 = (NextGaussian() * (to - mean));
retval = mean + (float)(interval2);
}
while (retval < from || retval > to)
{
if (retval < from)
retval = (from - retval) + from;
if (retval > to)
retval = to - (retval - to);
}
return retval;
}
private static double NextGaussian()
{
if (haveNextNextGaussian)
{
haveNextNextGaussian = false;
return nextNextGaussian;
}
else
{
double v1, v2, s;
do
{
v1 = 2 * random.NextDouble() - 1;
v2 = 2 * random.NextDouble() - 1;
s = v1 * v1 + v2 * v2;
} while (s >= 1 || s == 0);
double multiplier = Math.Sqrt(-2 * Math.Log(s) / s);
nextNextGaussian = v2 * multiplier;
haveNextNextGaussian = true;
return v1 * multiplier;
}
}
}
Then, to verify the results, I plotted them with gaussianInRange(0, 0.5, 1) for n=100000000.
As one can see, the median really is at 0.5, but there isn't really a curve visible. So what am I doing wrong?
EDIT
What I want is something like this, where I can set the point of highest probability myself by passing a value.
The simplest way to draw normal deviates conditional on them being in a particular range is with rejection sampling:
do {
retval = NextGaussian() * stdev + mean;
} while (retval < from || to < retval);
The same sort of thing is used when you draw coordinates (v1, v2) in a circle in your unconditional normal generator.
Simply folding in values outside the range doesn't produce the same distribution.
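A self-contained sketch of the rejection-sampling loop might look like the following. It is an illustration under a few assumptions: the class name TruncatedNormalSampler is made up, stdev is passed in explicitly (the question's method has no such parameter), and the standard normal draw uses the basic Box-Muller transform instead of the polar variant from the question.

```csharp
using System;

public sealed class TruncatedNormalSampler
{
    private readonly Random random;

    public TruncatedNormalSampler(int seed)
    {
        random = new Random(seed);
    }

    // Standard normal deviate via the basic Box-Muller transform.
    private double NextStandardNormal()
    {
        double u1 = 1.0 - random.NextDouble(); // in (0, 1], avoids Math.Log(0)
        double u2 = random.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    // Draws from a normal(mean, stdev) conditional on the result lying in [from, to].
    // Values outside the range are discarded and redrawn, not folded back in.
    public double Next(double from, double to, double mean, double stdev)
    {
        if (!(from < to) || !(stdev > 0))
            throw new ArgumentException("Require from < to and stdev > 0.");

        double value;
        do
        {
            value = NextStandardNormal() * stdev + mean;
        } while (value < from || value > to);
        return value;
    }
}
```

For example, new TruncatedNormalSampler(12345).Next(0.0, 1.0, 0.5, 0.15) draws a value in [0, 1] concentrated around 0.5. The loop can take many iterations if [from, to] lies far out in the tail of the distribution, so this approach suits ranges that cover a reasonable share of the probability mass.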
Also, if you have a good implementation of the error function and its inverse, you can calculate the values directly using an inverse CDF. The CDF of a normal distribution is
F(retval) = (1 + erf((retval-mean) / (stdev*sqrt(2)))) / 2
The CDF of the distribution restricted to the range (strictly speaking a truncated distribution, not a censored one) is
C(retval) = (F(retval) - F(from)) / (F(to) - F(from)), from ≤ retval < to
To draw a random number using a CDF, you draw v from a uniform distribution on [0, 1] and solve C(retval) = v. This gives
double v = random.NextDouble();
double t1 = erf((from - mean) / (stdev*sqrt(2)));
double t2 = erf((to - mean) / (stdev*sqrt(2)));
double retval = mean + stdev * sqrt(2) * erf_inv(t1*(1-v) + t2*v);
You can precalculate t1 and t2 for specific parameters. The advantage of this approach is that there is no rejection sampling, so you only need a single NextDouble() per draw. If the [from, to] interval is small this will be faster.
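The step from C(retval) = v to the expression for retval can be spelled out as follows. The shorthand r for retval, \mu for mean and \sigma for stdev is introduced here for brevity; t_1 and t_2 are the same quantities as t1 and t2 in the code.

```latex
\begin{align}
C(r) = v \;&\Longrightarrow\; F(r) = (1 - v)\,F(\mathit{from}) + v\,F(\mathit{to}) \\
F(r) = \tfrac{1}{2}\left(1 + \operatorname{erf}\!\left(\tfrac{r - \mu}{\sigma\sqrt{2}}\right)\right)
 \;&\Longrightarrow\; \operatorname{erf}\!\left(\tfrac{r - \mu}{\sigma\sqrt{2}}\right) = (1 - v)\,t_1 + v\,t_2 \\
t_1 = \operatorname{erf}\!\left(\tfrac{\mathit{from} - \mu}{\sigma\sqrt{2}}\right),\quad
t_2 = \operatorname{erf}\!\left(\tfrac{\mathit{to} - \mu}{\sigma\sqrt{2}}\right)
 \;&\Longrightarrow\; r = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}\!\big((1 - v)\,t_1 + v\,t_2\big)
\end{align}
```

The constant terms cancel in the second line because (1 - v) + v = 1, which is why only the erf values t_1 and t_2 need to be precalculated.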
However, it sounds like you might want the binomial distribution instead.
I have similar methods in my Graph generator (had to modify it a bit):
Returns a random floating-point number using a generator function with a specific range:
private double NextFunctional(Func<double, double> func, double from, double to, double height, out double x)
{
double halfWidth = (to - from) / 2;
double distance = halfWidth + from;
x = this.rand.NextDouble() * 2 - 1;// -1 .. 1
double y = func(x);
x = halfWidth * x + distance;
y *= height;
return y;
}
Gaussian function:
private double Gauss(double x)
{
// Graph should look better with double-x scale.
x *= 2;
double σ = 1 / Math.Sqrt(2 * Math.PI);
double variance = Math.Pow(σ, 2);
double exp = -0.5 * Math.Pow(x, 2) / variance;
double y = 1 / Math.Sqrt(2 * Math.PI * variance) * Math.Pow(Math.E, exp);
return y;
}
A method that generates a graph using the random numbers:
private void PlotGraph(Graphics g, Pen p, double from, double to, double height)
{
for (int i = 0; i < 1000; i++)
{
double x;
double y = this.NextFunctional(this.Gauss, from, to, height, out x);
this.DrawPoint(g, p, x, y);
}
}
I would rather use a cosine function: it is much faster and pretty close to the Gaussian function for your needs:
double x;
double y = this.NextFunctional(a => Math.Cos(a * Math.PI), from, to, height, out x);
The out double x parameter in the NextFunctional() method is there so you can easily test it on your graphs (I use an iterator in my method).