I need to do multiple linear regression efficiently. I am trying to use the Math.NET Numerics package but it seems slow - perhaps it is the way I have coded it? For this example I have only simple (1 x value) regression.
I have this snippet:
public class barData
{
public double[] Xs;
public double Mid;
public double Value;
}
public List<barData> B;
var xdata = B.Select(x=>x.Xs[0]).ToArray();
var ydata = B.Select(x => x.Mid).ToArray();
var X = DenseMatrix.CreateFromColumns(new[] { new DenseVector(xdata.Length, 1), new DenseVector(xdata) });
var y = new DenseVector(ydata);
var p = X.QR().Solve(y);
var b = p[0];
var a = p[1];
B[0].Value = (a * (B[0].Xs[0])) + b;
This runs about 20x SLOWER than this pure C#:
double xAvg = 0;
double yAvg = 0;
int n = -1;
for (int x = Length - 1; x >= 0; x--)
{
n++;
xAvg += B[x].Xs[0];
yAvg += B[x].Mid;
}
xAvg = xAvg / B.Count;
yAvg = yAvg / B.Count;
double v1 = 0;
double v2 = 0;
n = -1;
for (int x = Length - 1; x >= 0; x--)
{
n++;
v1 += (B[x].Xs[0] - xAvg) * (B[x].Mid - yAvg);
v2 += (B[x].Xs[0] - xAvg) * (B[x].Xs[0] - xAvg);
}
double a = v1 / v2;
double b = yAvg - a * xAvg;
B[0].Value = (a * B[Length - 1].Xs[0]) + b;
Also, if Math.NET is the issue and anyone knows a simple way to alter my pure C# code to handle multiple Xs, I would be grateful for some help.
Using a QR decomposition is a very generic approach that can deliver least-squares regression solutions for any function with linear parameters, no matter how complicated it is. It is therefore not surprising that it cannot compete on computation time with a very specific, direct implementation, especially not in the simple case of y: x -> a + b*x. Unfortunately, Math.NET Numerics does not yet provide direct regression routines that you could use instead.
However, there are still a couple things you can try for better speed:
Use a thin instead of a full QR decomposition, i.e. pass QRMethod.Thin to the QR method
Use our native MKL provider (much faster QR, but no longer purely managed code)
Tweak threading, e.g. try to disable multi-threading completely (Control.ConfigureSingleThread()) or tweak its parameters
If the data set is very large, there are also more efficient ways to build the matrix, but that is likely not very relevant compared to the cost of the QR itself (run a performance analysis to confirm).
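For the second part of the question (extending the hand-written code to multiple Xs), here is a minimal sketch that does not use Math.NET at all. It builds the normal equations (X^T X) beta = X^T y and solves them with Gaussian elimination. The class and method names (SimpleRegression, FitMultiple) are made up for this illustration. Note that normal equations are generally less numerically stable than QR, so treat this as a starting point for well-conditioned data rather than a drop-in replacement.

```csharp
using System;

public static class SimpleRegression
{
    // Fits y ~ beta[0] + beta[1]*x1 + ... + beta[k]*xk.
    // xs[i] holds the k predictor values of sample i; beta[0] is the intercept.
    public static double[] FitMultiple(double[][] xs, double[] y)
    {
        int n = xs.Length;
        int p = xs[0].Length + 1;        // +1 for the intercept column
        var a = new double[p, p + 1];    // augmented matrix [X^T X | X^T y]

        // Accumulate X^T X and X^T y in a single pass over the samples.
        for (int i = 0; i < n; i++)
        {
            for (int r = 0; r < p; r++)
            {
                double xr = r == 0 ? 1.0 : xs[i][r - 1];
                for (int c = 0; c < p; c++)
                {
                    double xc = c == 0 ? 1.0 : xs[i][c - 1];
                    a[r, c] += xr * xc;
                }
                a[r, p] += xr * y[i];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < p; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Predictors are (nearly) collinear.");
            if (pivot != col)
                for (int c = 0; c <= p; c++)
                    (a[col, c], a[pivot, c]) = (a[pivot, c], a[col, c]);

            for (int r = col + 1; r < p; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int c = col; c <= p; c++) a[r, c] -= f * a[col, c];
            }
        }

        // Back substitution.
        var beta = new double[p];
        for (int r = p - 1; r >= 0; r--)
        {
            double sum = a[r, p];
            for (int c = r + 1; c < p; c++) sum -= a[r, c] * beta[c];
            beta[r] = sum / a[r, r];
        }
        return beta;
    }
}
```

With the question's data this could be called as SimpleRegression.FitMultiple(B.Select(b => b.Xs).ToArray(), B.Select(b => b.Mid).ToArray()), assuming every Xs array has the same length.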
Related
Given the following code:
public float[] weights;
public void Input(Neuron[] neurons)
{
float output = 0;
for (int i = 0; i < neurons.Length; i++)
output += neurons[i].input * weights[i];
}
Is it possible to perform all the calculations in a single execution? For example that would be 'neurons[0].input * weights[0].value + neurons[1].input * weights[1].value...'
Coming from this topic - How to sum up an array of integers in C# - there is a way to do simpler calculations, but the idea of my code is to iterate over the first array, multiply each element by the element at the same index in the second array, and add that to a running total.
Doing perf profiling, the line where the output is summed is very heavy on I/O and consumes 99% of my processing power. The stack should have enough memory for this, I am not worried about stack overflow, I just want to see it work faster for the moment (even if accuracy is sacrificed).
I think you are looking for AVX in C#.
With it you can calculate several values in one instruction.
That's SIMD for CPU cores. Take a look at this
Here is an example from the website:
public static int[] SIMDArrayAddition(int[] lhs, int[] rhs)
{
var simdLength = Vector<int>.Count;
var result = new int[lhs.Length];
var i = 0;
for (i = 0; i <= lhs.Length - simdLength; i += simdLength)
{
var va = new Vector<int>(lhs, i);
var vb = new Vector<int>(rhs, i);
(va + vb).CopyTo(result, i);
}
for (; i < lhs.Length; ++i)
{
result[i] = lhs[i] + rhs[i];
}
return result;
}
You can also combine it with the parallelism you already use.
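The example above adds two arrays, whereas the question needs a multiply-and-sum (a dot product). A minimal sketch of that variant using System.Numerics.Vector<float> follows. It assumes the neuron inputs have first been copied into a plain float[] (the inputs parameter here is hypothetical, as are the SimdMath and Dot names), because Vector<T> loads from contiguous arrays rather than from a field of each Neuron object.

```csharp
using System;
using System.Numerics;

public static class SimdMath
{
    // Computes sum(inputs[i] * weights[i]) using SIMD lanes where possible.
    public static float Dot(float[] inputs, float[] weights)
    {
        if (inputs.Length != weights.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<float>.Count;   // number of floats per SIMD register
        var acc = Vector<float>.Zero;      // per-lane partial sums
        int i = 0;

        // Main loop: multiply and accumulate 'width' elements at a time.
        for (; i <= inputs.Length - width; i += width)
            acc += new Vector<float>(inputs, i) * new Vector<float>(weights, i);

        // Collapse the per-lane partial sums into one scalar.
        float sum = Vector.Dot(acc, Vector<float>.One);

        // Scalar tail for the elements that did not fill a whole vector.
        for (; i < inputs.Length; i++)
            sum += inputs[i] * weights[i];

        return sum;
    }
}
```

Because the additions happen in a different order than in the scalar loop, the floating-point result can differ slightly from the original code; the question states that a small loss of accuracy is acceptable.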
First, I would like to thank everyone involved in this magnificent project, Math.NET saved my life!
I have a few questions about linear and nonlinear regression. I am a civil engineer, and while working on my Master's degree I needed to develop a C# application that calculates the rheological parameters of concrete based on data acquired from a test.
One of the models that describes the rheological behavior of concrete is the "Herschel-Bulkley model", which has this formula:
y = T + K*x^n
x (the shear rate) and y (the shear stress) are the values obtained from the test, while T, K and n are the parameters I need to determine.
I know that the value of T is between 0 and Ymin (Ymin is the smallest y value from the test), so here is what I did:
Since it is a nonlinear equation, I had to make it linear, like this:
ln(y-T) = ln(K) + n*ln(x)
Create an array of possible values of T, from 0 to Ymin, and try each value in the equation,
then find the values of K and n through linear regression,
then calculate the SSD (sum of squared differences) and store the results in an array.
After trying all the possible values of T, I see which one had the smallest SSD and use it to find the optimal K and n.
This method works, but it is very slow and I feel it is not as smart or elegant as it should be. There must be a better way to do it, and I was hoping to find it here.
Here is the code that I used:
public static double HerschelBulkley(double shearRate, double tau0, double k, double n)
{
var t = tau0 + k * Math.Pow(shearRate, n);
return t;
}
public static (double Tau0, double K, double N, double DeltaMin, double RSquared) HerschelBulkleyModel(double[] shear, double[] shearRate, double step = 1000.0)
{
// Calculate the number values from 0.0 to Shear.Min;
var sm = (int) Math.Floor(shear.Min() * step);
// Populate the Array of Tau0 with the values from 0 to sm
var tau0Array = Enumerable.Range(0, sm).Select(t => t / step).ToArray();
var kArray = new double[sm];
var nArray = new double[sm];
var deltaArray = new double[sm];
var rSquaredArray = new double[sm];
var shearRateLn = shearRate.Select(s => Math.Log(s)).ToArray();
for (var i = 0; i < sm; i++)
{
var shearLn = shear.Select(s => Math.Log(s - tau0Array[i])).ToArray();
var param = Fit.Line(shearRateLn, shearLn);
kArray[i] = Math.Exp(param.Item1);
nArray[i] = param.Item2;
var shearHerschel = shearRate.Select(sr => HerschelBulkley(sr, tau0Array[i], kArray[i], nArray[i])).ToArray();
deltaArray[i] = Distance.SSD(shearHerschel, shear);
rSquaredArray[i] = GoodnessOfFit.RSquared(shearHerschel, shear);
}
var deltaMin = deltaArray.Min();
var index = Array.IndexOf(deltaArray, deltaMin);
var tau0 = tau0Array[index];
var k = kArray[index];
var n = nArray[index];
var rSquared = rSquaredArray[index];
return (tau0, k, n, deltaMin, rSquared);
}
With the equation X^Y = Z, how can I write a c# method, to solve for Y?
Does one already exist?
Here are some examples of the data I will have -
2^Y = 8
3^Y = 9
Try this
Y=Math.Log(8) / Math.Log(2)
You're looking for Math.Log.
With that you can do:
x = Math.Log(8) / Math.Log(2)
Also note that there is a Math.Log10, which is the base-10 logarithm; the outcome is the same as long as you use the same base for both logarithms.
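Wrapped up as a method, the logarithm approach might look like the sketch below. The names Exponents and SolveExponent are invented for this example, and the input checks reflect the assumption that X and Z are positive real numbers with X different from 1 (otherwise the real logarithm is undefined or the division is by zero).

```csharp
using System;

public static class Exponents
{
    // Solves x^y = z for y, i.e. y = log(z) / log(x).
    public static double SolveExponent(double x, double z)
    {
        if (x <= 0 || x == 1 || z <= 0)
            throw new ArgumentException("Requires x > 0, x != 1 and z > 0.");
        return Math.Log(z) / Math.Log(x);
    }
}
```

Since the result is a floating-point value, SolveExponent(2, 8) should be compared against 3 with a small tolerance rather than with ==. The two-argument overload Math.Log(z, x) computes the same quantity directly.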
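Not the most optimized option, but you can loop over a large range of integers and check each candidate exponent one by one. It is just a solution that popped into my head. The code would look something like this:
int baseValue = 2; // "base" is a reserved word in C#, so it cannot be used as a name
int result = 8;
// Only integer exponents are tried, so a non-integer Y is never found.
for (int exponent = -9999; exponent <= 10000000; exponent++)
{
    if (Math.Pow(baseValue, exponent) == result)
    {
        Console.WriteLine($"Y = {exponent}");
        break; // stop at the first match
    }
}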
You can find out how many times Z can be divided by X.
Hope this helps.
// Works when Z is an exact integer power of X and X > 1.
int Y = 0;
while (Z >= X)
{
    Z = Z / X;
    Y++;
}
I have a working FFT, but my question is: how do I convert it into an IFFT?
I was told that an IFFT should be just like the FFT I am already using, with only a few changes.
So how do I make an IFFT from an FFT in C#?
I tried to do it myself, but I am not getting back the same values that I put in.
I made an array of values, passed it through the FFT and then the IFFT, and the output does not match the input, so I do not think I changed it the right way.
This is the FFT I have:
public Complex[] FFT(Complex[] x )
{
int N2 = x.Length;
Complex[] X = new Complex[N2];
if (N2 == 1)
{
return x;
}
Complex[] odd = new Complex[N2 / 2];
Complex[] even = new Complex[N2 / 2];
Complex[] Y_Odd = new Complex[N2 / 2];
Complex[] Y_Even = new Complex[N2 / 2];
for (int t = 0; t < N2 / 2; t++)
{
even[t] = x[t * 2];
odd[t] = x[(t * 2) + 1];
}
Y_Even = FFT(even);
Y_Odd = FFT(odd);
Complex temp4;
for (int k = 0; k < (N2 / 2); k++)
{
temp4 = Complex1(k, N2);
X[k] = Y_Even[k] + (Y_Odd[k] * temp4);
X[k + (N2 / 2)] = Y_Even[k] - (Y_Odd[k] * temp4);
}
return X;
}
public Complex Complex1(int K, int N3)
{
Complex W = Complex.Pow((Complex.Exp(-1 * Complex.ImaginaryOne * (2.0 * Math.PI / N3))), K);
return W;
}
Depending on the FFT, you may have to scale the entire complex vector (multiply either the input or result vector, not both) by 1/N (the length of the FFT). But this scale factor differs between FFT libraries (some already include a 1/sqrt(N) factor).
Then take the complex conjugate of the input vector, FFT it, and do another complex conjugate to get the IFFT result. This is equivalent to doing an FFT using -i instead of i for the basis vector exponent.
Also, normally, one does not get the same values out of a computed IFFT(FFT()) as went in, as arithmetic rounding adds at least some low level numerical noise to the result.
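As a rough illustration of the conjugate approach described above, the sketch below computes IFFT(x) = conj(FFT(conj(x))) / N. The FFT method here is a compact stand-in written for this example (unscaled, radix-2, power-of-two lengths only, same exp(-2*pi*i*k/N) twiddle factor as the question's code). The class name Fourier and the method name IFFT are assumptions.

```csharp
using System;
using System.Linq;
using System.Numerics;

public static class Fourier
{
    // Stand-in recursive radix-2 FFT, unscaled; length must be a power of two.
    public static Complex[] FFT(Complex[] x)
    {
        int n = x.Length;
        if (n == 1) return new[] { x[0] };

        var even = FFT(Enumerable.Range(0, n / 2).Select(t => x[2 * t]).ToArray());
        var odd = FFT(Enumerable.Range(0, n / 2).Select(t => x[2 * t + 1]).ToArray());

        var result = new Complex[n];
        for (int k = 0; k < n / 2; k++)
        {
            // Twiddle factor exp(-2*pi*i*k/n).
            Complex w = Complex.FromPolarCoordinates(1.0, -2.0 * Math.PI * k / n);
            result[k] = even[k] + w * odd[k];
            result[k + n / 2] = even[k] - w * odd[k];
        }
        return result;
    }

    // Inverse transform: conjugate the input, run the forward FFT,
    // conjugate the output, and scale once by 1/N.
    public static Complex[] IFFT(Complex[] x)
    {
        double scale = 1.0 / x.Length;
        var conjugated = x.Select(v => Complex.Conjugate(v)).ToArray();
        var transformed = FFT(conjugated);
        return transformed.Select(v => Complex.Conjugate(v) * scale).ToArray();
    }
}
```

Because the forward FFT here applies no scaling, the full 1/N factor goes into the inverse. When checking a round trip, compare with a small tolerance (for example 1e-9) rather than exact equality, for the rounding reasons mentioned above.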
I need a random number generator that generates numbers between n and m, but not with equal probability. I want to set a value x between n and m where the probability is the highest:
Is there an easy way to do that using the Random class? The likelihood should have the form of a binomial distribution or something similar (it does not have to be an exact binomial distribution; rough approximations are also OK).
EDIT
Maybe I have to clarify: I'm not only looking for a binomial or Gaussian distribution, but also for something like this:
I want to define the value x where the highest likelihood should be.
EDIT
Unfortunately, the previously accepted answer does not seem to work the way I expected, so I'm still looking for an answer!
You can use the Box-Muller transform to generate a sequence of pseudorandom, normally distributed numbers from a sequence of numbers uniformly distributed between 0 and 1.
The Java SDK has a good implementation, Random.nextGaussian (taken from http://download.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextGaussian())
I hope it is reasonably clear how to port the Java source to C#:
synchronized public double nextGaussian() {
if (haveNextNextGaussian) {
haveNextNextGaussian = false;
return nextNextGaussian;
} else {
double v1, v2, s;
do {
v1 = 2 * nextDouble() - 1; // between -1.0 and 1.0
v2 = 2 * nextDouble() - 1; // between -1.0 and 1.0
s = v1 * v1 + v2 * v2;
} while (s >= 1 || s == 0);
double multiplier = Math.sqrt(-2 * Math.log(s)/s);
nextNextGaussian = v2 * multiplier;
haveNextNextGaussian = true;
return v1 * multiplier;
}
}
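A possible C# port of the Java method above is sketched here. The class name GaussianRandom and its field names are made up for this example. Unlike the Java original, it is not synchronized, so it assumes single-threaded use.

```csharp
using System;

public sealed class GaussianRandom
{
    private readonly Random _rng;
    private double _spare;      // second value produced by the previous call
    private bool _hasSpare;

    public GaussianRandom(int seed)
    {
        _rng = new Random(seed);
    }

    // Returns a normally distributed value with mean 0 and standard deviation 1.
    public double NextGaussian()
    {
        if (_hasSpare)
        {
            _hasSpare = false;
            return _spare;
        }

        double v1, v2, s;
        do
        {
            v1 = 2 * _rng.NextDouble() - 1;   // between -1.0 and 1.0
            v2 = 2 * _rng.NextDouble() - 1;   // between -1.0 and 1.0
            s = v1 * v1 + v2 * v2;
        } while (s >= 1 || s == 0);           // keep only points inside the unit circle

        double multiplier = Math.Sqrt(-2 * Math.Log(s) / s);
        _spare = v2 * multiplier;             // cache the second value for the next call
        _hasSpare = true;
        return v1 * multiplier;
    }
}
```

To get a different mean and spread, scale and shift the result: mean + stdDev * generator.NextGaussian().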
UPDATE: How I shifted the median:
public static float gaussianInRange(float from, float mean, float to)
{
if( !(from < mean && mean < to) )
throw new IllegalArgumentException(MessageFormat.format("RandomRange.gaussianInRange({0}, {1}, {2})", from, mean, to));
int p = _staticRndGen.nextInt(100);
float retval;
if (p < (mean*Math.abs(from - to)))
{
double interval1 = (_staticRndGen.nextGaussian() * (mean - from));
retval = from + (float) (interval1);
}
else
{
double interval2 = (_staticRndGen.nextGaussian() * (to - mean));
retval = mean + (float) (interval2);
}
while (retval < from || retval > to)
{
if (retval < from)
retval = (from - retval) + from;
if (retval > to)
retval = to - (retval - to);
}
return retval;
}
You need a generator working on a "Normal Distribution". Have a look here:
http://www.csharpcity.com/reusable-code/random-number-generators/
Something relatively simple:
You can generate 2 random numbers:
The 1st defines how close to x the 2nd random number will be.
You can use any breakpoint/function levels you like.
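One way to read the two-number idea above is sketched below: the first random number picks how wide a band around x is allowed, and the second picks a position uniformly inside that band. This is only one interpretation of the answer, and the names PeakedRandom and Next are invented for the example. The resulting distribution is neither binomial nor Gaussian; it simply makes values near x more likely than values near n or m and never leaves the range.

```csharp
using System;

public static class PeakedRandom
{
    // Returns a value in [min, max] that clusters around 'peak'.
    // Assumes min <= peak <= max.
    public static double Next(Random rng, double min, double peak, double max)
    {
        // 1st number: how far from the peak this draw is allowed to stray
        // (0 = exactly the peak, 1 = anywhere in the full range).
        double spread = rng.NextDouble();
        double low = peak - spread * (peak - min);
        double high = peak + spread * (max - peak);

        // 2nd number: uniform position inside the chosen band.
        return low + rng.NextDouble() * (high - low);
    }
}
```

Replacing spread with a different function of the first random number (for example its square, or a few fixed breakpoints) changes how sharply the values concentrate around x.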