Getting lambda in a Box-Cox equation - c#

I have a set of values and I need to get the lambda for a Box-Cox transformation. It's a normal curve (Gaussian distribution). Does anyone know how to get the optimal value of lambda in R, C#, MATLAB, Python, or Perl?

In R: package geoR, boxcox.fit

boxcox in the MASS package

Scipy has a lot of good curve fitting modules. There is a cookbook recipe for Linear Regression. If you need something more complex, there is an optimize package.

I'm surprised the following packages are not listed in any of the answers:
You can use boxcox from scipy.stats in Python, or in R you can use BoxCoxTrans in the caret package. You can read more about resolving skewness for predictive modeling in this post.

If your data is Gaussian (as you state in your question), then the optimal value of lambda is 1, i.e. it doesn't need to be transformed.
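To make the idea of an "optimal lambda" concrete, here is a minimal sketch that picks lambda by maximizing the Box-Cox profile log-likelihood with a golden-section search. It uses only the standard library; the function names, the search range [-2, 2], and the assumption that the likelihood has a single peak in that range are mine, not taken from any of the packages above. In practice, scipy.stats.boxcox does the fitting for you and returns the transformed data together with the fitted lambda.

```python
import math


def boxcox_transform(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def boxcox_loglik(x, lam):
    """Profile log-likelihood of lambda (normal model for the transformed data)."""
    n = len(x)
    y = boxcox_transform(x, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # assumes the data are not all equal
    return (lam - 1.0) * sum(math.log(v) for v in x) - 0.5 * n * math.log(var)


def boxcox_lambda(x, lo=-2.0, hi=2.0, tol=1e-6):
    """Golden-section search for the lambda that maximizes the log-likelihood.

    Assumption: the log-likelihood has a single peak inside [lo, hi].
    """
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if boxcox_loglik(x, c) > boxcox_loglik(x, d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0
```

For log-normally distributed data the estimate should come out close to 0 (a log transform); for data that is already Gaussian it should be near 1, although the estimate can be noisy when the spread is small relative to the mean.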

Perl's PDL has a gaussian fit routine. PDL is a lot like Matlab except with the power of programming in Perl.

Related

Find Critical Chi Square Value using MathNet.Numerics

I want to get the critical chi-square value from a significance level and the degrees of freedom. I tried using MathNet.Numerics but couldn't find which method returns the critical chi-square value.
This is the documentation I'm referring to; any help pointing me to the correct documentation would be appreciated.
How I calculate the value in Excel is by using the formula =CHISQ.INV.RT(A2,B2)
The function you require is InvCDF(). It is used as follows:
MathNet.Numerics.Distributions.ChiSquared.InvCDF(degreesOfFreedom, probability);
I could finally solve this problem, so I want to share how I solved it.
I used the MathNet library. To reproduce the Excel function you mention, keep a few things in mind: the library has no direct equivalent of =CHISQ.INV.RT. Instead, in C#, you use InvCDF (the equivalent of =CHISQ.INV in Excel), and instead of passing the right-tail probability, such as 0.05, you pass its complement in the interval (0, 1), so the parameter becomes 0.95.
The logic of this is in the description of the functions in Excel.
"CHISQ.INV" description says "Returns the inverse of the left-tailed probability of the chi-squared distribution", this one is the equivalent of ChiSquared.InvCDF (C#).
"CHISQ.INV.RT" description says "Returns the inverse of the right-tailed probability of the chi-squared distribution", this one DOES NOT exist in the MathNet library.
Example:
In Excel you write
=CHISQ.INV.RT(0.05, 9)
In C# you write
ChiSquared.InvCDF(9, 0.95);
In both cases the answer will be 16.9189776
Note that the order of the parameters is switched.
I hope I could help with this.
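To check the left-tail/right-tail relationship without Excel or MathNet, here is a small standard-library sketch. It computes the chi-squared CDF from the series for the regularized lower incomplete gamma function and inverts it by bisection. The function names are made up for this example, and the series is only meant for moderate arguments; it is not how MathNet implements InvCDF.

```python
import math


def chi2_cdf(x, k):
    """Left-tail probability P(X <= x) for a chi-squared variable with k degrees of freedom."""
    if x <= 0.0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_inv_cdf(p, k):
    """Inverse of the left-tail CDF (the role of CHISQ.INV / ChiSquared.InvCDF)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, max(float(k), 1.0)
    while chi2_cdf(hi, k) < p:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):  # plain bisection
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_inv_rt(alpha, k):
    """Right-tail inverse (the role of CHISQ.INV.RT): use the complement 1 - alpha."""
    return chi2_inv_cdf(1.0 - alpha, k)
```

Calling chi2_inv_rt(0.05, 9) gives the same critical value as the Excel and C# calls above, roughly 16.919.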

Source code for functions Rand and Randn in Matlab

I'm a student from Ariel University in Israel and I'm trying to implement Matlab's RAND and RANDN in C# in such a way that, for the same input (with the same seed), Randn and Rand give the same result in both languages.
for example:
In Matlab:
rand('seed',123)
disp(rand)
output: 0.0878
In C#:
Console.WriteLine(MyRand(123));
output: 0.0878
I think that to implement this kind of functionality, I need the source code for RAND and RANDN in Matlab. Does anyone have this code and can share it?
Thanks a lot,
Shimon
Doing:
>> s = RandStream.getGlobalStream()
s =
mt19937ar random stream (current global stream)
Seed: 0
NormalTransform: Ziggurat
You're given the random-number-generator algorithm and the transformation used to get normally distributed numbers.
Both are publicly available algorithms.
Google gives you e.g.:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
and
http://www.jstatsoft.org/v05/i08/paper
describing both algorithms, including reference / example implementations.
Randn is, as far as I know, based on the Mersenne Twister. To verify this, I would first try the MersenneTwister class from Apache Commons Math and check for similar results: http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/random/MersenneTwister.html
If so, search for any implementation. This algorithm should be documented.
But seriously, if you type
edit rand.m
into the Matlab command window, and
edit randn.m
I think you will get as much information as MathWorks publishes about those functions. This information points to the algorithms used and, for rand, to an implementation too.
As your question only mentions obtaining the same results, I would recommend one of the following:
Generate a lot of random numbers, then use them one by one in both programming languages.
Implement your own (simple) random generator in both languages.
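As an illustration of the second option, here is a sketch of a very simple generator that is easy to port line by line: the Park-Miller "minimal standard" multiplicative congruential generator. This is not a reimplementation of Matlab's rand and will not reproduce the 0.0878 from the question; the class name is hypothetical, and the point is only that a generator defined purely by integer arithmetic gives identical sequences in any language. In C# the product needs a 64-bit long, since 16807 * state can exceed 32 bits.

```python
class ParkMillerRng:
    """Park-Miller minimal standard generator: state = 16807 * state mod (2**31 - 1)."""

    MODULUS = 2 ** 31 - 1
    MULTIPLIER = 16807

    def __init__(self, seed):
        if not 0 < seed < self.MODULUS:
            raise ValueError("seed must satisfy 0 < seed < 2**31 - 1")
        self.state = seed

    def next_int(self):
        """Advance the state and return it as an integer in [1, 2**31 - 2]."""
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state

    def next_float(self):
        """Uniform value strictly between 0 and 1."""
        return self.next_int() / self.MODULUS


rng = ParkMillerRng(123)
samples = [rng.next_float() for _ in range(3)]  # same seed, same sequence on every platform
```

A standard sanity check for a port: starting from seed 1, the 10000th integer state should be 1043618065.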

What Algorithm can i use to find any valid result depending on variable integer inputs

In my project I face a scenario where I have a function with numerous inputs. At a certain point I am provided with a result and I need to find one combination of inputs that generates that result.
Here is some pseudocode that illustrates the problem:
Double y = f(x_0,..., x_n)
I am provided with y and I need to find any combination of inputs that produces it.
I tried several things on paper that could generate something, but each of my parameters has a range of 6.5 x 10^9 possible values, so I would like to keep the execution time as low as possible.
Can someone name an algorithm or a topic that will be useful for me, so I can read up on how other people solved similar problems?
I was thinking along the lines of creating a vector from the inputs and judging how well that vector fits the problem. This sounds an awful lot like a neural network, but there is no training phase available.
Edit:
Thank you all for the feedback. The comments sum up the problems I have, and I will try something along the lines of hill climbing.
The general case for your problem might be impossible to solve, but for some cases there are numerical methods that can help you solve your problem.
For example, in 1D space, if you can find an input whose output is smaller than y and one whose output is higher than y, you can use the numerical method regula falsi to find the "root" (the input that yields y in your case, by simply invoking the method on f(x) - y).
Another numerical method for finding roots is Newton-Raphson.
I admit, I am not familiar with how to apply these methods in multi-dimensional space, but it could be a starter. I'd search the literature for these if I were you.
Note: using such a method almost always requires some knowledge of the function.
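A minimal sketch of regula falsi for the 1D case, applied to f(x) - y as described. It assumes f is continuous on the bracket and that f(lo) - y and f(hi) - y have opposite signs; the function name and the tolerances are my own choices. Since your inputs are integers, you would still have to check the integers next to the returned value.

```python
def regula_falsi(f, y, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with f(x) close to y, given a sign change of f(x) - y."""
    g_lo, g_hi = f(lo) - y, f(hi) - y
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("f(lo) - y and f(hi) - y must have opposite signs")
    x = lo
    for _ in range(max_iter):
        # Intersection of the secant through both bracket ends with zero.
        x = hi - g_hi * (hi - lo) / (g_hi - g_lo)
        g_x = f(x) - y
        if abs(g_x) < tol:
            break
        # Keep the half of the bracket that still contains the sign change.
        if g_lo * g_x < 0:
            hi, g_hi = x, g_x
        else:
            lo, g_lo = x, g_x
    return x


# Example: which x gives x**3 + x == 10 ?
root = regula_falsi(lambda x: x ** 3 + x, 10.0, 0.0, 5.0)
```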
Another possible solution is to take g(X) = |f(X) - y| and use a heuristic algorithm to find a minimal value of g. The problem with heuristic methods is that they will get you "close enough", but will seldom get you exactly to the target (unless the function is convex).
Some optimization algorithms are: Genetic Algorithms, Hill Climbing, Gradient Descent (where you can find the gradient numerically).
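Since hill climbing came up, here is a rough sketch of it on g(X) = |f(X) - y| for integer inputs. It moves one coordinate at a time and halves the step size when nothing improves, which helps with very large input ranges. The function name, the starting step, and the evaluation budget are assumptions for the example; like any local search it can stop at a local minimum of g and never reach g = 0.

```python
def hill_climb(f, target, start, step=1 << 20, max_evals=10_000):
    """Greedy coordinate search for integer inputs minimizing |f(x) - target|."""
    x = list(start)
    best = abs(f(x) - target)
    evals = 1
    while step >= 1 and best > 0 and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                candidate = x.copy()
                candidate[i] += delta
                score = abs(f(candidate) - target)
                evals += 1
                if score < best:
                    x, best, improved = candidate, score, True
                    break  # keep this move, go on to the next coordinate
        if not improved:
            step //= 2  # no neighbour at this distance helps: look closer
    return x, best


# Hypothetical test function with a known solution at (123456, -789).
def example_f(v):
    return (v[0] - 123456) ** 2 + (v[1] + 789) ** 2 + 42


solution, error = hill_climb(example_f, 42, [0, 0])
```

On this smooth, separable example the search lands exactly on the solution; on a rugged function you would restart from several starting points and keep the best result.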

Computing π to "infinite" binary precision in C#

So far it looks like Fabrice Bellard's base 2 equation is the way to go
Ironically this will require a BigReal type; do we have this for .Net? .Net 4.0 has BigInteger.
Anyone have a Haskell version?
Since you're asking for a Haskell version, here is a paper by Jerzy Karczmarczuk, called "The Most Unreliable Technique in the World to compute π":
This paper is an atypical exercice in
lazy functional coding, written for
fun and instruction. It can be read
and understood by anybody who
understands the programming language
Haskell. We show how to implement the
Bailey-Borwein-Plouffe formula for π
in a co-recursive, incremental way
which produces the digits 3, 1, 4, 1,
5, 9. . . until the memory
exhaustion. This is not a way to
proceed if somebody needs many
digits! Our coding strategy is
perverse and dangerous, and it
provably breaks down. It is based on
the arithmetics over the domain of
infinite sequences of digits
representing proper fractions expanded
in an integer base. We show how to
manipulate: add, multiply by an
integer, etc. such sequences from the
left to the right ad infinitum,
which obviously cannot work in all
cases because of ambiguities. Some
deep philosophical consequences are
discussed in the conclusions.
It doesn't really solve the problem in an efficient or very practical way, but is entertaining and shows some of the problems with lazy infinite precision arithmetic.
Then there's also this paper by Jeremy Gibbons.
By far my favorite Haskell spigot for pi comes from Jeremy Gibbons:
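-- Lazy infinite list of decimal digits of pi. To use it, hide Prelude's pi
-- (import Prelude hiding (pi)) or rename this definition.
pi = g(1,0,1,1,3,3) where
  g(q,r,t,k,n,l) =
    if 4*q+r-t<n*t
    then n : g(10*q,10*(r-n*t),t,k,div(10*(3*q+r))t-10*n,l)
    else g(q*k,(2*q+r)*l,t*l,k+1,div(q*(7*k+2)+r*l)(t*l),l+2)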
The mathematical background that justifies that implementation can be found in:
A Spigot Algorithm for the Digits of Pi
Wikipedia details a lot of ways to get numerical approximations of pi here. They also give some sample pseudo-code.
Edit: If you're interested in this kind of mathematical problem without having any related real-world problem to solve (which is definitely a good attitude to have, IMHO), you could visit the Project Euler page.
It is possible to process big rational numbers in DLR-based dynamic languages (e.g. IronPython). Or you can use any portable C/C++ implementation of big real numbers through P/Invoke.
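To show what the big-rational route could look like, here is a sketch in plain Python that sums Bellard's formula with exact fractions and then reads off binary digits with integer arithmetic. The formula is written down as I understand it, each term contributes roughly 10 bits, and the function names are hypothetical. The same idea should carry over to .NET with a numerator/denominator pair of BigInteger values, but I have not tried that.

```python
from fractions import Fraction


def bellard_pi(terms):
    """Exact partial sum of Bellard's series for pi as a Fraction."""
    total = Fraction(0)
    for n in range(terms):
        sign = -1 if n % 2 else 1
        inner = (
            Fraction(-32, 4 * n + 1)
            - Fraction(1, 4 * n + 3)
            + Fraction(256, 10 * n + 1)
            - Fraction(64, 10 * n + 3)
            - Fraction(4, 10 * n + 5)
            - Fraction(4, 10 * n + 7)
            + Fraction(1, 10 * n + 9)
        )
        total += sign * inner / 2 ** (10 * n)
    return total / 64


def pi_bits(bits):
    """Integer floor(pi * 2**bits), i.e. pi with `bits` binary fractional digits.

    Uses a couple of spare terms; the last bit could still be off in the rare
    case where pi * 2**bits lies extremely close to an integer.
    """
    approx = bellard_pi(bits // 10 + 2)
    return (approx.numerator << bits) // approx.denominator


binary_pi = bin(pi_bits(16))  # 3 followed by 16 fractional bits, as one binary integer
```

This is not "infinite" precision either: you choose the number of bits up front, and the fractions grow with every term.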

Minimization of f(x,y) where x and y are integers

I was wondering if anyone had any suggestions for minimizing a function, f(x,y), where x and y are integers. I have researched lots of minimization and optimization techniques, like BFGS and others out of GSL, and things out of Numerical Recipes. So far, I have tried implementing a couple of different schemes. The first works by picking the direction of largest descent among f(x+1,y), f(x-1,y), f(x,y+1), f(x,y-1), and following that direction with line minimization. I have also tried using a downhill simplex (Nelder-Mead) method. Both methods get stuck far away from a minimum. They both appear to work on simpler functions, like finding the minimum of a paraboloid, but I think that both, and especially the former, are designed for functions where x and y are real-valued (doubles). One more problem is that I need to call f(x,y) as few times as possible. It talks to external hardware, and takes a couple of seconds for each call. Any ideas for this would be greatly appreciated.
Here's an example of the error function. Sorry I didn't post this before. This function takes a couple of seconds to evaluate. Also, the information we query from the device does not add to the error if it is below our desired value, only if it is above.
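double Error(int x, int y)
{
    // Push the candidate parameters to the device, then read back the measured values.
    SetDeviceParams(x, y);
    double a = QueryParamA();
    double b = QueryParamB();
    double c = QueryParamC();
    double _fReturnable = 0;
    // Only values at or above the desired level contribute to the error.
    if (a >= A_desired)
    {
        _fReturnable += (A_desired - a) * (A_desired - a);
    }
    if (b >= B_desired)
    {
        _fReturnable += (B_desired - b) * (B_desired - b);
    }
    if (c >= C_desired)
    {
        _fReturnable += (C_desired - c) * (C_desired - c);
    }
    return Math.Sqrt(_fReturnable);
}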
There are many, many solutions here. In fact, there are entire books and academic disciplines based on the subject. I am reading an excellent one right now: How to Solve It: Modern Heuristics.
There is no one solution that is correct - different solutions have different advantages based on specific knowledge of your function. It has even been proven that there is no one heuristic that performs the best at all optimization tasks.
If you know that your function is quadratic, you can use Gauss-Newton to find the minimum in one step. A genetic algorithm can be a great general-purpose tool, or you can try simulated annealing, which is less complicated.
Have you looked at genetic algorithms? They are very, very good at finding minima and maxima, while avoiding local minima/maxima.
How do you define f(x,y) ? Minimisation is a hard problem, depending on the complexity of your function.
Genetic Algorithms could be a good candidate.
Resources:
Genetic Algorithms in Search, Optimization, and Machine Learning
Implementing a Genetic Algorithms in C#
Simple C# GA
If it's an arbitrary function, there's no neat way of doing this.
Suppose we have a function defined as:
f(x, y) = 0    for x == 100, y == 100
          100  otherwise
How could any algorithm realistically find (100, 100) as the minimum? It could be any possible combination of values.
Do you know anything about the function you're testing?
What you are generally looking for is called an optimisation technique in mathematics. In general, they apply to real-valued functions, but many can be adapted for integral-valued functions.
In particular, I would recommend looking into non-linear programming and gradient descent. Both would seem quite suitable for your application.
If you could perhaps provide any more details, I might be able to suggest something a little more specific.
Jon Skeet's answer is correct. You really do need information about f and its derivatives even if f is everywhere continuous.
The easiest way to appreciate the difficulties of what you ask (minimization of f at integer values only) is just to think about an f: R->R (f is a real-valued function of the reals) of one variable that makes large excursions between individual integers. You can easily construct such a function so that there is NO correlation between the local minima on the real line and the minima at the integers, and no relationship to the first derivative either.
For an arbitrary function I see no way except brute force.
So let's look at your problem in math-speak. This is all assuming I understand
your problem fully. Feel free to correct me if I am mistaken.
we want to minimize the following:
\sqrt((a-a_desired)^2 + (b-b_desired)^2 + (c-c_desired)^2)
or in other notation
||Pos(x - x_desired)||_2
where x = (a,b,c) and Pos(y) = max(y, 0) means we want the "positive part"(this accounts
for your if statements). Finally, we wish to restrict ourselves
to solutions where x is integer valued.
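As a quick check that the Pos(...) notation really matches the if statements in the posted error function, here is a tiny sketch of the same quantity. The function name and the sample numbers are invented for the example, and the device calls are left out.

```python
import math


def positive_part_error(measured, desired):
    """Euclidean norm of Pos(measured - desired), where Pos(v) = max(v, 0)."""
    return math.sqrt(sum(max(m - d, 0.0) ** 2 for m, d in zip(measured, desired)))


# a exceeds its target by 3, b is below its target (ignored), c exceeds by 4.
error = positive_part_error((5.0, 1.0, 7.0), (2.0, 3.0, 3.0))
```

Values below the desired level contribute nothing, exactly like the skipped if branches, so the example evaluates to sqrt(3^2 + 4^2) = 5.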
Unlike the above posters, I don't think genetic algorithms are what you want at all.
In fact, I think the solution is much easier (assuming I am understanding your problem).
1) Run any optimization routine on the function above. This will give you
the solution x^* = (a^*, b^*,c^*). As this function is increasing with respect
to the variables, the best integer solution you can hope for is
(ceil(a^*),ceil(b^*),ceil(c^*)).
Now you say that your function is possibly hard to evaluate. There exist tools
for this which are not based on heuristics. They go under the name Derivative-Free
Optimization. People use these tools to optimize objectives based on simulations (I have
even heard of a case where the objective function is based on crop growing yields!)
Each of these methods has different properties, but in general they attempt to
minimize not only the objective, but the number of objective function evaluations.
