Approximation of n points to the curve with the best fit - c#

I have a list of n 2D points: P1(x0,y0), P2(x1,y1), P3(x2,y2) …
The points satisfy these conditions: each point has unique coordinates, and the coordinates xi, yi of each point are integers greater than 0.
The task is to write an algorithm that fits these points to the curve y = |A cos(Bx)| with the best possible fit (close or equal to 100%), with coefficients A and B that are as simple as possible.
I would like to write the program in C#, but my biggest problem is finding a suitable algorithm. Would anyone be able to help me with this?

Taking B as an independent parameter, you can solve for A by least squares (for a fixed B the model is linear in A) and compute the fitting residual.
The residual function is complex, with numerous minima of different values and irregular behavior. Still, if the Xi are integers, the function is periodic in B: each term |cos(B·Xi)| has period π/Xi, so the residual repeats with the least common multiple of these periods, π/gcd(Xi).
The plots below show the fitting residue for B varying from 0 to 2 and from 0 to 10, with the given sample points.
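A minimal sketch of this approach, written in Python rather than C# only to keep it short and self-contained. For each candidate B, A comes from the closed-form least-squares solution (taking A as non-negative, |A cos(Bx)| equals A·|cos(Bx)|, which is linear in A), and B is scanned over a fine grid. Because |cos(Bx)| is unchanged by B -> -B and by B -> π - B when x is an integer, the scan only needs to cover [0, π/2]. The grid size `steps` and the synthetic data in the usage note are assumptions for illustration; with real data you may want to refine around the best grid point.

```python
import math


def best_a_for_b(xs, ys, b):
    """Least-squares A for a fixed B, plus the sum of squared residuals.

    Taking A >= 0, |A*cos(B*x)| equals A*|cos(B*x)|, which is linear in A,
    so the optimum is sum(y*c) / sum(c*c) with c = |cos(B*x)|.
    """
    c = [abs(math.cos(b * x)) for x in xs]
    den = sum(ci * ci for ci in c)
    if den == 0.0:  # every cosine vanished, so A cannot change the fit
        return 0.0, sum(y * y for y in ys)
    a = sum(y * ci for y, ci in zip(ys, c)) / den
    res = sum((y - a * ci) ** 2 for y, ci in zip(ys, c))
    return a, res


def fit_abs_cos(xs, ys, steps=10000):
    """Scan B over [0, pi/2] and keep the B whose best A has the smallest residual.

    For integer x, |cos(B*x)| is unchanged by B -> -B and by B -> pi - B,
    so [0, pi/2] already covers every distinct fit.
    """
    best = None
    for k in range(steps + 1):
        b = 0.5 * math.pi * k / steps
        a, res = best_a_for_b(xs, ys, b)
        if best is None or res < best[2]:
            best = (a, b, res)
    return best  # (A, B, residual)
```

For example, with xs = 1..10 and ys = 3·|cos(0.5x)|, `fit_abs_cos(xs, ys)` returns A close to 3 and B close to 0.5. Keep in mind that B and π - B fit integer data equally well, so the scan reports the smaller of the two.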

Based on "How approximation search works", I would try this in C++:
// (global) input data
#define _n 100
double px[_n]; // x input points
double py[_n]; // y input points
// approximation
int ix;
double e;
approx aa,ab;
// init(min, max, step, recursions, &ErrorOfSolutionVariable)
for (aa.init(-100.0, +100.0, 10.00, 3, &e); !aa.done; aa.step())
for (ab.init(  -0.1,   +0.1,  0.01, 3, &e); !ab.done; ab.step())
{
for (e=0.0,ix=0;ix<_n;ix++) // test all measured points (e is cumulative error)
{
e+=fabs(fabs(aa.a*cos(ab.a*px[ix]))-py[ix]);
}
}
// here aa.a,ab.a holds the result A,B coefficients
It uses my approx class from the question linked above.
You need to set the min, max and step ranges to match your dataset.
You can increase accuracy by increasing the number of recursions.
You can improve performance, if needed, by:
using only some of the points for the less accurate recursion layers;
increasing the starting step (but if it is too big, it can invalidate the result).
You should also plot your input points and the output curve to see whether you are close to a solution. Without more information about the input points, it is hard to be more specific. You can change the difference computation e to match whatever approach you need; this one is just the sum of absolute differences (you can use least squares or whatever else ...).
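For readers without the linked approx class, here is a hypothetical, simplified Python stand-in for the same idea (it is not the author's class, and it refines A and B jointly rather than with nested loops): scan a coarse grid over A and B, keep the best point, shrink both ranges to one step around it and repeat. The error function is the same sum of absolute differences as e above. `steps`, `levels` and the ranges are assumptions you would tune to your data, and like any grid refinement it can lock onto a local minimum if the first grid is too coarse.

```python
import math


def refine_search(err, a_range, b_range, steps=20, levels=4):
    """Coarse-to-fine grid search over (A, B) minimising err(A, B).

    Each level scans a (steps+1) x (steps+1) grid, then shrinks both
    ranges to +/- one grid step around the best point found so far.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    best = (None, None, math.inf)
    for _ in range(levels):
        da = (a_hi - a_lo) / steps
        db = (b_hi - b_lo) / steps
        for i in range(steps + 1):
            a = a_lo + i * da
            for j in range(steps + 1):
                b = b_lo + j * db
                e = err(a, b)
                if e < best[2]:
                    best = (a, b, e)
        a_lo, a_hi = best[0] - da, best[0] + da
        b_lo, b_hi = best[1] - db, best[1] + db
    return best  # (A, B, error)


def abs_error(xs, ys):
    """Sum of absolute differences, the same measure as e in the C++ loop."""
    return lambda a, b: sum(abs(abs(a * math.cos(b * x)) - y) for x, y in zip(xs, ys))
```

Usage: `refine_search(abs_error(xs, ys), (0.0, 10.0), (0.0, 1.0))`. Swapping `abs_error` for a sum of squared differences gives the least-squares variant mentioned above.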


Get random double (floating point) value from random byte array between 0 and 1 in C#?

Assume I have an array of bytes which are truly random (e.g. captured from an entropy source).
byte[] myTrulyRandomBytes = MyEntropyHardwareEngine.GetBytes(8);
Now, I want to get a random double-precision floating-point value, but between 0 and positive 1 (like the value Random.NextDouble() returns).
Simply passing an array of 8 random bytes into BitConverter.ToDouble() can yield strange results, but most importantly, the results will almost never be less than 1.
I am fine with bit-manipulation, but the formatting of floating point numbers has always been mysterious to me. I tried many combinations of bits to apply randomness to and always ended up finding the numbers were either just over 1, always VERY close to 0, or very large.
Can someone explain which bits should be made random in a double in order to make it random within the range 0 and 1?
Though working answers have been given, I'll give another one that looks worse but isn't:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / long.MaxValue;
The issue with casting from a ulong to a double is that it's not directly supported by the hardware, so it compiles to this:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx ; interpret ulong as long and convert it to double
test rcx,rcx ; add fixup if it was "negative"
jge 000000000000001D
vaddsd xmm0,xmm0,mmword ptr [00000060h]
vdivsd xmm0,xmm0,mmword ptr [00000068h]
Whereas with my suggestion it will compile more nicely:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx
vdivsd xmm0,xmm0,mmword ptr [00000060h]
Both were tested with the x64 JIT in .NET 4, but this applies in general: there just isn't a nice way to convert a ulong to a double.
Don't worry about the bit of entropy being lost: there are only about 2^62 doubles between 0.0 and 1.0 in the first place, and most of the smaller doubles cannot be chosen, so the number of possible results is even smaller.
Note that this, as well as the ulong examples presented here, can result in exactly 1.0 and distributes the values with slightly differing gaps between adjacent results, because they don't divide by a power of two. You can change them to exclude 1.0 and get slightly more uniform spacing (see the first plot below: there are a bunch of different gaps, but this way it is very regular) like this:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / ((double)long.MaxValue + 1);
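One caveat: the 512 largest 63-bit values still round up to 2^63 when converted to double, so in rare cases this still returns exactly 1.0; to rule that out completely, shift the masked value right by 10 bits first (leaving 53 bits, which a double represents exactly) and divide by 2^53 instead. As a really nice bonus, you can now change the division to a multiplication (powers of two have exact reciprocals as long as they stay within range; the constant below is 2^-63):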
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) * 1.08420217248550443400745280086994171142578125E-19;
Same idea for ulong, if you really want to use that.
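The same integer-scaling idea, sketched in Python to check the 1.0 edge case (the function names are hypothetical, and the bytes are read little-endian, which does not matter for random input). `to_unit_interval_63` mirrors the masked-long version above; feeding it all-ones bytes shows the rounding problem, because the 63-bit integer rounds up to 2^63 when converted to a float, so the result is exactly 1.0. `to_unit_interval_53` keeps only the top 53 of the 64 bits, which a double represents exactly, so every result is an exact multiple of 2^-53 below 1.0.

```python
def to_unit_interval_63(raw: bytes) -> float:
    """Mask to 63 bits, convert to float, divide by 2**63 (can return 1.0)."""
    n = int.from_bytes(raw[:8], "little") & (2**63 - 1)
    return float(n) / 2.0**63          # float(n) may round up to 2**63


def to_unit_interval_53(raw: bytes) -> float:
    """Keep 53 bits so the conversion is exact; the result is always in [0, 1)."""
    n = int.from_bytes(raw[:8], "little") >> 11   # 64 - 11 = 53 bits
    return n * 2.0**-53
```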
Since you also seemed interested specifically in how to do it with double-bits trickery, I can show that too.
Because of the whole significand/exponent deal, it can't really be done in a super direct way (just reinterpreting the bits and that's it), mainly because choosing the exponent uniformly spells trouble (with a uniform exponent, the numbers are necessarily clumped preferentially near 0 since most exponents are there).
But if the exponent is fixed, it's easy to make a double that's uniform in that region. That cannot be 0 to 1 because that spans a lot of exponents, but it can be 1 to 2 and then we can subtract 1.
So first mask away the bits that won't be part of the significand:
x &= (1L << 52) - 1;
Put in the exponent of 1.0 (giving the range 1.0 to 2.0, excluding 2):
x |= 0x3ff0000000000000;
Reinterpret and adjust for the offset of 1:
return BitConverter.Int64BitsToDouble(x) - 1;
This should be pretty fast, too. An unfortunate side effect is that this time it really does cost a bit of entropy: there are only 52 random bits, but there could have been 53. Compared with a full 53-bit result, the least significant bit is always zero (the implicit leading bit of the significand steals a bit).
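A Python sketch of the same bit manipulation, using `struct` to reinterpret the 64-bit integer as a double (the counterpart of BitConverter.Int64BitsToDouble; the function name is hypothetical and the bytes are assumed little-endian):

```python
import struct


def unit_double_from_bits(raw: bytes) -> float:
    """Random significand, fixed exponent of 1.0, then subtract 1.

    The result is uniform on the grid k * 2**-52 for k in [0, 2**52).
    """
    x = int.from_bytes(raw[:8], "little")
    x &= (1 << 52) - 1                  # keep only the 52 significand bits
    x |= 0x3FF0000000000000             # sign 0, exponent of 1.0 -> [1, 2)
    (d,) = struct.unpack("<d", struct.pack("<Q", x))  # reinterpret the bits
    return d - 1.0
```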
There were some concerns about the distributions, which I will address now.
The approach of choosing a random (u)long and dividing it by the maximum value clearly has a uniformly chosen (u)long, and what happens after that is actually interesting. The result can justifiably be called a uniform distribution, but if you look at it as a discrete distribution (which it actually is) it looks (qualitatively) like this: (all examples for minifloats)
Ignore the "thicker" lines and wider gaps; that's just the histogram being funny. These plots used division by a power of two, so there is no spacing problem in reality; it's only plotted strangely.
Top is what happens when you use too many bits, as happens when dividing a complete (u)long by its max value. This gives the lower floats a better resolution, but lots of different (u)longs get mapped onto the same float in the higher regions. That's not necessarily a bad thing: if you "zoom out", the density is the same everywhere.
The bottom is what happens when the resolution is limited to the worst case (the 0.5 to 1.0 region) everywhere, which you can do by limiting the number of bits first and then doing the "scale the integer" deal. My second suggestion with the bit hacks does not achieve this; it's limited to half that resolution.
For what it's worth, NextDouble in System.Random scales a non-negative int into the 0.0 .. 1.0 range. The resolution of that is obviously a lot lower than it could be. It also uses an int that cannot be int.MaxValue and therefore scales by approximately 1/(2^31-1) (which a double cannot represent exactly, so it is slightly rounded), so there are actually 33 slightly different gaps between adjacent possible results, though most of the gaps are the same size.
Since int.MaxValue is small compared to what can be brute-forced these days, you can easily generate all possible results of NextDouble and examine them, for example I ran this:
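using System;
using System.Collections.Generic;

const double scale = 1.0 / int.MaxValue; // the factor the classic NextDouble uses: 1/(2^31-1), rounded
double prev = 0;
var hist = new Dictionary<long, int>();  // bit pattern of each gap -> number of occurrences
for (int i = 0; i < int.MaxValue; i++)
{
    long bits = BitConverter.DoubleToInt64Bits(i * scale - prev);
    hist[bits] = hist.TryGetValue(bits, out int count) ? count + 1 : 1;
    prev = i * scale;
    if ((i & 0xFFFFFF) == 0)
        Console.WriteLine("{0:0.00}%", 100.0 * i / int.MaxValue);
}
foreach (var pair in hist)
    Console.WriteLine("{0:R}\t{1}", BitConverter.Int64BitsToDouble(pair.Key), pair.Value);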
This is easier than you think; it's all about scaling (which is also true when going from a 0-1 range to some other range).
Basically, if you know that you have 64 truly random bits (8 bytes) then just do this:
double zeroToOneDouble = (double)(BitConverter.ToUInt64(bytes) / (decimal)ulong.MaxValue);
The trouble with this kind of algorithm comes when your "random" bits aren't actually uniformly random. That's when you need a specialized algorithm, such as a Mersenne Twister.
I don't know whether it's the best solution for this, but it should do the job:
ulong asLong = BitConverter.ToUInt64(myTrulyRandomBytes, 0);
double number = (double)asLong / ulong.MaxValue;
All I'm doing is converting the byte array to a ulong, which is then divided by its maximum value, so that the result is between 0 and 1.
To make sure the value is within the range from 0 to 1 once the bits are reinterpreted as a double (with BitConverter.Int64BitsToDouble), you can apply the following mask:
long longValue = BitConverter.ToInt64(myTrulyRandomBytes, 0);
longValue &= 0x3fefffffffffffff;
The resulting value is guaranteed to lie in the range [0, 1).
Remark: 0x3fefffffffffffff is the bit pattern of the largest double below 1 (1 - 2^-53). It is very, very close to 1 and may be printed as 1, but it really is a bit less than 1.
If you want to make the generated values larger, you can set some of the higher exponent bits to 1. For instance, this forces the exponent field to at least 0x3c0, so the value is at least 2^-63:
longValue |= 0x3c00000000000000;
Summarizing: example on dotnetfiddle.
If you care about the quality of the random numbers generated, be very suspicious of the answers that have appeared so far.
Those answers that use Int64BitsToDouble directly will definitely have problems with NaNs and infinities. For example, 0x7ff0000000000001, a perfectly good random bit pattern, converts to NaN (and so do about 2^53 other patterns).
Those that try to convert to a ulong and then scale, or convert to a double after ensuring that various bit-pattern constraints are met, won't have NaN problems, but they are very likely to have distributional problems. Representable floating point numbers are not distributed uniformly over (0, 1), so any scheme that randomly picks among all representable values will not produce values with the required uniformity.
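A quick Python experiment illustrating the distribution problem, using the mask-based answer above as the example and a seeded `random.Random` as a stand-in for the entropy source (an assumption made only for reproducibility; the function names are hypothetical). The mask 0x3fefffffffffffff clears the sign bit and the top and bottom bits of the exponent field, so a value lands in [0.5, 1) only when the exponent field equals 0x3fe, i.e. when nine specific random bits are all 1. That happens in about 1 draw in 512, instead of half the draws as a uniform distribution would give.

```python
import random
import struct


def masked_bits_to_double(x: int) -> float:
    """Apply the 0x3fefffffffffffff mask, then reinterpret the 64 bits as a double."""
    x &= 0x3FEFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", x))[0]


def fraction_at_least_half(samples: int = 100_000, seed: int = 1) -> float:
    """Share of masked random bit patterns whose double value is >= 0.5."""
    rng = random.Random(seed)
    hits = sum(masked_bits_to_double(rng.getrandbits(64)) >= 0.5 for _ in range(samples))
    return hits / samples
```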
To be safe, just use ToInt32 and use that int as a seed for Random. (To be extra safe, reject 0.) This won't be as fast as the other schemes, but it will be much safer. A lot of research and effort has gone into making RNGs good in ways that are not immediately obvious.
A simple piece of code that prints the bits out for you:
for (double i = 0; i < 1.0; i+=0.05)
{
var doubleToInt64Bits = BitConverter.DoubleToInt64Bits(i);
Console.WriteLine("{0}:\t{1}", i, Convert.ToString(doubleToInt64Bits, 2));
}
0.05: 11111110101001100110011001100110011001100110011001100110011010
0.1: 11111110111001100110011001100110011001100110011001100110011010
0.15: 11111111000011001100110011001100110011001100110011001100110100
0.2: 11111111001001100110011001100110011001100110011001100110011010
0.25: 11111111010000000000000000000000000000000000000000000000000000
0.3: 11111111010011001100110011001100110011001100110011001100110011
0.35: 11111111010110011001100110011001100110011001100110011001100110
0.4: 11111111011001100110011001100110011001100110011001100110011001
0.45: 11111111011100110011001100110011001100110011001100110011001100
0.5: 11111111011111111111111111111111111111111111111111111111111111
0.55: 11111111100001100110011001100110011001100110011001100110011001
0.6: 11111111100011001100110011001100110011001100110011001100110011
0.65: 11111111100100110011001100110011001100110011001100110011001101
0.7: 11111111100110011001100110011001100110011001100110011001100111
0.75: 11111111101000000000000000000000000000000000000000000000000001
0.8: 11111111101001100110011001100110011001100110011001100110011011
0.85: 11111111101011001100110011001100110011001100110011001100110101
0.9: 11111111101100110011001100110011001100110011001100110011001111
0.95: 11111111101110011001100110011001100110011001100110011001101001

Transform 2 ints into 1 int of length 5

This might not belong here, so if I should ask this somewhere else, please tell me.
Let's say we have 10032 (this will be X) and 154 (this will be Y) as the input; what I need is a single int as the output. That output would also need to be 4 or 5 digits long.
Even with the output and either X or Y known, I need to stop anyone from discovering the formula. In this scenario Y stays the same, but X changes often.
I am reading about hashes, but I am unsure which one would be best for me, or whether a simple math formula would do the job. The program currently uses the following formula:
X + Y * 2 / 3, rounded down.
This solution would also need to have very few collisions.
Thanks
For this question, you may have better luck on Cryptography Stack Exchange, but here are a few thoughts.
It sounds like you want to map a 5-digit int and 3-digit int to a 4- or 5-digit int with the qualifications that:
a. The producing algorithm is difficult to determine given the input
b. There are few collisions
Given some function F(x,y) there are 100,000,000 combinations of x and y if x and y are between 1 and 5 digits and 1 and 3 respectively.
If F(x,y) produces a 5-digit number, there are 100,000 possible outputs.
On average this would mean that each value of F(x,y) has 1,000 combinations of x, y that map to it.
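So, even in the best case, each output value is shared by 1,000 input pairs. For two random pairs (x1, y1) and (x2, y2), the odds that F(x1,y1) = F(x2,y2) are about 1/100,000, which sounds small, but by the birthday bound a collision becomes likely among just a few hundred inputs. For most uses I can think of, that would be considered too high.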
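To put numbers on this, the birthday-problem formula below gives the probability that at least two of n distinct inputs share an output, assuming an idealized, perfectly balanced F whose outputs behave like uniform random draws from 100,000 values (the function name is hypothetical). It reaches roughly 50% at around 370 inputs.

```python
def collision_probability(n_inputs: int, n_outputs: int = 100_000) -> float:
    """P(at least one shared output) among n_inputs uniform draws from n_outputs values."""
    p_all_distinct = 1.0
    for k in range(n_inputs):
        p_all_distinct *= (n_outputs - k) / n_outputs
    return 1.0 - p_all_distinct
```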
Considering those things, probably the simplest idea would be a basic modular ring over the ints, like Oscar mentioned. For your modulus you should pick the greatest prime number with the number of digits you want to keep. For instance, if you want a 5-digit result, use 99,991. Or if you wanted to avoid collisions and go with 9 digits, you would use 999,999,937. You can use a prime list to look up which prime to use for your modulus.
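A Python sketch of that suggestion. The trial-division prime search is fine for these small sizes. The mapping itself is a hypothetical example of a modular operation, since the answer does not spell one out: pack x and y into one integer, multiply by a secret key and reduce modulo the prime; the key 7919 is arbitrary. Because the key is coprime to p, multiplying by it only permutes the residues, so all collisions come from the reduction modulo p. Anyone who learns the key and p can undo it, so treat it as obfuscation, not cryptography.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for numbers up to about 10**12."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def greatest_prime_with_digits(digits: int) -> int:
    """Largest prime below 10**digits."""
    n = 10**digits - 1
    while not is_prime(n):
        n -= 1
    return n


def combine(x: int, y: int, p: int, key: int = 7919) -> int:
    """Hypothetical F(x, y): pack x and a 3-digit y, scale by a secret key, reduce mod p."""
    return (key * (x * 1000 + y)) % p
```

For a 5-digit result: `combine(10032, 154, greatest_prime_with_digits(5))`.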
I assume that a good approach to minimise collisions would be to use modulus 10^6 after whatever operation you perform on both numbers.
The hard part would be the operation between the original ints, but look up theory about hashing and I am sure you can find nice suggestions.
In order to make it truly difficult to reverse, you could perform operations in several stages, each one of them depending on the results of the previous one. Just an idea...
decimal d = (X * Y) - (reverse X * reverse Y);
(By reverse I mean that 10032 reversed would be 23001.)
Then take the first 4 or 5 digits if there are more.
Or you could make a string that looks like this:
10032154, and then use a hash method and take the first 4 or 5 digits?
(You could reverse this too, so the string is 45123001.)
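Both ideas from this answer, sketched in Python with hypothetical function names. Two details are assumptions: the answer does not say what to do when d is negative, so the sketch uses |d|, and SHA-256 stands in for "a hash method".

```python
import hashlib


def reverse_digits(n: int) -> int:
    """10032 -> 23001"""
    return int(str(n)[::-1])


def reverse_formula(x: int, y: int, digits: int = 5) -> int:
    """d = X*Y - reverse(X)*reverse(Y), keeping the first `digits` digits of |d|."""
    d = x * y - reverse_digits(x) * reverse_digits(y)
    return int(str(abs(d))[:digits])


def hashed(x: int, y: int, digits: int = 5) -> int:
    """SHA-256 of the concatenated string (e.g. "10032154"), first `digits` decimal digits."""
    digest = hashlib.sha256(f"{x}{y}".encode()).hexdigest()
    return int(str(int(digest, 16))[:digits])
```

For the example input, 10032 · 154 - 23001 · 451 = -8,828,523, so `reverse_formula` returns 88285. Note that the leading decimal digits of a large hash value are not evenly distributed; reducing the hash modulo 10^5 instead would spread the outputs more evenly.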
BTW, why do you need to take the first 4 or 5 digits?
Reducing the number of digits will increase the chance of collisions.

Function returning random double with exponential distribution in range (a,b)

I want to generate a random number from a to b. The problem is, the number has to follow an exponential distribution.
Here's my code:
public double getDouble(double low, double high)
{
double r;
(..some stuff..)
r = rand.NextDouble();
if (r == 0) r += 0.00001;
return (1 / -0.9) * Math.Log(1 - r) * (high - low) + low;
}
The problem is that (1 / -0.9) * Math.Log(1 - r) is not between 0 and 1, so the result won't be between a and b. Can someone help? Thanks in advance!
I misunderstood your question in my first answer :) You are already using inverse transform sampling.
To map a range into another range, there is a typical mathematical approach:
f(x) = (b-a)(x - min)/(max-min) + a
where
b = upper bound of target
a = lower bound of target
min = lower bound of source
max = upper bound of source
x = the value to map
(this is linear scaling, so the distribution would be preserved)
(You can verify: If you put in min for x, it results in a, if you put in max for x, you'll get b.)
The problem now: the exponential distribution has a maximum value of inf. So you cannot use this equation, because the result would always be whatever / inf + a, which is just a. (That makes sense mathematically, but of course it does not fit your needs.)
So, the ONLY correct answer is: there is no exponential distribution possible between two fixed numbers, because you can't linearly map [0,inf] -> [a,b].
Therefore you need some sort of trade-off, to make your result as exponential as possible.
I wrapped my head around different possibilities out of curiosity, and I found that you simply can't beat maths on this :P
However, I did some tests with Excel and 1.4 million random records:
I picked a random number as the "limit" (10) and rounded the computed result to 1 decimal place (0, 0.1, 0.2 and so on). I used this number to perform the linear transformation with a maximum of 10, ignoring any result greater than 1.
Out of 1.4 million computations (generated 10-20 times), only 7-10 random numbers greater than 1 were generated:
(Figure: probability density function after mapping the values; column 100 := 1, column 0 := 0.)
So:
Map the values to [0,1] using the linear approach mentioned above, assuming a maximum of 10 for the transformation.
If you encounter a value > 1 after the transformation, just draw another random number until the value is < 1.
With only 7-10 occurrences out of 1.4 million tests, this should be close enough, since the re-drawn number will again be pseudo-exponentially distributed.
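A Python sketch of this redraw strategy, without the rounding to one decimal place used in the Excel test (the function name is hypothetical; rate 1.0 and limit 10 are assumptions, since the question's intended rate is unclear). Redrawing whenever the scaled value reaches 1 is the same as conditioning the exponential sample on X < limit, so the result is exactly a truncated exponential on [0, limit) rescaled to [0, 1), which is the distribution the truncated-exponential answer below samples directly.

```python
import random


def pseudo_exponential_unit(rng: random.Random, rate: float = 1.0, limit: float = 10.0) -> float:
    """Exponential sample scaled by 1/limit; redraw whenever it is not below 1."""
    while True:
        v = rng.expovariate(rate) / limit
        if v < 1.0:
            return v
```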
If you want to build a spaceship, where navigation depends on perfectly exponential distributed numbers between 0 and 1 - don't do it, else you should be good.
(If you want to cheat a bit: if you encounter a number > 1, just find the value whose count falls furthest below its expected number of occurrences, and assume that value :P)
Since the support for the exponential distribution is 0 to infinity, regardless of the rate, I'm going to assume that you're asking for an exponential that's truncated below a and above b. Another way of expressing this would be an exponential random variable X conditioned on a <= X <= b.
You can derive the inversion algorithm for this by calculating the cumulative distribution function (CDF) of the truncated distribution as the integral from a to x of the density for your exponential. Scale the result by the area between a and b (which is F(b) - F(a) where F(x) is the CDF of the original exponential distribution) to make it a valid distribution with an area of 1. Set the derived CDF to U, a uniform(0,1) random number, and solve for X to get the inversion.
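Written out, with λ for the rate, F(x) = 1 - e^{-λx} for the CDF of the untruncated exponential, G for the truncated CDF and U uniform on (0, 1), the steps above give:

```latex
F(x) = 1 - e^{-\lambda x}, \qquad
G(x) = \frac{F(x) - F(a)}{F(b) - F(a)}
     = \frac{e^{-\lambda a} - e^{-\lambda x}}{e^{-\lambda a} - e^{-\lambda b}},
\quad a \le x \le b

G(X) = U
\;\Longrightarrow\;
e^{-\lambda X} = e^{-\lambda a} - U\left(e^{-\lambda a} - e^{-\lambda b}\right)
\;\Longrightarrow\;
X = -\frac{1}{\lambda}\ln\left(e^{-\lambda a} - U\left(e^{-\lambda a} - e^{-\lambda b}\right)\right)
```

This is the expression the Ruby function below computes, with exp_rate_a standing for e^{-λa}.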
I don't program C#, but here's the result expressed in Ruby. It should translate pretty transparently.
def exp_in_range(a, b, rate = 1.0)
exp_rate_a = Math.exp(-rate * a)
return -Math.log(exp_rate_a - rand * (exp_rate_a - Math.exp(-rate * b))) / rate
end
I put a default rate of 1.0 since you didn't specify, but clearly you can override that. rand is Ruby's built-in uniform generator. I think the rest is pretty self-explanatory. I cranked out several test sets of 100k observations for a variety of (a,b) values, loaded the results into my favorite stats package, and the results are as expected.
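A Python translation of the Ruby function, with an optional `u` argument (an addition for testing, not in the original) so the endpoints can be checked deterministically, plus the analytic mean of the truncated distribution, 1/λ + (a·e^{-λa} - b·e^{-λb}) / (e^{-λa} - e^{-λb}), as a quick sanity check in place of a stats package.

```python
import math
import random


def exp_in_range(a, b, rate=1.0, u=None):
    """Inverse-CDF sample from an exponential(rate) truncated to [a, b]."""
    if u is None:
        u = random.random()            # uniform on [0, 1)
    exp_rate_a = math.exp(-rate * a)
    return -math.log(exp_rate_a - u * (exp_rate_a - math.exp(-rate * b))) / rate


def truncated_mean(a, b, rate=1.0):
    """Analytic mean of the exponential(rate) truncated to [a, b]."""
    ea, eb = math.exp(-rate * a), math.exp(-rate * b)
    return 1.0 / rate + (a * ea - b * eb) / (ea - eb)
```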
The exponential distribution is not limited on the positive side, so values can go from 0 to inf. There are many ways to scale [0,infinity] to some finite interval, but the result would not be exponential distributed.
If you just want a slice of the exponential distribution between a and b, you could simply draw r from [ra, rb] such that -log(1-ra) = a and -log(1-rb) = b, i.e.
double r = rand.NextDouble(); // assume this is between 0 and 1
double ra = 1 - Math.Exp(-a); // so that -log(1 - ra) = a
double rb = 1 - Math.Exp(-b); // so that -log(1 - rb) = b
double rbound = ra + (rb - ra) * r;
return -Math.Log(1 - rbound);
Why check for r == 0? I think you would want to check that the argument of the log is > 0, so check for r (or rbound in this case) == 1.
Also, it's not clear why the (1/-0.9) factor is there.

Fast lookup suffering from floating point inaccuracies

Suppose I have equally spaced doubles (64 bit floating point numbers) x0,x1,...,xn. Equally spaced means that for all i, x(i+1) - xi is constant; call it w for width.
Given a number y in the range [x0,xn] I want to find the largest i such that xi <= y.
A naive approach would visit each i in turn (O(n)). Marginally better is to use a binary search (O(log n)).
A constant-time lookup would be to calculate (y-x0)/w and cast it to an integer. However, this will occasionally give the wrong result due to floating-point inaccuracy. For example, suppose there are 100 intervals of width 0.01 starting at 0:
(int)(0.29/0.01) = 28 //want 29 here
Can I retain the constant time lookup but ensure that the results are always identical to the binary search? Performing the calculation with decimals rather than doubles for 'w' and 'x0' seems to work here, but will it always work? I could always follow the direct lookup with a comparison with the xs either side, but this seems ugly and inefficient.
To clarify - I am given the xi and the value y as doubles - I cannot change this. But any intermediate calculation performed before returning the integer index can use any datatypes I like. Additionally, I can perform one-off "preparation" work in order to make the runtime calculation faster.
Edit: Apologies - turns out that I didn't check "equally spaced" properly - these numbers are often not "equally spaced" when their difference is calculated using floating point arithmetic.
Do the following:
Calculate (int)(0.29/0.01) = 28 (we want 29 here).
Next, calculate back i * 0.01 for i between 28-1 and 28+1 and pick the correct one, i.e. the largest i whose value is still <= y.
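A Python sketch of this guess-then-correct lookup (function names are hypothetical). It compares against the stored xi rather than recomputed values, which also covers the asker's edit that the xi are not exactly equally spaced. The correction loops step until xs[i] <= y < xs[i + 1] holds, so the result always agrees with a binary search, and the lookup stays O(1) as long as the division's guess is off by at most a step or two.

```python
import bisect


def fast_index(xs, x0, w, y):
    """Largest i with xs[i] <= y, for y in [xs[0], xs[-1]]."""
    n = len(xs) - 1
    i = min(max(int((y - x0) / w), 0), n)   # O(1) guess, may be off by one
    while i < n and xs[i + 1] <= y:         # guess too low: step up
        i += 1
    while i > 0 and xs[i] > y:              # guess too high: step down
        i -= 1
    return i


def binary_search_index(xs, y):
    """Reference answer via binary search, O(log n)."""
    return bisect.bisect_right(xs, y) - 1
```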
What do you mean by equally spaced? If you can make some assumptions about the numbers, for example that they increase over an interval, you can actually use median selection, which is O(1) in the best case and O(log2(N)) in the worst case.

Nondeterministic (random) rounding based on decimal places

I want to do something like this:
return Utils.RandomDouble() < value - Math.Floor(value) ? (int)Math.Ceiling(value) : (int)Math.Floor(value);
It's hard to google ;) Is there any literature about this kind of rounding mechanism, or a name for it?
Just a little background:
We use it for a game where we have health based on integers (hitpoints) but calculate the damages based on doubles to be more exact.
Banker's rounding could be OK, though it's not random: it just tries to distribute deviations uniformly in case the input is somewhat spread out (say, a stddev of > 1).
However, what you're describing is stochastic rounding (Wikipedia):
Another unbiased tie-breaking method is stochastic rounding:
If the fractional part of y is .5, choose q randomly among y + 0.5 and y − 0.5, with equal probability.
Like round-half-to-even, this rule is essentially free of overall bias; but it is also fair among even and odd q values. On the other hand, it introduces a random component into the result; performing the same computation twice on the same data may yield two different results. Also, it is open to nonconscious bias if humans (rather than computers or devices of chance) are "randomly" deciding in which direction to round.
Somewhat related: Alternating tie-breaking (can still introduce bias, but lacks the random component)
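A minimal Python sketch of the proportional variant the question describes (hypothetical function name): round up with probability equal to the fractional part and down otherwise, so the expected result equals the input. For example, 2.3 becomes 3 about 30% of the time and 2 otherwise. Passing a seeded `random.Random` makes runs reproducible, which may help with replays or tests in a game.

```python
import math
import random


def stochastic_round(value, rng=random):
    """Round up with probability equal to the fractional part, else round down."""
    lower = math.floor(value)
    frac = value - lower
    return lower + 1 if rng.random() < frac else lower
```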
