How to test something like the great circle distance calculation properly? - c#

What is the best way to test a method that does not return exact values, like this great circle distance calculation:
/// <summary>
/// Get the great circle distance (shortest distance possible) between two points in km.
/// </summary>
/// <param name="endPoint">end point</param>
/// <returns>the great circle distance in km</returns>
public double GreatCircleDistanceInKm(IGeoPoint endPoint)
{
    var earthRadius = Constants.EARTH_RADIUS_KM;
    var diffLat = Utility.DegreesToRadians(endPoint.Latitude - this.Latitude);
    var diffLong = Utility.DegreesToRadians(endPoint.Longitude - this.Longitude);
    var a = Math.Sin(diffLat / 2) * Math.Sin(diffLat / 2) +
            Math.Cos(Utility.DegreesToRadians(this.Latitude)) * Math.Cos(Utility.DegreesToRadians(endPoint.Latitude)) *
            Math.Sin(diffLong / 2) * Math.Sin(diffLong / 2);
    var c = 2 * Math.Asin(Math.Min(1, Math.Sqrt(a)));
    var d = earthRadius * c;
    return d;
}
Currently my test is like this:
[TestMethod]
public void GeoPoint_GreatCircleDistanceInKm_IsCorrect()
{
    // arrange
    var startPoint = new GeoPoint(0, 45, 90); // id, lat, long
    var endPoint1 = new GeoPoint(0, 45, 90);
    var endPoint2 = new GeoPoint(0, 0, 0);

    // act
    var greatCircleDistanceZero = startPoint.GreatCircleDistanceInKm(endPoint1);
    var greatCircleDistanceBig = startPoint.GreatCircleDistanceInKm(endPoint2);

    // assert
    Assert.AreEqual(0, greatCircleDistanceZero);
    Assert.AreEqual(10007.543398010288, greatCircleDistanceBig);
}
But this seems wrong: I am finding the answer first and then testing against it. How should such methods be tested? Should I work through the algorithm/calculation and figure out how it works, so I can make it produce exact values?
Clarification:
My question is: is this the correct way of testing such things? That is, should my test be bound to the actual implementation (because, as you can see, I am using a fine-grained expected value), or should it be more generic somehow?

You can use Assert.AreEqual(double expected, double actual, double delta). You should use a delta that is sufficiently small (e.g. 0.00000001). Using double.Epsilon isn't recommended here.
From MSDN:
Because Epsilon defines the minimum expression of a positive value whose range is near zero, the margin of difference between two similar values must be greater than Epsilon. Typically, it is many times greater than Epsilon. Because of this, we recommend that you do not use Epsilon when comparing Double values for equality.
To your other questions: Yes, such methods should be tested. But it is not a good idea to write tests after you have inspected how the algorithm is implemented, because it may be implemented incorrectly. You need an overall idea of what the method should be doing (in your case, calculating the great circle distance). Then you can specify test cases by giving the input and the expected output. You can get the expected output from another source (e.g. by calculating it by hand).
A little side note: in TDD, test cases are usually specified before the actual code is written, so there is no algorithm you can go through to find out how it works.
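Applied to the test from the question, the delta overload might look like this (a sketch; the expected values are the ones from the question):
// assert with an explicit tolerance instead of exact floating-point equality
Assert.AreEqual(0, greatCircleDistanceZero, 0.00000001);
Assert.AreEqual(10007.543398010288, greatCircleDistanceBig, 0.00000001);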

Maybe
Assert.IsTrue(Math.Abs(greatCircleDistanceZero - 0) < Double.Epsilon);
Assert.IsTrue(Math.Abs(greatCircleDistanceBig - 10007.543398010288) < Double.Epsilon);

The following table shows the distance represented by decimal places of a degree at the equator. Moving away from the equator toward the poles, the NS distance per degree stays constant while the EW distance decreases. Since you are using the great circle distance, I assume any accuracy beyond 2 decimal places in your results is redundant.
0 decimal places: 1.0     = 111.32 km
1 decimal place:  0.1     = 11.132 km
2 decimal places: 0.01    = 1.1132 km
3 decimal places: 0.001   = 111.32 m
4 decimal places: 0.0001  = 11.132 m
5 decimal places: 0.00001 = 1.1132 m
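Taking that at face value (a suggestion derived from the table above, not from the original answer), a delta of 0.01 km would match the claimed useful precision of the result:
Assert.AreEqual(10007.543398010288, greatCircleDistanceBig, 0.01); // 2 decimal places of a km = 10 m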

Related

What parameters should I be using for the LogNormal and Normal Distribution in MATH.NET

I've tried several combinations of Mathdotnet's LogNormal and Normal classes: https://numerics.mathdotnet.com/api/MathNet.Numerics.Distributions/LogNormal.
I seem to get a lot closer to the result I'm looking for using the mean and standard deviation as parameters. However, I notice that when I use larger numbers, like numberOfMinutes, my results do not deviate past the mean the way they do with smaller numbers like numberOfDays. I know I'm not thinking about this right and could use some help.
Also, I'd like to use the geometric mean rather than the mean, but I couldn't work out which parameter to use for the variance, given that I couldn't even pin down how to use it for the mean.
Finally, I hope the answer to this also answers the same issue I'm having with the Normal distribution.
List<double> numberOfDays = new List<double> { 10, 12, 18, 30 };
double mean = numberOfDays.Mean();                           // 17.5
double geometricMean = numberOfDays.GeometricMean();         // 15.954
double variance = numberOfDays.Variance();                   // 81
double standardDeviation = numberOfDays.StandardDeviation(); // 9

// Do I need a Geometric Standard Deviation or Variance?
double numberOfDaysSampleMV = LogNormal.WithMeanVariance(mean, variance).Sample();           // one example sample yielded 40.23
double numberOfDaysSampleMSD = LogNormal.WithMeanVariance(mean, standardDeviation).Sample(); // one example sample yielded 17.33
I believe you are confused about the parameters required. Using conventional notation, you have a set X which you believe is LogNormal:
X = { 10, 12, 18, 30 }
mean: m = 17.5
standard deviation: sd = 9
From this you derive the set Y = ln(X), which is Normal:
Y = { 2.30, 2.48, 2.89, 3.40 }
mean: mu = 2.77
standard deviation: sigma = 0.487
Note that mu and sigma are computed from Y, not X. To create a sample of the LogNormal data, you use mu and sigma, not m and sd.
double[] sample = new double[100];
LogNormal.Samples(sample, mu, sigma);
This is consistent with the Wikipedia article on the LogNormal distribution. The Numerics documentation is not clear.
Here is my test program which might be useful:
List<double> X = new List<double> { 10, 12, 18, 30 }; // assumed to be LogNormal
double m = X.Mean();               // mean of log normal values = 17.5
double sd = X.StandardDeviation(); // standard deviation of log normal values = 9

List<double> Y = new List<double>();
for (int i = 0; i < 4; i++)
{
    Y.Add(Math.Log(X[i]));
}
// Y = { 2.30, 2.48, 2.89, 3.40 }
double mu = Y.Mean();                 // mean of normal values = 2.77
double sigma = Y.StandardDeviation(); // standard deviation of normal values = 0.487

double[] sample = new double[100];
LogNormal.Samples(sample, mu, sigma);          // get sample
double sample_m = sample.Mean();               // 17.93, approximates m
double sample_sd = sample.StandardDeviation(); // 8.98, approximates sd

sample = new double[100];
Normal.Samples(sample, mu, sigma);                // get sample
double sample_mu = sample.Mean();                 // 2.77, approximates mu
double sample_sigma = sample.StandardDeviation(); // 0.517, approximates sigma
Using your test program above, my samples came out like this:
[chart omitted: samples using LogNormal(mu, sigma)]
I'm ultimately concerned about the values greater than 30 and less than 10.
However, by trial and error (accidentally), when I use the following method to get the samples, using the original m and sd variables from your test program, I get the results I'm looking for. I do not want to go forward with something I did by accident.
sample = new double[100];
for (int i = 0; i < 100; i++)
{
    sample[i] = LogNormal.WithMeanVariance(m, sd).Sample();
}
[chart omitted: samples using LogNormal.WithMeanVariance(m, sd)]
My values are consistently between the min and max and concentrated around the mean.
My example shows pretty clearly how to get a LogNormal sample that has the mean and standard deviation of the original data.
The min/max of 10/30 is unrealistic if you are going to create your samples based on the mean and standard deviation of the sample. Suppose you took a random sample of the weights of 4 people out of a population of 1000 people. Would you expect your sample to include both the lightest and heaviest of the population?
LogNormal.WithMeanVariance(m, sd) is wrong because the units don't match: it expects a variance, which would have units of days^2, while sd has units of days.
I suggest you either (a) use LogNormal(mu, sigma) and discard any values outside your min/max range, or (b) use LogNormal(mu, c * sigma) for some value of c less than one, to reduce the variance enough that all the values fall within your min/max range. The choice depends on the nature of your project.
The Wikipedia entry on the LogNormal distribution has formulas for computing mu and sigma from m and sd, which might be better than calculating them from the Y data.
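For reference, those formulas are sigma^2 = ln(1 + sd^2 / m^2) and mu = ln(m) - sigma^2 / 2. A quick sketch of them in C# (not part of the original answer; the results land close to the sample-based mu = 2.77 and sigma = 0.487 above):
// Derive LogNormal parameters (mu, sigma) from the desired mean m and
// standard deviation sd of the log-normal values, per the Wikipedia formulas.
double m = 17.5, sd = 9.0;
double sigma2 = Math.Log(1 + (sd * sd) / (m * m)); // ~0.235
double mu = Math.Log(m) - sigma2 / 2;              // ~2.74
double sigma = Math.Sqrt(sigma2);                  // ~0.48
// LogNormal.WithMeanVariance(m, sd * sd) should construct the same distribution.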

could not use the full range of double in C#? [duplicate]

I have a unit test, testing boundaries:
[TestMethod]
[ExpectedException(typeof(ArgumentOutOfRangeException))]
public void CreateExtent_InvalidTop_ShouldThrowArgumentOutOfRangeException()
{
    var invalidTop = 90.0 + Double.Epsilon;
    new Extent(invalidTop, 0.0, 0.0, 0.0);
}

public static readonly double MAX_LAT = 90.0;

public Extent(double top, double right, double bottom, double left)
{
    if (top > GeoConstants.MAX_LAT)
        throw new ArgumentOutOfRangeException("top"); // not hit
}
I thought I'd just tip the 90.0 over the edge by adding the smallest possible positive double to it, but now the exception is not thrown. Any idea why?
When debugging, I see top coming in as 90, when it should be 90.00000000...something.
EDIT:
I should have thought a bit harder: 90 + Double.Epsilon is lost to rounding. It seems the best way to go is some bit manipulation.
SOLUTION:
[TestMethod]
[ExpectedException(typeof(ArgumentOutOfRangeException))]
public void CreateExtent_InvalidTop_ShouldThrowArgumentOutOfRangeException()
{
    var invalidTop = Utility.IncrementTiny(90); // 90.000000000000014
    // var sameAsEpsilon = Utility.IncrementTiny(0);
    new Extent(invalidTop, 0, 0, 0);
}
/// <summary>
/// Increment a double-precision number by the smallest amount possible.
/// </summary>
/// <param name="number">double-precision number</param>
/// <returns>incremented number</returns>
public static double IncrementTiny(double number)
{
    #region SANITY CHECKS
    if (Double.IsNaN(number) || Double.IsInfinity(number))
        throw new ArgumentOutOfRangeException("number");
    #endregion

    var bits = BitConverter.DoubleToInt64Bits(number);

    // if negative then go the opposite way
    if (number > 0)
        return BitConverter.Int64BitsToDouble(bits + 1);
    else if (number < 0)
        return BitConverter.Int64BitsToDouble(bits - 1);
    else
        return Double.Epsilon;
}

/// <summary>
/// Decrement a double-precision number by the smallest amount possible.
/// </summary>
/// <param name="number">double-precision number</param>
/// <returns>decremented number</returns>
public static double DecrementTiny(double number)
{
    #region SANITY CHECKS
    if (Double.IsNaN(number) || Double.IsInfinity(number))
        throw new ArgumentOutOfRangeException("number");
    #endregion

    var bits = BitConverter.DoubleToInt64Bits(number);

    // if negative then go the opposite way
    if (number > 0)
        return BitConverter.Int64BitsToDouble(bits - 1);
    else if (number < 0)
        return BitConverter.Int64BitsToDouble(bits + 1);
    else
        return 0 - Double.Epsilon;
}
This does the job.
Per the documentation of Double.Epsilon:
The value of the Epsilon property reflects the smallest positive
Double value that is significant in numeric operations or comparisons
when the value of the Double instance is zero.
(Emphasis mine.)
Adding it to 90.0 does not produce "the next representable value above 90.0"; it just yields 90.0 again.
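A two-line check demonstrates the point (a small sketch, not in the original answer):
Console.WriteLine(90.0 + Double.Epsilon == 90.0); // True: the addition is lost to rounding
Console.WriteLine(0.0 + Double.Epsilon == 0.0);   // False: near zero, Epsilon is significant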
Double.Epsilon is the smallest positive representable value. Just because it's representable on its own does not mean it's the smallest value between any other representable value and the next highest one.
Imagine you had a system to represent just integers. You can represent any integer to 5 significant figures, along with a scale (e.g. in the range 1-100).
So these values are exactly representable, for example
12345 (digits=12345, scale = 0)
12345000 (digits=12345, scale = 3)
In that system, the "epsilon" value would be 1... but if you add 1 to 12345000 you'd still end up with 12345000 because the system couldn't represent the exact result of 12345001.
Now apply the same logic to double, with all its intricacies, and you get a much smaller epsilon, but the same general principle: a value which is distinct from zero, but still can end up not making any difference when added to larger numbers.
Note that much larger values have the same property too - for example, if x is a very large double, then x + 1 may well be equal to x because the gap between two "adjacent" doubles becomes more than 2 as the values get big.
In C99 and C++, the function that does what you were trying to do is called nextafter and is in math.h. I do not know if C# has any equivalent, but if it does, I would expect it to have a similar name.
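Side note: newer .NET runtimes (.NET Core 3.0 and later) expose exactly this operation as Math.BitIncrement and Math.BitDecrement, so the hand-rolled bit manipulation above is no longer necessary there:
double next = Math.BitIncrement(90.0); // next representable double above 90
double prev = Math.BitDecrement(90.0); // next representable double below 90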
Because Double.Epsilon is the "smallest noticeable change" (loosely speaking) in a double number.
...but this does not mean that it will have any effect when you use it.
As you know, floats/doubles vary in their resolution depending on the magnitude of the value they contain. For example, an artificial scheme:
...
-100 -> +-0.1
-10 -> +-0.01
0 -> +-0.001
10 -> +-0.01
100 -> +-0.1
...
If the resolutions were like this, Epsilon would be 0.001, as it's the smallest possible change. But what would be the expected result of 1000000 + 0.001 in such a system? 1000000, unchanged: the 0.001 is far below the resolution at that magnitude and is simply lost.

Why does Math.Tan(90) provide a non-undefined value?

While working with Math.Tan() I found that the result for 90 degrees is not undefined, but is instead 1.6331779e+16.
[screenshot of the app omitted]
Here is the code:
// convert degrees to radians
angle = (Convert.ToDouble(op1) * Math.PI / 180);
// write the output
FinalResult.Text = Math.Tan(Convert.ToDouble(angle)).ToString();
Why does it behave this way? Is it expected?
This question has been addressed (and answered, quite thoroughly) over on https://math.stackexchange.com/ in the question Why does the google calculator give tan 90 degrees = 1.6331779e+16?
or in other words ... You need to read David Goldberg's paper, What Every Computer Scientist Should Know About Floating-Point Arithmetic. You can purchase a copy from the ACM (or download one if you are a member of the ACM) at http://dl.acm.org/citation.cfm?id=103163. Other versions are available, gratis, as well.
A copy of the original is at http://perso.ens-lyon.fr/jean-michel.muller/goldberg.pdf. Also:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf
And CiteSeer links to other locations as well:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6768
The calculation is done with floating point numbers, which are not perfect (not that Math.PI could ever perfectly represent π anyway).
http://en.wikipedia.org/wiki/Floating_point
and more specifically..
https://math.stackexchange.com/questions/536144/why-does-the-google-calculator-give-tan-90-degrees-1-6331779e16
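To make the mechanism concrete (a small sketch, not from the original answers): the double nearest to π/2 is slightly off from the true π/2, so the tangent there is enormous but finite.
double halfPi = Math.PI / 2;         // 1.5707963267948966, not exactly pi/2
Console.WriteLine(Math.Tan(halfPi)); // ~1.633E+16, not Infinity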
If you want a rounded result then check the input for something like >89.9999 && <90.00001. Don't use == with floating point values.
For an explanation of why not to use == with floating point numbers, try running this example:
var d = 0.2;
for (double k = 1.0; k < 100; k++)
{
    var t = 0.2 * k;
    t = t / k;
    Console.WriteLine("{0} == {1} ==> {2}", d, t, d == t);
}
We're multiplying 0.2 by a whole number, dividing it by the same whole number, and then comparing to 0.2. It should be true every time, right? It's not. Many times it returns false.

How to find the surrounding area 25 miles using latitude & longitude from the current user location

I found the Haversine Formula in C#. Is there any method better than this?
public double HaversineDistance(LatLng pos1, LatLng pos2, DistanceUnit unit)
{
    double R = (unit == DistanceUnit.Miles) ? 3960 : 6371;
    var lat = (pos2.Latitude - pos1.Latitude).ToRadians();
    var lng = (pos2.Longitude - pos1.Longitude).ToRadians();
    var h1 = Math.Sin(lat / 2) * Math.Sin(lat / 2) +
             Math.Cos(pos1.Latitude.ToRadians()) * Math.Cos(pos2.Latitude.ToRadians()) *
             Math.Sin(lng / 2) * Math.Sin(lng / 2);
    var h2 = 2 * Math.Asin(Math.Min(1, Math.Sqrt(h1)));
    return R * h2;
}
I suppose it is a matter of what you want to do with it. My guess is that you are trying to calculate distance based on a ZIP (post) code and want to know whether pos2 is within x distance of pos1.
What you first need to understand is that (unless you have some awesome geospatial data to work with) these calculations do not generally take into account elevation or any other topographical attributes of the area, so your results won't be exact. Furthermore, these calculations are "as the crow flies": point x to point y is treated as a straight line, so while point y may lie within 25 miles of central point x, it may actually take 30 miles to travel from x to y.
That being said, the Haversine Formula is your best bet unless you are calculating small distances (< ~12 miles), in which case you could use the Pythagorean theorem, expressed as:
d = sqrt((X2 - X1)^2 + (Y2 - Y1)^2)
where X and Y are your coordinates. This is much faster but far less accurate, especially as distance increases.
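As a rough illustration of that shortcut applied to geographic coordinates (a sketch only: LatLng is assumed to match the type in the question, and 111.32 km per degree is the equatorial figure quoted earlier on this page):
// Flat-earth (equirectangular) approximation; reasonable for short distances.
public static double ApproxDistanceKm(LatLng pos1, LatLng pos2)
{
    const double KmPerDegree = 111.32; // approximate length of one degree of latitude
    double meanLatRad = (pos1.Latitude + pos2.Latitude) / 2 * Math.PI / 180;
    double dLat = pos2.Latitude - pos1.Latitude;
    // east-west degrees shrink by cos(latitude) away from the equator
    double dLng = (pos2.Longitude - pos1.Longitude) * Math.Cos(meanLatRad);
    return KmPerDegree * Math.Sqrt(dLat * dLat + dLng * dLng);
}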
The Haversine Formula is slow, especially if you are calling it repeatedly, but I am unaware of any faster method for calculating distance based on this formula.

Evaluate if two doubles are equal based on a given precision, not within a certain fixed tolerance

I'm running NUnit tests to evaluate some known test data and calculated results. The numbers are floating point doubles so I don't expect them to be exactly equal, but I'm not sure how to treat them as equal for a given precision.
In NUnit we can compare with a fixed tolerance:
double expected = 0.389842845321551d;
double actual = 0.38984284532155145d; // really comes from a data import
Expect(actual, EqualTo(expected).Within(0.000000000000001));
and that works fine for numbers below one, but as the numbers grow the tolerance really needs to be changed so we always care about the same number of digits of precision.
Specifically, this test fails:
double expected = 1.95346834136148d;
double actual = 1.9534683413614817d; // really comes from a data import
Expect(actual, EqualTo(expected).Within(0.000000000000001));
and of course larger numbers fail with tolerance..
double expected = 1632.4587642911599d;
double actual = 1632.4587642911633d; // really comes from a data import
Expect(actual, EqualTo(expected).Within(0.000000000000001));
What's the correct way to evaluate two floating point numbers are equal with a given precision? Is there a built-in way to do this in NUnit?
From MSDN:
By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally.
Let's assume 15, then.
So, we could say that we want the tolerance to be to the same degree.
How many precise figures do we have after the decimal point? We need to know the distance of the most significant digit from the decimal point, right? The magnitude. We can get this with a Log10.
Then we need to divide 1 by 10 ^ precision to get a value around the precision we want.
Now, you'll need to do more test cases than I have, but this seems to work:
double expected = 1632.4587642911599d;
double actual = 1632.4587642911633d; // really comes from a data import

// Log10(100) = 2, so to get the magnitude we add 1.
int magnitude = 1 + (expected == 0.0 ? -1 : Convert.ToInt32(Math.Floor(Math.Log10(expected))));
int precision = 15 - magnitude;
double tolerance = 1.0 / Math.Pow(10, precision);
Assert.That(actual, Is.EqualTo(expected).Within(tolerance));
It's late, so there could be a gotcha in here, but I tested it against your three sets of test data and each passed. Changing precision to be 16 - magnitude caused the test to fail; setting it to 14 - magnitude obviously caused it to pass, as the tolerance was greater.
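If this were reused across many tests, it could be wrapped in a small helper. This is just a sketch of the idea above; note the added Math.Abs so that negative expected values don't break the Log10:
public static void AssertEqualToPrecision(double expected, double actual, int significantDigits = 15)
{
    int magnitude = 1 + (expected == 0.0
        ? -1
        : Convert.ToInt32(Math.Floor(Math.Log10(Math.Abs(expected)))));
    double tolerance = Math.Pow(10, magnitude - significantDigits);
    Assert.That(actual, Is.EqualTo(expected).Within(tolerance));
}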
This is what I came up with for The Floating-Point Guide (Java code, but should translate easily, and comes with a test suite, which you really really need):
public static boolean nearlyEqual(float a, float b, float epsilon)
{
    final float absA = Math.abs(a);
    final float absB = Math.abs(b);
    final float diff = Math.abs(a - b);

    if (a * b == 0) { // a or b or both are zero
        // relative error is not meaningful here
        return diff < (epsilon * epsilon);
    } else { // use relative error
        return diff / (absA + absB) < epsilon;
    }
}
The really tricky question is what to do when one of the numbers to compare is zero. The best answer may be that such a comparison should always consider the domain meaning of the numbers being compared rather than trying to be universal.
How about converting the items each to string and comparing the strings?
string test1 = String.Format("{0:0.0##}", expected);
string test2 = String.Format("{0:0.0##}", actual);
Assert.AreEqual(test1, test2);
Assert.That(x, Is.EqualTo(y).Within(10).Percent);
is a decent option (changes it to a relative comparison, where x is required to be within 10% of y). You may want to add extra handling for 0, as otherwise you'll get an exact comparison in that case.
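One possible shape for that zero handling (a sketch only, reusing the x and y above):
if (y == 0.0)
    Assert.That(x, Is.EqualTo(0.0).Within(1e-12)); // fall back to an absolute tolerance near zero
else
    Assert.That(x, Is.EqualTo(y).Within(10).Percent);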
Update:
Another good option is
Assert.That(x, Is.EqualTo(y).Within(1).Ulps);
where Ulps means units in the last place. See https://docs.nunit.org/articles/nunit/writing-tests/constraints/EqualConstraint.html#comparing-floating-point-values.
I don't know if there's a built-in way to do it with NUnit, but I would suggest multiplying each value by 10 raised to the power of the precision you're seeking, storing the results as longs, and comparing the two longs to each other.
For example:
double expected = 1632.4587642911599d;
double actual = 1632.4587642911633d;

// for a precision of 4 decimal places
long lActual = (long)(10000 * actual);
long lExpected = (long)(10000 * expected);

if (lActual == lExpected)
{
    // perform desired actions; the values agree to 4 decimal places
}
This is a quick idea, but how about scaling them down until they are below one? It should be something like num / 10^ceil(log10(num)). I'm not too sure how well it would work, but it's an idea.
1632.4587642911599 / 10^ceil(log10(1632.4587642911599)) = 0.16324587642911599
How about:
const double significantFigures = 10;
Assert.AreEqual(Actual / Expected, 1.0, 1.0 / Math.Pow(10, significantFigures));
The difference between the two values should be less than either value divided by the precision.
Assert.Less(Math.Abs(firstValue - secondValue), firstValue / Math.Pow(10, precision));
open FsUnit
actual |> should (equalWithin errorMargin) expected
