Very large number manipulations in C#

I need to create random numbers between 100,000 and 100,000,000 using the Erlang distribution. As mentioned on the linked page, the parameter x can be any number in [0, +infinity); hence my range of values is theoretically acceptable.
However, when calculating the probability density function, the returned value is always 0. That is because the formula raises e to the power of -\lambda * x, which underflows to zero for such big x values.
I need to be able to raise e to such large negative powers (e.g., -40,000). My application targets .NET 4.5 and I checked the System.Numerics namespace, but so far I have not been able to figure out how to perform this calculation.

For big floating point numbers you could use BigRational. This answer has a nice explanation about it.

The default floating point types in .NET don't have the range that you are looking for: as you pointed out, e raised to such an exponent is smaller than the smallest positive value the type can represent (hence the 0 values).
If you want to stick to C#, why not use a wrapper for an arbitrary precision library?
http://gnumpnet.codeplex.com/
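
One way to sidestep the underflow without arbitrary-precision types is to work with the logarithm of the density instead of the density itself. Note that this is a different technique from the BigRational and GMP suggestions above. The sketch below assumes an integer shape parameter k and a rate lambda; the class and method names are made up for illustration.

```csharp
using System;

// Hypothetical helper, not part of any library: evaluates the Erlang
// density in log space so that exp(-lambda * x) is never formed directly.
public static class ErlangLog
{
    // ln(n!) computed as a sum of logs, which avoids overflowing a factorial.
    private static double LogFactorial(int n)
    {
        double sum = 0.0;
        for (int i = 2; i <= n; i++)
        {
            sum += Math.Log(i);
        }
        return sum;
    }

    // ln f(x; k, lambda) = k ln(lambda) + (k-1) ln(x) - lambda x - ln((k-1)!)
    // Assumes k >= 1, lambda > 0 and x > 0.
    public static double LogPdf(int k, double lambda, double x)
    {
        return k * Math.Log(lambda)
             + (k - 1) * Math.Log(x)
             - lambda * x
             - LogFactorial(k - 1);
    }
}
```

For k = 2, lambda = 0.0004 and x = 100,000,000 the direct formula needs Math.Exp(-40000), which underflows to 0, whereas ErlangLog.LogPdf returns a finite value of roughly -39997 that can still be compared with, or subtracted from, other log densities.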

Related

Fast lookup suffering from floating point inaccuracies

Suppose I have equally spaced doubles (64 bit floating point numbers) x0,x1,...,xn. Equally spaced means that for all i, x(i+1) - xi is constant; call it w for width.
Given a number y in the range [x0,xn] I want to find the largest i such that xi <= y.
A naive approach would visit each i in turn (O(n)). Marginally better is to use a binary search (O(log n)).
A constant time lookup would be to calculate (y-x0)/w and cast it to an integer. However, this will occasionally give the wrong result due to floating point inaccuracy. E.g. Suppose there are 100 intervals of width 0.01 starting at 0.
(int)(0.29/0.01) = 28 //want 29 here
Can I retain the constant time lookup but ensure that the results are always identical to the binary search? Performing the calculation with decimals rather than doubles for 'w' and 'x0' seems to work here, but will it always work? I could always follow the direct lookup with a comparison with the xs either side, but this seems ugly and inefficient.
To clarify - I am given the xi and the value y as doubles - I cannot change this. But any intermediate calculation performed before returning the integer index can use any datatypes I like. Additionally, I can perform one-off "preparation" work in order to make the runtime calculation faster.
Edit: Apologies - turns out that I didn't check "equally spaced" properly - these numbers are often not "equally spaced" when their difference is calculated using floating point arithmetic.

Do the following:
Calculate (int)(0.29/0.01) = 28 //want 29 here
Next, calculate back i * 0.01 for i between 28-1 and 28+1 and pick the largest i for which i * 0.01 <= y.
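
A minimal sketch of that "guess, then check the neighbours" idea, assuming the xs are stored in a sorted array and y lies inside their range. The names GridLookup and FindIndex are hypothetical; the correction compares against the stored xs rather than recomputing i * w.

```csharp
using System;

// Hypothetical helper: direct index estimate followed by a local correction.
public static class GridLookup
{
    // Returns the largest i with xs[i] <= y.
    // Assumes xs is sorted ascending, has at least two elements,
    // and xs[0] <= y <= xs[xs.Length - 1].
    public static int FindIndex(double[] xs, double y)
    {
        int last = xs.Length - 1;
        double w = (xs[last] - xs[0]) / last;   // nominal spacing

        // Constant-time guess; may be off by one because of rounding.
        int i = (int)((y - xs[0]) / w);
        if (i < 0) i = 0;
        if (i > last) i = last;

        // Correct the guess against the stored values themselves.
        while (i > 0 && xs[i] > y) i--;
        while (i < last && xs[i + 1] <= y) i++;
        return i;
    }
}
```

With nearly equal spacing the two loops should run at most a step or two, so the lookup stays effectively constant time. Because the final decision is made by comparing y against xs directly, the result is the same index a binary search over xs would return.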

What do you mean by equally spaced? If you can make some assumptions about the numbers, for example that they increase over an interval, you can actually use median selection, which is O(1) in the best case and O(log2(N)) in the worst case.

Force a relatively small inaccuracy in floating point number

I need to add a very small value to a floating point value to make it insignificantly different so that it fails an equality test.
To avoid issues with precision, instead of adding a very small number, I have opted to add a relatively small number. Is this a good solution, or is there a reliable way to add an even smaller number?
matrix.m00 += matrix.m00 * 0.0000001f;
matrix.m11 += matrix.m11 * 0.0000001f;
matrix.m22 += matrix.m22 * 0.0000001f;
From reading I have found that the best solution is to use the next representable floating point number. In C#, though, doing this either a) requires unmanaged/unsafe code, or b) uses BitConverter, which is too slow. So I figured that the above solution would work, but I would like to know if there are any gotchas.

You can add an ulp (unit in the last place) to any double; its size depends on the double, and it is the smallest amount you can add or subtract that will change the value.
Calculate the unit in the last place (ULP) for doubles
next higher/lower IEEE double precision number
Those posts all use BitConverter, though. I did find a post that discusses how to add an ulp without unsafe code or BitConverter:
http://realtimemadness.blogspot.com/2012/06/nextafter-in-c-without-allocations-of.html
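
For illustration, here is one way to step a float to its next representable value without the unsafe keyword or BitConverter, by overlaying a float and an int in an explicit-layout struct. This is a sketch under my own assumptions and not necessarily the approach taken in the linked post; FloatStep and NextUp are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: steps a float to the next representable value
// by reinterpreting its bits through an explicit-layout struct.
public static class FloatStep
{
    [StructLayout(LayoutKind.Explicit)]
    private struct FloatBits
    {
        [FieldOffset(0)] public float Value;
        [FieldOffset(0)] public int Bits;
    }

    // Smallest float strictly greater than x.
    // NaN and +Infinity are returned unchanged.
    public static float NextUp(float x)
    {
        if (float.IsNaN(x) || float.IsPositiveInfinity(x)) return x;
        if (x == 0f) return float.Epsilon;      // covers +0 and -0

        FloatBits u = new FloatBits();
        u.Value = x;
        // For positive floats the bit pattern grows with the value;
        // for negative floats it shrinks as the value moves toward zero.
        if (x > 0f) u.Bits++; else u.Bits--;
        return u.Value;
    }
}
```

Unlike multiplying by a factor slightly above 1, this also changes a value of 0 (it becomes float.Epsilon). On .NET Core 3.0 and later, MathF.BitIncrement covers the same ground, but that is not available on .NET Framework.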

Sure there's a gotcha. If any of these values is 0, then you'll be adding exactly 0, i.e. not modifying the value at all.
Is there any reason why you couldn't use unsafe code to do this?

The minimum number you can add to a floating point number such that a different number is produced is a function of the original number; it is not a constant. Call this function Epsilon(x).
Epsilon(0), i.e. the minimum floating point number you can add to floating point 0 such that a distinguishable number is produced, can be found in the static value Double.Epsilon.
Even using a "large" epsilon like 1 will eventually fail, though. For example, this returns true in C#:
var big = 10000000000000000.0;
Console.WriteLine(big == (big + 1.0));
So unless you are sure that your input is in some fixed range of magnitude (e.g. all close to 0), you can't just fudge it with a single constant.
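
To make the "Epsilon(x) depends on x" point concrete, here is a rough sketch that measures the gap between a double and its next larger neighbour. It uses BitConverter.DoubleToInt64Bits and Int64BitsToDouble; the names DoubleSpacing and Ulp are mine, and the method assumes a finite input below Double.MaxValue.

```csharp
using System;

// Hypothetical helper illustrating that the spacing between adjacent
// doubles depends on the magnitude of the value.
public static class DoubleSpacing
{
    // Gap between |x| and the next larger double.
    // Assumes x is finite and |x| < double.MaxValue.
    public static double Ulp(double x)
    {
        double a = Math.Abs(x);
        long bits = BitConverter.DoubleToInt64Bits(a);
        // For non-negative doubles, adding 1 to the bit pattern
        // gives the next representable value.
        double next = BitConverter.Int64BitsToDouble(bits + 1);
        return next - a;
    }
}
```

DoubleSpacing.Ulp(0.0) is Double.Epsilon, DoubleSpacing.Ulp(1.0) is about 2.2e-16, and DoubleSpacing.Ulp(1e16) is 2.0, which is why adding 1.0 to 10000000000000000.0 in the example above has no effect.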

Bitwise representation of division of floats - how division of floats works

A number can have multiple representations if we use a float, so the results of a division of floats may produce bitwise different floats. But what if the denominator is a power of 2?
AFAIK, dividing by a power of 2 would only shift the exponent, leaving the same mantissa, always producing bitwise identical floats. Is that right?
float n = xxx;
float result = n/1024f; // always the same result?
--- UPDATE ----------------------
Sorry for my lack of knowledge in the IEEE black magic for floating points :) , but I'm talking about those numbers Guvante mentioned: no representation for certain decimal numbers, 'inaccurate' floats. For the rest of this post I'll use 'accurate' and 'inaccurate' considering Guvante's definition of these words.
To simplify, let's say the numerator is always an 'accurate' number. Also, let's divide not by any power of 2, but always by 1024. Additionally, I'm doing the operation the same way every time (same method), so I'm talking about getting the same results in different executions (for the same inputs, sure).
I'm asking all this because I see different numbers coming from the same inputs, so I thought: well if I only use 'accurate' floats as numerators and divide by 1024 I will only shift the exponent, still having an 'accurate' float.
You asked for an example. The real problem is this: I have a simulator producing sometimes 0.02999994 and sometimes 0.03000000 for the same inputs. I thought I could multiply these numbers by 1024, round to get an 'integer' ('accurate' float) that would be the same for those two numbers, and then divide by 1024 to get an 'accurate' rounded float.
I was told (in my other question) that I could convert to decimal, round and cast to float, but I want to know if this way works.

A number can have multiple representations if we use a float
The question appears to be predicated on an incorrect premise; the only number that has multiple representations as a float is zero, which can be represented as either "positive zero" or "negative zero". Other than zero a given number only has one representation as a float, assuming that you are talking about the "double" or "float" types.
Or perhaps I misunderstand. Is the issue that you are referring to the fact that the compiler is permitted to do floating point operations in higher precision than the 32 or 64 bits available for storage? That can cause divisions and multiplications to produce different results in some cases.

Since people often don't fully grasp floating point numbers, I will go over some of your points quickly. Each particular combination of bits in a floating point number represents a unique number. However, because that number has a base 2 fractional component, there is no exact representation for certain decimal numbers, for instance 1.1. In those cases you take the closest representable number; IEEE 754-2008 specifies round to nearest, ties to even, as the default in these cases.
The real difficulty is when you combine two of these 'inaccurate' numbers. This can introduce problems as each intermediate step will involve rounding. If you calculate the same value using two different methods, you could come up with subtly different values. Typically this is handled with an epsilon when you want equality.
Now on to your real question: can you divide by a power of two and avoid introducing any additional 'inaccuracies'? Normally you can. However, as with all floating point numbers, denormals and other odd cases have their own logic, and obviously if the exponent runs out of range you will have difficulty. And again, note that no mathematical errors are introduced during any of this; it is simply math being done with limited precision, which involves intermittent rounding of results.
EDIT: In response to new question
What you are describing could work, but it is pretty much equivalent to rounding. Additionally, if you are just looking for equality, you should use an epsilon as I mentioned earlier: |a - b| < e for some small value e (0.0001 would work in your example). If you are looking to print out a pretty number, and the framework you are using isn't doing it to your liking, plain rounding would be the most direct way of expressing your solution, which is always a plus.
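
As a rough sketch of the multiply, round, divide idea from the question (Snap and ToGrid1024 are hypothetical names, and the input is assumed to be small enough that x * 1024 stays well inside the range where every integer is representable):

```csharp
using System;

// Hypothetical helper for the multiply-round-divide idea from the question.
public static class Snap
{
    // Rounds x to the nearest multiple of 1/1024.
    // Assumes |x| * 1024 is far below 2^24, so the result fits a float exactly.
    public static float ToGrid1024(float x)
    {
        // Scaling by a power of two only changes the exponent (barring
        // overflow and subnormals), so the rounding step is the only
        // place where the value actually changes.
        double scaled = Math.Round((double)x * 1024.0);
        return (float)(scaled / 1024.0);
    }
}
```

Both 0.02999994f and 0.03f map to 31/1024 = 0.0302734375 here. Two inputs that straddle a rounding boundary, for example values just below and just above 30.5/1024, would still land on different grid points, so this shares the weakness of any rounding scheme.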

Is it safe to compare two floored doubles using the equality operator?

I know you should never compare floating point values using the == equality operator in .NET, but is it safe to do so if the two numbers were floored using Math.Floor?
I am working with a mapping program, and chunks of the map are stored in different "region" files. I can determine what region to retrieve by dividing the world coordinates by 16 and flooring the result, which gets me region coordinates.
I'm essentially asking whether or not two values that have the same whole number portion (e.g. 4.3 and 4.8) that are floored will be compared as equal using the == operator.

The general issue with floating point comparisons is that they can easily accrue rounding error. Take a value like 1.2 (which cannot be exactly represented in binary floating point), multiply it by 100, and compare it for equality to 120: the result is not guaranteed to be exactly 120. The recommendation is to always compare the difference against a tolerance, like so:
var a = 1.2;
a *= 100;
if (Math.Abs(a - 120) < 0.0001)
{
    // treat a as equal to 120
}
The Math.Floor operation, however, always results in an integer value. That is to say that any fractional part is discarded (rounding toward negative infinity), and an exact integer value is left.
So, if your semantics really are to use a floor, you are safe.
However, if you are really trying to round, then use Math.Round() instead.
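
For the region lookup in the question, a minimal sketch could convert the floored value to an int and compare the ints. The names Regions, RegionOf and SameRegion are hypothetical, and the region size of 16 is taken from the question.

```csharp
using System;

// Hypothetical helper for the region lookup described in the question.
public static class Regions
{
    // Region size of 16 world units, as stated in the question.
    private const double RegionSize = 16.0;

    // Maps a world coordinate to its integer region coordinate.
    // Assumes the result fits in an int.
    public static int RegionOf(double world)
    {
        return (int)Math.Floor(world / RegionSize);
    }

    // True when both world coordinates fall in the same region.
    public static bool SameRegion(double worldA, double worldB)
    {
        return RegionOf(worldA) == RegionOf(worldB);
    }
}
```

Math.Floor rounds toward negative infinity, so a world coordinate of -0.5 lands in region -1 rather than region 0, which is what a plain (int) cast on its own would give.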

Well, it depends on what you're trying to do.
That will tell you whether the floored values are equal - but if one input was just a smidge under 2, and one input was just a smidge over 2, then they'll be seen as different, despite the difference between them being potentially tiny.
Is that okay for your scenario? In some cases it will be, in some it won't.

I think your question is predicated on a faulty assumption. It's perfectly safe to compare floating point values using == in .Net. The only odd behavior associated with == and floating point values is that Double.NaN and Single.NaN when compared to themselves with == will return false (as dictated by the floating point specification).
Using Math.Floor doesn't make this situation any better. If any of the special floating point values (NaN, NegativeInfinity, PositiveInfinity) are passed to Math.Floor they are returned unaltered. So the comparison via == will still have the odd behavior.
The main effect of using Math.Floor is that more floating point values will compare equal to each other. For example, 7.1 and 7.5 will be equal after a Math.Floor. That's not inherently any better, though it could be in the context of your application; it's hard to say without more information. Could you provide some more detail on why you think == is unsafe?
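
A small sketch of those cases, with a hypothetical FlooredEqual helper:

```csharp
using System;

// Hypothetical helper showing how == behaves on floored values.
public static class FlooredCompare
{
    // Compares the floors of two doubles with the == operator.
    public static bool FlooredEqual(double a, double b)
    {
        return Math.Floor(a) == Math.Floor(b);
    }
}
```

FlooredEqual(7.1, 7.5) is true, FlooredEqual(double.NaN, double.NaN) is false because NaN passes through Math.Floor unchanged, and FlooredEqual(1.9999999, 2.0000001) is false even though the two inputs are very close together.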

Why can't c# calculate exact values of mathematical functions

Why can't c# do any exact operations?
Math.Pow(Math.Sqrt(2.0),2) == 2.0000000000000004
I know how doubles work, I know where the rounding error is from, I know that it's almost the correct value, and I know that you can't store infinite numbers in a finite double. But why isn't there a way for c# to calculate it exactly, when my calculator can do it?
Edit
It's not about my calculator, I was just giving an example:
http://www.wolframalpha.com/input/?i=Sqrt%282.000000000000000000000000000000000000000000000000000000000000000000000000000000001%29%5E2
Cheers

Chances are your calculator can't do it exactly - but it's probably storing more information than it's displaying, so the error after squaring ends up outside the bounds of what's displayed. Either that, or its errors happen to cancel out in this case - but that's not the same as getting it exactly right in a deliberate way.
Another option is that the calculator is remembering the operations that resulted in the previous results, and applying algebra to cancel out the operations... that seems pretty unlikely though. .NET certainly won't try to do that - it will calculate the intermediate value (the root of two) and then square it.
If you think you can do any better, I suggest you try writing out the square root of two to (say) 50 decimal places, and then square it exactly. See whether you come out with exactly 2...
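
That experiment can be run exactly with System.Numerics.BigInteger by scaling everything up by a power of ten. This is only a sketch; SqrtTwo, IntegerSqrt and Truncated are names I made up, and on .NET Framework the project needs a reference to System.Numerics.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: works with sqrt(2) as an exact integer scaled by 10^digits.
public static class SqrtTwo
{
    // Largest integer r with r * r <= n (Newton's method on integers).
    // Assumes n >= 0.
    public static BigInteger IntegerSqrt(BigInteger n)
    {
        if (n < 2) return n;
        BigInteger x = n;
        BigInteger y = (x + 1) / 2;
        while (y < x)
        {
            x = y;
            y = (x + n / x) / 2;
        }
        return x;
    }

    // floor(sqrt(2) * 10^digits), i.e. sqrt(2) truncated to 'digits' decimals.
    public static BigInteger Truncated(int digits)
    {
        BigInteger scale = BigInteger.Pow(10, 2 * digits);
        return IntegerSqrt(2 * scale);
    }
}
```

With r = SqrtTwo.Truncated(50), the exact product r * r comes out slightly below 2 * 10^100, so the 50-digit value squared is close to 2 but not equal to it, however many digits are kept.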

Your calculator is not calculating it exactly; it's just that the rounding error is so small that it's not displayed.

I believe most calculators use binary-coded decimal, which is similar in spirit to C#'s decimal type: decimal fractions are stored exactly, although irrational values such as the square root of 2 still are not. That is, each byte contains two digits of the number and maths is done via logarithms.

What makes you think your calculator can do it? It's almost certainly displaying fewer digits than it calculates with, and you'd get the 'correct' result if you printed out your 2.0000000000000004 with only five fractional digits (for example).
I think you'll probably find that it can't. When I do the square root of 2 and then multiply that by itself, I get 1.999999998.
The square root of 2 is one of those annoying irrational numbers like PI and therefore can't be represented with normal IEEE754 doubles or even decimal types. To represent it exactly, you need a system capable of symbolic math where the value is stored as "the square root of two" so that subsequent calculations can deliver correct results.
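
To see the "fewer displayed digits" effect in C#, here is a small sketch using the standard "F" format specifier (DisplayRounding and Show are hypothetical names):

```csharp
using System;
using System.Globalization;

// Hypothetical helper mimicking a display with a fixed number of decimals.
public static class DisplayRounding
{
    // Formats a value with a fixed number of fractional digits,
    // the way a calculator display with limited width might.
    public static string Show(double value, int fractionalDigits)
    {
        return value.ToString("F" + fractionalDigits, CultureInfo.InvariantCulture);
    }
}
```

DisplayRounding.Show(2.0000000000000004, 5) gives "2.00000": the stored value is still slightly off, it just no longer shows.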

The way calculators round numbers varies from model to model. My TI Voyage 200 does algebra to simplify equations (among other things), but most calculators will display only a portion of the real value calculated, after applying a rounding function to the result. For example, you may find the square root of 2 and the calculator would store (let's say) 54 decimals, but will only display 12 rounded decimals. Thus taking the square root of 2 and then raising that result to the power of 2 would display the original value, since the result is rounded. In any case, unless the calculator can keep an infinite number of decimals, you'll always have a best approximate result from complex operations.
By the way, try to represent 0.1 in binary and you'll realize that you can't represent it exactly; you'll end up with (something like) 0.1000000000000000055

Your calculator has methods which recognize and manipulate irrational input values.
For example: 2^(1/2) is likely not evaluated to a number in the calculator if you do not explicitly tell it to do so (as in the ti89/92).
Additionally, the calculator has logic it can use to manipulate them, such as x^(1/2) * y^(1/2) = (x*y)^(1/2), where it can then wash, rinse, repeat the method for working with irrational values.
If you were to give c# some method to do this, I suppose it could as well. After all, algebraic solvers such as Mathematica are not magical.

It has been mentioned before, but I think what you are looking for is a computer algebra system. Examples of these are Maxima and Mathematica, and they are designed solely to provide exact values to mathematical calculations, something not covered by the CPU.
The mathematical routines in languages like C# are designed for numerical calculations: it is expected that if you are doing calculations as a program you will have simplified it already, or you will only need a numerical result.

2.0000000000000004 and 2. are both represented as 10. (in binary) in single precision. In your case, using single precision in C# should give the exact answer.
For your other example, Wolfram Alpha may use higher precision than machine precision for calculation. This adds a big performance penalty. For instance, in Mathematica, going to higher precision makes calculations about 300 times slower:
k = 1000000;
vec1 = RandomReal[1, k];
vec2 = SetPrecision[vec1, 20];
AbsoluteTiming[vec1^2;]
AbsoluteTiming[vec2^2;]
It's 0.01 seconds vs 3 seconds on my machine.
You can see the difference between the single precision and double precision results by doing something like the following in Java:
public class Bits {
    public static void main(String[] args) {
        double a1=2.0;
        float a2=(float)2.0;
        double b1=Math.pow(Math.sqrt(a1),2);
        float b2=(float)Math.pow(Math.sqrt(a2),2);
        System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(a1)));
        System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(a2)));
        System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(b1)));
        System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(b2)));
    }
}
You can see that the single precision result is exact, whereas the double precision result is off by one bit.
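
A rough C# counterpart of the same comparison, in case it helps (PrecisionDemo and its methods are hypothetical names):

```csharp
using System;

// Hypothetical helper comparing the double and single precision results.
public static class PrecisionDemo
{
    // Square of the square root, kept in double precision.
    public static double SquareOfRootDouble(double x)
    {
        return Math.Pow(Math.Sqrt(x), 2);
    }

    // Same computation with the result rounded to single precision.
    public static float SquareOfRootSingle(float x)
    {
        // The arithmetic still runs in double; only the final result
        // is rounded to single precision, as in the Java snippet.
        return (float)Math.Pow(Math.Sqrt(x), 2);
    }
}
```

PrecisionDemo.SquareOfRootSingle(2f) compares equal to 2f, while the double version stays within a couple of ulps of 2.0. Keep in mind that the float result looks exact only because the final rounding discards the stray low bit, not because the computation itself was exact.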
