how to get the BigInteger to the pow Double in C#? - c#

I tried to use BigInteger.Pow method to calculate something like 10^12345.987654321 but this method only accept integer number as exponent like this:
BigInteger.Pow(BigInteger x, int y)
so how can I use double number as exponent in above method?

There's no arbitrary precision large number support in C#, so this cannot be done directly. There are some alternatives (such as looking for a 3rd party library), or you can try something like the code below - if the base is small enough, like in your case.
public class StackOverflow_11179289
{
public static void Test()
{
int #base = 10;
double exp = 12345.123;
int intExp = (int)Math.Floor(exp);
double fracExp = exp - intExp;
BigInteger temp = BigInteger.Pow(#base, intExp);
double temp2 = Math.Pow(#base, fracExp);
int fractionBitsForDouble = 52;
for (int i = 0; i < fractionBitsForDouble; i++)
{
temp = BigInteger.Divide(temp, 2);
temp2 *= 2;
}
BigInteger result = BigInteger.Multiply(temp, (BigInteger)temp2);
Console.WriteLine(result);
}
}
The idea is to use big integer math to compute the power of the integer part of the exponent, then use double (64-bit floating point) math to compute the power of the fraction part. Then, using the fact that
a ^ (int + frac) = a ^ int * a ^ frac
we can combine the two values into a single big integer. But simply converting the double value to a BigInteger would lose a lot of its precision, so we first "shift" the precision onto the bigInteger (using the loop above, and the fact that the double type uses 52 bits for the precision), then multiplying the result.
Notice that the result is an approximation, if you want a more precise number, you'll need a library that does arbitrary precision floating point math.
Update: If the base / exponent are small enough that the power would be in the range of double, we can simply do what Sebastian Piu suggested (new BigInteger(Math.Pow((double)#base, exp)))

I like carlosfigueira's answer, but of course the result of his method can only be correct on the first (most significant) 15-17 digits, because a System.Double is used as a multiplier eventually.
It is interesting to note that there does exist a method BigInteger.Log that performs the "inverse" operation. So if you want to calculate Pow(7, 123456.78) you could, in theory, search all BigInteger numbers x to find one number such that BigInteger.Log(x, 7) is equal to 123456.78 or closer to 123456.78 than any other x of type BigInteger.
Of course the logarithm function is increasing, so your search can use some kind of "binary search" (bisection search). Our answer lies between Pow(7, 123456) and Pow(7, 123457) which can both be calculated exactly.
Skip the rest if you want
Now, how can we predict in advance if there are more than one integer whose logarithm is 123456.78, up to the precision of System.Double, or if there is in fact no integer whose logarithm hits that specific Double (the precise result of an ideal Pow function being an irrational number)? In our example, there will be very many integers giving the same Double 123456.78 because the factor m = Pow(7, epsilon) (where epsilon is the smallest positive number such that 123456.78 + epilon has a representation as a Double different from the representation of 123456.78 itself) is big enough that there will be very many integers between the true answer and the true answer multiplied by m.
Remember from calculus that the derivative of the mathemtical function x → Pow(7, x) is x → Log(7)*Pow(7, x), so the slope of the graph of the exponential function in question will be Log(7)*Pow(7, 123456.78). This number multiplied by the above epsilon is still much much greater than one, so there are many integers satisfying our need.
Actually, I think carlosfigueira's method will give a "correct" answer x in the sense that Log(x, 7) has the same representation as a Double as 123456.78 has. But has anyone tried it? :-)

I'll provide another answer that is hopefully more clear. The point is: Since the precision of System.Double is limited to approx. 15-17 decimal digits, the result of any Pow(BigInteger, Double) calculation will have an even more limited precision. Therefore, there's no hope of doing better than carlosfigueira's answer does.
Let me illustrate this with an example. Suppose we wanted to calculate
Pow(10, exponent)
where in this example I choose for exponent the double-precision number
const double exponent = 100.0 * Math.PI;
This is of course only an example. The value of exponent, in decimal, can be given as one of
314.159265358979
314.15926535897933
314.1592653589793258106510620564222335815429687500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...
The first of these numbers is what you normally see (15 digits). The second version is produced with exponent.ToString("R") and contains 17 digits. Note that the precision of Double is less than 17 digits. The third representation above is the theoretical "exact" value of exponent. Note that this differs, of course, from the mathematical number 100π near the 17th digit.
To figure out what Pow(10, exponent) ought to be, I simply did BigInteger.Log10(x) on a lot of numbers x to see how I could reproduce exponent. So the results presented here simply reflect the .NET Framework's implementation of BigInteger.Log10.
It turns out that any BigInteger x from
0x0C3F859904635FC0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
through
0x0C3F85990481FE7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
makes Log10(x) equal to exponent to the precision of 15 digits. Similarly, any number from
0x0C3F8599047BDEC0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
through
0x0C3F8599047D667FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
satisfies Log10(x) == exponent to the precision of Double. Put in another way, any number from the latter range is equally "correct" as the result of Pow(10, exponent), simply because the precision of exponent is so limited.
(Interlude: The bunches of 0s and Fs reveal that .NET's implementation only considers the most significant bytes of x. They don't care to do better, precisely because the Double type has this limited precision.)
Now, the only reason to introduce third-party software, would be if you insist that exponent is to be interpreted as the third of the decimal numbers given above. (It's really a miracle that the Double type allowed you to specify exactly the number you wanted, huh?) In that case, the result of Pow(10, exponent) would be an irrational (but algebraic) number with a tail of never-repeating decimals. It couldn't fit in an integer without rounding/truncating. PS! If we take the exponent to be the real number 100π, the result, mathematically, would be different: some transcendental number, I suspect.

Related

How to calculate float type precision and does it make sense?

I have a problem understanding the precision of float type.
The msdn writes that precision from 6 to 9 digits. But I note that precision depends from on the size of the number:
float smallNumber = 1.0000001f;
Console.WriteLine(smallNumber); // 1.0000001
bigNumber = 100000001f;
Console.WriteLine(bigNumber); // 100000000
The smallNumber is more precise than big, I understand IEEE754, but I don't understand how MSDN calculate precision, and does it make sense?
Also, you can play with the representation of numbers in float format here. Please write 100000000 value in "You entered" input and click "+1" on the right. Then change the input's value to 1, and click "+1" again. You may see the difference in precision.
The MSDN documentation is nonsensical and wrong.
Bad concept. Binary-floating-point format does not have any precision in decimal digits because it has no decimal digits at all. It represents numbers with a sign, a fixed number of binary digits (bits), and an exponent for a power of two.
Wrong on the high end. The floating-point format represents many numbers exactly, with infinite precision. For example, “3” is represented exactly. You can write it in decimal arbitrarily far, 3.0000000000…, and all of the decimal digits will be correct. Another example is 1.40129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125•10−45. This number has 105 significant digits in decimal, but the float format represents it exactly (it is 2−149).
Wrong on the low end. When “999999.97” is converted from decimal to float, the result is 1,000,000. So not even one decimal digit is correct.
Not a measure of accuracy. Because the float significand has 24 bits, the resolution of its lowest bit is about 223 times finer than the resolution of its highest bit. This is about 6.9 digits in the sense that log10223 is about 6.9. But that just tells us the resolution—the coarseness—of the representation. When we convert a number to the float format, we get a result that differs from the number by at most ½ of this resolution, because we round to the nearest representable value. So a conversion to float has a relative error of at most 1 part in 224, which corresponds to about 7.2 digits in the above sense. If we are using digits to measure resolution, then we say the resolution is about 7.2 digits, not that it is 6-9 digits.
Where do these numbers came from?
So, if “~6-9 digits” is not a correct concept, does not come from actual bounds on the digits, and does not measure accuracy, where does it come from? We cannot be sure, but 6 and 9 do appear in two descriptions of the float format.
6 is the largest number x for which this is guaranteed:
If any decimal numeral with at most x significant digits is within the normal exponent bounds of the float format and is converted to the nearest value represented in the format, then, when the result is converted to the nearest decimal numeral with at most x significant digits, the result of that conversion equals the original number.
So it is reasonable to say float can preserve at least six decimal digits. However, as we will see, there is no bound involving nine digits.
9 is the smallest number x that guarantees this:
If any finite float number is converted to the nearest decimal numeral with x digits, then, when the result is converted to the nearest value representable in float, the result of that conversion equals the original number.
As an analogy, if float is a container, then the largest “decimal container” guaranteed to fit inside it is six digits, and the smallest “decimal container” guaranteed to hold it is nine digits. 6 and 9 are akin to interior and exterior measurements of the float container.
Suppose you had a block 7.2 units long, and you were looking at its placement on a line of bricks each 1 unit long. If you put the start of the block at the start of a brick, it will extend 7.2 bricks. However, if somebody else chooses where it starts, they might start it in the middle of a brick. Then it would cover part of that brick, all of the next 6 bricks, and and part of the last brick (e.g., .5 + 6 + .7 = 7.2). So a 7.2-unit block is only guaranteed to cover 6 bricks. Conversely, 8 bricks can cover the 7.2-unit block if you choose where they are placed. But if somebody else chooses where they start, the first might cover just .1 units of the block. Then you need 7 more and another fraction, so 9 bricks are needed.
The reason this analogy holds is that powers of two and powers of 10 are irregularly spaced relative to each other. 210 (1024) is near 103 (1000). 10 is the exponent used in the float format for numbers from 1024 (inclusive) to 2048 (exclusive). So this interval from 1024 to 2048 is like a block that has been placed just after the 100-1000 ends and the 1000-10,000 block starts.
But note that this property involving 9 digits is the exterior measurement—it is not a capability that float can perform or a service that it can provide. It is something that float needs (if it is to be held in a decimal format), not something it provides. So it is not a bound on how many digits a float can store.
Further Reading
For better understanding of floating-point arithmetic, consider studying the IEEE-754 Standard for Floating-Point Arithmetic or a good textbook like Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al.
Yes number of digits before rounding errors is a measure of precision but you can not asses precision from just 2 numbers because you might be just closer or further from the rounding threshold.
To better understand the situation then you need to see how floats are represented.
The IEEE754 32bit floats are stored as:
bool(1bit sign) * integer(24bit mantisa) << integer(8bit exponent)
Yes mantissa is 24 bit instead of 23 as it's MSB is implicitly set to 1.
As you can see there are only integers and bitshift. So if you are representing natural number up to 2^24 you are without rounding completely. Fro bigger numbers binary zero padding occurs from the right that causes the difference.
In case of digits after decimal points the zero padding occurs from the left. But there is another problem as in binary you can not store some decadic numbers exactly. For example:
0.3 dec = 0.100110011001100110011001100110011001100... bin
0.25 dec = 0.01 bin
As you can see the sequence of 0.3 dec in binary is infinite (like we can not write 1/3 in decadic) hence if crop it to only 24 bits you lose the rest and the number is not what you want anymore.
If you compare 0.3 and 0.125 the 0.125 is exact and 0.3 is not but 0.125 is much smaller than 0.3. So your measure is not correct unless explored more very close values that will cover the rounding steps and computing the max difference from such set. For example you could compare
1.0000001f
1.0000002f
1.0000003f
1.0000004f
1.0000005f
1.0000006f
1.0000007f
1.0000008f
1.0000009f
and remember the max difference of fabs(x-round(x)) and than do the same for
100000001
100000002
100000003
100000004
100000005
100000006
100000007
100000008
100000009
And then compare the two differences.
On top of all this you are missing one very important thing. And that is the errors while converting from text to binary and back which are usually even bigger. First of all try to print your numbers without rounding (for example force to print 20 decimal digits after decimal point).
Also the numbers are stored in binary base so in order to print them you need to convert to decadic base which involves multiplication and division by 10. The more bits are missing (zero pad) from the number the bigger the print errors are. To be as precise as you can a trick is used and that is to print the number in hex (no rounding errors) and then convert the hex string itself to decadic base on integer math. That is much more accurate then naive floating point prints. for more info see related QAs:
my best attempt to print 32 bit floats with least rounding errors (integer math only)
How do libraries/programming languages convert floats to strings
How do I convert a very long binary number to decimal?
Now to get back to number of "precise" digits represented by float. For integer part of number is that easy:
dec_digits = floor(log10(2^24)) = floor(7.22) = 7
However for digits after decimal point is this not as precise (for first few decadic digits) as there are a lot rounding going on. For more info see:
How do you print the EXACT value of a floating point number?
I think what they mean in their documentation is that depending on the number that the precision ranges from 6 to 9 decimal places. Go by the standard that is explained on the page you linked, sometimes Microsoft are a bit lazy when it comes to documentation, like the rest of us.
The problem with floating point is that it is inaccurate. If you put the number 1.05 into the site in your link you will notice that it cannot be accurately stored in floating point. It's actually stored as 1.0499999523162841796875. It's stored this way to do calculations faster. It's not great for money, e.g. what if your item is priced at $1.05 and you sell a billion of them.
The smallNumber is more precise than big
Incorrect compare. The other number has more significant digits.
1.0000001f is attempting N digits of decimal precision.
100000001f attempts N+1.
I have a problem understanding the precision of float type.
To best understand float precision, think binary. Use "%a" for printing with a C99 or later compiler.
float is stored base 2. The significand is a Dyadic rational, some integer/power-of-2.
float commonly has 24 bits of binary precision. (23-bit explicitly encoded, 1 implied)
Between [1.0 ... 2.0), there are 223 different float values.
Between [2.0 ... 4.0), there are 223 different float values.
Between [4.0 ... 8.0), there are 223 different float values.
...
The possible values of a float are not distributed uniformly among powers-of-10. The grouping of float values to power-of-10 (decimal precision) results in the wobbling 6 to 9 decimal digits of precision.
How to calculate float type precision?
To find the difference between subsequent float values, since C99, use nextafterf()
Illustrative code:
#include<math.h>
#include<stdio.h>
void foooo(float b) {
float a = nextafterf(b, 0);
float c = nextafterf(b, b * 2.0f);
printf("%-15a %.9e\n", a, a);
printf("%-15a %.9e\n", b, b);
printf("%-15a %.9e\n", c, c);
printf("Local decimal precision %.2f digits\n", 1.0 - log10((c - b) / b));
}
int main(void) {
foooo(1.0000001f);
foooo(100000001.0f);
return 0;
}
Output
0x1p+0 1.000000000e+00
0x1.000002p+0 1.000000119e+00
0x1.000004p+0 1.000000238e+00
Local decimal precision 7.92 digits
0x1.7d783ep+26 9.999999200e+07
0x1.7d784p+26 1.000000000e+08
0x1.7d7842p+26 1.000000080e+08
Local decimal precision 8.10 digits

When should I use integer for arithmetic operations instead of float/double

Some SO user said to me that I should not use float/double for stuff like student grades, see the last comments: SequenceEqual() is not equal with custom class and float values
"Because it is often easier and safer to do all arithmetic in integers, and then convert to a suitable display format at the last minute, than it is to attempt to do arithmetic in floating point formats. "
I tried what he said but the result is not satisfying.
int grade1 = 580;
int grade2 = 210;
var average = (grade1 + grade2) / 2;
string result = string.Format("{0:0.0}", average / 100);
result is "3,0"
double grade3 = 5.80d;
double grade4 = 2.10d;
double average1 = (grade3 + grade4) / 2;
double averageFinal = Math.Round(average1);
string result1 = string.Format("{0:0.0}", averageFinal);
result1 is "4,0"
I would expect 4,0 because 3,95 should result in 4,0. That worked because I use Math.Round which again works only on a double or decimal. That would not work on an integer.
So what do I wrong here?
First of all, the specific problem you cite is one that vexes me greatly. You almost never want to do a problem in integer arithmetic and then convert it to a floating point type, because the computation will be done entirely in integers. I wish the C# compiler warned about this one; I see this all the time.
Second, the reason to prefer integer or decimal arithmetic to double arithmetic is that a double can only represent with perfect accuracy a fraction whose denominator is a power of two. When you say 0.1 in a double, you don't get 1/10, because 1/10 is not a fraction whose denominator is any power of two. You get the fraction that is closest to 1/10 that does have a power of two in the denominator.
This usually is "close enough", right up until it isn't. It is particularly nasty when you have tiny errors close to hard cutoffs. You want to say, for instance, that a student must have a 2.4 GPA in order to meet some condition, and the computations you do involving fractions with two in the denominator just happen to work out to 2.39999999999999999956...
Now, you do not necessarily get away from these problems with decimal arithmetic; decimal arithmetic has the same restriction: it can only represent numbers that are fractions with powers of ten in the denominator. You try to represent 1/3, and you're going to get a small, but non-zero error on every computation.
Thus standard advice is: if you are doing any computation where you expect exact arithmetic when making computations that involve fractions that are powers of ten in the numerator, such as financial computations, use decimal, or do the computation entirely in integers, scaled appropriately. If you're doing computations that involve physical quantities, where there is no inherent "base" to the computations, then use "double".
So why use integer over decimal or vice versa? Integer arithmetic can be smaller and faster; decimals take more time and space. But ultimately you should not worry about these small performance differences: pick the data type that most accurately reflects the mathematical domain you are working in, and use it.
You need to "convert to a suitable display format at the last minute":
string result = string.Format("{0:0.0}", average / 100.0);
average is a int, 100 is a int.
When you do average/100, you have an integer divided by an integer, in which you will get back an integer, but since 3.95 is not an integer, it is truncated to 3.
If you want to get a float or a double as your result, a double or float has to be involved in the arithmetic.
In this case, you can cast your result to a double (double)average/100 or divide by a double, average/100.0
The only reason you want to stay away from doing too many arithmetic operations with floats/decimals until the last second is the same reason you don't just plug in values for variables in long physics equations at the start, you lose precision. This is a very important concept in Numerical Methods when you have to deal with floating point representation of numbers, e.g. Machine epsilon.
I think you chose the wrong option of the two given. Knowing Eric, if you had been clear that you were doing floating-point operations like rounding, averaging, etc. then he would not have suggested using integers. The reason to not use double is because they can not always represent a decimal value precisely (you cannot represent 1.1 exactly in a double).
If you want to use floating-point math but still maintain decimal accuracy up to 28 significant digits, then use decimal:
decimal grade1 = 580;
decimal grade2 = 210;
var average = (grade1 + grade2) / 2;
average = Math.Round(average);
string result = string.Format("{0:0.0}", average);
Not that easy.
If the grades are in integer format then store in integer is OK.
But I would convert do decimal PRIOR to performing any math to remove rounding errors from the math unless you specifically want integer math.
For financial calculations decimal is recommended. In fact the suffix is m for money.
decimal (C# Reference)
The decimal keyword indicates a 128-bit data type. Compared to
floating-point types, the decimal type has more precision and a
smaller range, which makes it appropriate for financial and monetary
calculations. The approximate range and precision for the decimal type
are shown in the following table.
Float and Double are floating-point types. You get a bigger range then decimal but less precision. Decimal has a large enough range for grades.
decimal grade3 = 5.80m;
decimal grade4 = 2.10m;
decimal average1 = (grade3 + grade4) / 2;
int grade1 = 580;
int grade2 = 210;
var average = (grade1 + grade2) / 2;
string result = string.Format("{0:0.0}", average / 100); // 3.0
Debug.WriteLine(result);
decimal avg = ((decimal)grade1 + (decimal)grade2) / 200m; // 3.95
Debug.WriteLine(avg);
Debug.WriteLine(string.Format("{0:0.0}", avg)); // 4.0

C# - Incrementing a double value (1.212E+25)

I have a double value that equals 1.212E+25
When I throw it out to text I do myVar.ToString("0000000000000000000000")
The problem is even if I do myVar++ 3 or 4 times the value seems to stay the same.
Why is that?
That is because the precision of a double is not sufficient. It simply can't hold that many significant digits.
It will not fit into a long, but probably into a Decimal.
But... do you really need this level of precision?
To expand on the other answer the smallest increase you can make to a double is one Unit in the Last Place, or ULP, as double is a floating point type then the size of an ULP changes, at 1E+25 it will be about 1E+10.
as you can see compared to 1E+10 incrementing by 1 really might as well be adding nothing. which is exactly what double will do, so it wouldnt matter if you tried it 10^25 times it still won't increase unless you try to increase by at least 1 ULP
if incrementing by an ULP is useful you can do this by casting the bits to long and back here is a quick extension method to do that
public static double UlpChange(this double val, int ulp)
{
if (!double.IsInfinity(val) && !double.IsNaN(val))
{
//should probably do something if we are at max or min values
//but its not clear what
long bits = BitConverter.DoubleToInt64Bits(val);
return BitConverter.Int64BitsToDouble(bits + ulp);
}
return val;
}
double (Double) holds about 16 digits of precision and long (Int64) about 18 digits.
Neither of these appear to have sufficient precision for your needs.
However decimal (Decimal) holds up to 30 digits of precision. Although this appears to be great enough for your needs I'd recommend caution in case your requirement grows even larger. In that case you may need a third party numeric library.
Related StackOverflow entries are:
How can I represent a very large integer in .NET?
Big integers in C#
You may want to read the Floating-Point Guide to understand how doubles work.
Basically, a double only has about 16 decimal digits of precision. At a magnitude of 10^25, an increase of 1.0 is below the threshold of precision and gets lost. Due to the binary representation, this may not be obvious.
The smallest increment that'll work is 2^30+1 which will actually increment the double by 2^31. You can test this kind of thing easily enough with LINQPad:
double inc = 1.0;
double num = 1.212e25;
while(num+inc == num) inc*=2;
inc.Dump(); //2147483648 == 2^31
(num+inc == num).Dump(); //false due to loop invariant
(num+(inc/2.0) == num).Dump();//true due to loop invariant
(num+(inc/2.0+1.0) == num).Dump();//false - 2^30+1 suffices to change the number
(num+(inc/2.0+1.0) == num + inc).Dump();//true - 2^30+1 and 2^31 are equiv. increments
((num+(inc/2.0+1.0)) - num == inc ).Dump();//true - the effective increment is 2^31
Since a double is essentially a binary number with limited precision, that means that the smallest possible increment will itself always be a power of two (this increment can be determined directly from the bit-pattern of the double, but it's probably clearer to do so with a loop as above since that's portable across float, double and other floating point representations (which don't exist in .NET).

Find min/max of a float/double that has the same internal representation

Refreshing on floating points (also PDF), IEEE-754 and taking part in this discussion on floating point rounding when converting to strings, brought me to tinker: how can I get the maximum and minimum value for a given floating point number whose binary representations are equal.
Disclaimer: for this discussion, I like to stick to 32 bit and 64 bit floating point as described by IEEE-754. I'm not interested in extended floating point (80-bits) or quads (128 bits IEEE-754-2008) or any other standard (IEEE-854).
Background: Computers are bad at representing 0.1 in binary representation. In C#, a float represents this as 3DCCCCCD internally (C# uses round-to-nearest) and a double as 3FB999999999999A. The same bit patterns are used for decimal 0.100000005 (float) and 0.1000000000000000124 (double), but not for 0.1000000000000000144 (double).
For convenience, the following C# code gives these internal representations:
string GetHex(float f)
{
return BitConverter.ToUInt32(BitConverter.GetBytes(f), 0).ToString("X");
}
string GetHex(double d)
{
return BitConverter.ToUInt64(BitConverter.GetBytes(d), 0).ToString("X");
}
// float
Console.WriteLine(GetHex(0.1F));
// double
Console.WriteLine(GetHex(0.1));
In the case of 0.1, there is no lower decimal number that is represented with the same bit pattern, any 0.99...99 will yield a different bit representation (i.e., float for 0.999999937 yields 3F7FFFFF internally).
My question is simple: how can I find the lowest and highest decimal value for a given float (or double) that is internally stored in the same binary representation.
Why: (I know you'll ask) to find the error in rounding in .NET when it converts to a string and when it converts from a string, to find the internal exact value and to understand my own rounding errors better.
My guess is something like: take the mantissa, remove the rest, get its exact value, get one (mantissa-bit) higher, and calculate the mean: anything below that will yield the same bit pattern. My main problem is: how to get the fractional part as integer (bit manipulation it not my strongest asset). Jon Skeet's DoubleConverter class may be helpful.
One way to get at your question is to find the size of an ULP, or Unit in the Last Place, of your floating-point number. Simplifying a little bit, this is the distance between a given floating-point number and the next larger number. Again, simplifying a little bit, given a representable floating-point value x, any decimal string whose value is between (x - 1/2 ulp) and (x + 1/2 ulp) will be rounded to x when converted to a floating-point value.
The trick is that (x +/- 1/2 ulp) is not a representable floating-point number, so actually calculating its value requires that you use a wider floating-point type (if one is available) or an arbitrary width big decimal or similar type to do the computation.
How do you find the size of an ulp? One relatively easy way is roughly what you suggested, written here is C-ish pseudocode because I don't know C#:
float absX = absoluteValue(x);
uint32_t bitPattern = getRepresentationOfFloat(absx);
bitPattern++;
float nextFloatNumber = getFloatFromRepresentation(bitPattern);
float ulpOfX = (nextFloatNumber - absX);
This works because adding one to the bit pattern of x exactly corresponds to adding one ulp to the value of x. No floating-point rounding occurs in the subtraction because the values involved are so close (in particular, there is a theorem of ieee-754 floating-point arithmetic that if two numbers x and y satisfy y/2 <= x <= 2y, then x - y is computed exactly). The only caveats here are:
if x happens to be the largest finite floating point number, this won't work (it will return inf, which is clearly wrong).
if your platform does not correctly support gradual underflow (say an embedded device running in flush-to-zero mode), this won't work for very small values of x.
It sounds like you're not likely to be in either of those situations, so this should work just fine for your purposes.
Now that you know what an ulp of x is, you can find the interval of values that rounds to x. You can compute ulp(x)/2 exactly in floating-point, because floating-point division by 2 is exact (again, barring underflow). Then you need only compute the value of x +/- ulp(x)/2 suitable larger floating-point type (double will work if you're interested in float) or in a Big Decimal type, and you have your interval.
I made a few simplifying assumptions through this explanation. If you need this to really be spelled out exactly, leave a comment and I'll expand on the sections that are a bit fuzzy when I get the chance.
One other note the following statement in your question:
In the case of 0.1, there is no lower
decimal number that is represented
with the same bit pattern
is incorrect. You just happened to be looking at the wrong values (0.999999... instead of 0.099999... -- an easy typo to make).
Python 3.1 just implemented something like this: see the changelog (scroll down a bit), bug report.

Is dotNet decimal type vulnerable for the binary comparison error?

One error I stumble upon every few month is this one:
double x = 19.08;
double y = 2.01;
double result = 21.09;
if (x + y == result)
{
MessageBox.Show("x equals y");
}
else
{
MessageBox.Show("that shouldn't happen!"); // <-- this code fires
}
You would suppose the code to display "x equals y" but that's not the case.
The short explanation is that the decimal places are, represented as a binary digit, do not fit into double.
Example:
2.625 would look like:
10.101
because
1-------0-------1---------0----------1
1 * 2 + 0 * 1 + 1 * 0.5 + 0 * 0.25 + 1 * 0,125 = 2.65
And some values (like the result of 19.08 plus 2.01) cannot be be represented with the bits of a double.
One solution is to use a constant:
double x = 19.08;
double y = 2.01;
double result = 21.09;
double EPSILON = 10E-10;
if ( x + y - result < EPSILON )
{
MessageBox.Show("x equals y"); // <-- this code fires
}
else
{
MessageBox.Show("that shouldn't happen!");
}
If I use decimal instead of double in the first example, the result is "x equals y".
But I'm asking myself If this is because of "decimal" type is not vulnerable of this behaviour or it just works in this case because the values "fit" into 128 bit.
Maybe someone has a better solution than using a constant?
Btw. this is not a dotNet/C# problem, it happens in most programming languages I think.
Decimal will be accurate so long as you stay within values which are naturally decimals in an appropriate range. So if you just add and subtract, for example, without doing anything which would skew the range of digits required too much (adding a very very big number to a very very small number) you will end up with easily comparable results. Multiplication is likely to be okay too, but I suspect it's easier to get inaccuracies with it.
As soon as you start dividing, that's where the problems can come - particularly if you start dividing by numbers which include prime factors other than 2 or 5.
Bottom line: it's safe in certain situations, but you really need to have a good handle on exactly what operations you'll be performing.
Note that it's not the 128-bitness of decimal which is helping you here - it's the representation of numbers as floating decimal point values rather than floating binary point values. See my articles on .NET binary floating point and decimal floating point for more information.
System.Decimal is just a floating point number with a different base so, in theory, it is still vulnerable to the sort of error you point out. I think you just happened on a case where rounding doesn't happen. More information here.
Yes, the .NET System.Double structure is subject to the problem you describe.
from http://msdn.microsoft.com/en-us/library/system.double.epsilon.aspx:
Two apparently equivalent floating-point numbers might not compare equal because of differences in their least significant digits. For example, the C# expression, (double)1/3 == (double)0.33333, does not compare equal because the division operation on the left side has maximum precision while the constant on the right side is precise only to the specified digits. If you create a custom algorithm that determines whether two floating-point numbers can be considered equal, you must use a value that is greater than the Epsilon constant to establish the acceptable absolute margin of difference for the two values to be considered equal. (Typically, that margin of difference is many times greater than Epsilon.)

Categories