Double.Epsilon Value - C#

I feel like this should be easy to find somewhere online but I'm having a hard time.
Does anyone know what the C# value is for Double.Epsilon? I'm looking for the exact numerical value.

Here is its declaration:
[__DynamicallyInvokable]
public const double Epsilon = 4.94065645841247E-324;

No, this certainly is not correct.
First: The value of Double.Epsilon can easily be found out by either a small program or by reading the documentation:
4.94065645841247E-324
Second: Don't confuse this value with Machine Epsilon which is usually used in comparisons between two double values. See this question for more details on "Machine Epsilon".

MSDN page:
The value of this constant is 4.94065645841247e-324.

None of these answers is the exact numerical value. The exact value is a power of 2, namely 2^-1074, as this is how IEEE floating point numbers are actually stored in modern computers. All the other answers given are decimal approximations. If you assign that decimal approximation to a double, then it will round off to 2^-1074, so internally, the register or memory location will receive the true "Epsilon" value. So, using the decimal constant to initialize a storage location to the minimum floating point value works, but the decimal constant is still not the actual value of this minimum floating point value.
Explanation: The smallest positive value written in IEEE notation is
0.0000000000000000000000000000000000000000000000000001B x 2^-1022.
(The B is for binary base)
That is a 1 shifted 52 bits to the right of the binary point, then shifted another 1022 bits for a total of 1074 bits. The leading sign bit is zero. Sign bit (1 bit) plus coefficient (52 bits) plus exponent (11 bits) give 64 bits of storage.
Note also this is a "denormalized" floating point value: the stored exponent field is all zeros, so there is no implicit leading 1 bit and the effective exponent is -1022, the minimum.
See https://en.wikipedia.org/wiki/IEEE_floating_point and search for "1074".
ps. My calculator gives a more precise representation of the value of Double.Epsilon = 2^-1074 as 4.9406564584124654417656879286822e-324.
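You can confirm the 2^-1074 claim from C# itself. A minimal sketch, using BitConverter to reinterpret a raw 64-bit pattern as a double:
// Bit pattern 0x0000000000000001: sign 0, exponent field 0, lowest mantissa bit set.
// That is the smallest positive (denormal) double, i.e. 2^-1074.
double smallestDenormal = BitConverter.Int64BitsToDouble(1L);
Console.WriteLine(smallestDenormal == double.Epsilon);   // True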

Related

Why "float.TryParse("51778365".ToString(), out x)" return 51778364 in C#? [duplicate]

Why does the following program print what it prints?
using System;

class Program
{
    static void Main(string[] args)
    {
        float f1 = 0.09f * 100f;
        float f2 = 0.09f * 99.999999f;
        Console.WriteLine(f1 > f2);
    }
}
Output is
false
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
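One way to see what is actually stored is to print the values with the "G9" format, which shows enough digits to round-trip a float. A small sketch:
Console.WriteLine(99.999999f.ToString("G9"));            // 100 - 99.999999 is closer to 100f than to the next float below it
Console.WriteLine((0.09f * 100f).ToString("G9"));        // 9
Console.WriteLine((0.09f * 99.999999f).ToString("G9"));  // 9 - so f1 == f2, and f1 > f2 is false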
I recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The main thing is this isn't just .NET: it's a limitation of the underlying system almost every language uses to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating decimal when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it's because 0.09 and 99.999999 cannot be represented with exact precision in binary (similarly, 1/3 cannot be represented with exact precision in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 in the last bit position gives 0.11111111111111111111 base 2, which is 0.99999904632568359375 base 10. There isn't a binary value for exactly 0.999999. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
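The same thing is easy to reproduce in C#. A small sketch; "G9" prints enough digits to expose the stored float, and repeating an operation lets the error compound:
float myFloat = 0.1f;
Console.WriteLine(myFloat.ToString("G9"));   // 0.100000001
float sum = 0f;
for (int i = 0; i < 1000; i++)
    sum += 0.1f;                             // each addition rounds again
Console.WriteLine(sum.ToString("G9"));       // close to, but not exactly, 100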
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
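Here is a tiny C# sketch of that four-position toy format (ToyRound is a hypothetical helper, not something from Plantz's book; it ignores zero and the exponent-range limit for brevity):
// Keep only two significant (mantissa) digits, like the 93/91 example above.
static double ToyRound(double value)
{
    int exponent = (int)Math.Floor(Math.Log10(Math.Abs(value)));  // power of ten
    double mantissa = value / Math.Pow(10, exponent);             // in [1, 10)
    mantissa = Math.Round(mantissa, 1);                           // two significant digits
    return mantissa * Math.Pow(10, exponent);
}
// 0.93 and 0.91 each fit in two significant digits on their own, but their sum does not:
Console.WriteLine(ToyRound(0.93 + 0.91));   // 1.8, not 1.84 - the exact sum needs a third digit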
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
Beginning programmers often see floating point arithmetic as more accurate than integer. It is true that even adding two very large integers can cause overflow. Multiplication makes it even more likely that the result will be very large and, thus, overflow. And when used with two integers, the / operator in C/C++ causes the fractional part to be lost. However, ... floating point representations have their own set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.

How can doubles represent higher numbers than decimals if they can't hold as many significant figures?

I may very well not have a proper understanding of significant figures, but the book
C# 6.0 in a Nutshell by Joseph Albahari and Ben Albahari (O’Reilly).
Copyright 2016 Joseph Albahari and Ben Albahari, 978-1-491-92706-9.
provides a table comparing double and decimal: roughly, double gives 15-16 significant figures with a range of about ±5.0×10^-324 to ±1.7×10^308, while decimal gives 28-29 significant figures with a range of only about ±1.0×10^-28 to ±7.9×10^28.
Is it not counter-intuitive that, on the one hand, a double can hold a smaller quantity of significant figures, while on the other it can represent numbers way bigger than decimal, which can hold a higher quantity of significant figures?
Imagine you were told you can store a value, but were given a limitation: You can only store 10 digits, 0-9 and a negative symbol. You can create the rules to decode the value, so you can store any value.
The first way you store things is simply as the value xxxxxxxxxx, meaning the number 123 is stored as 0000000123. Simple to store and read. This is how an int works.
Now you decide you want to store fractional numbers, so you change the rules a bit. Now you store xxxxxxyyyy, where x is the integer portion and y is the fractional portion. So, 123.98 would be stored as 0001239800. This is roughly how a Decimal value works. You can see the largest value I can store is 9999999999, which translates to 999999.9999. This means I have a hard upper limit on the size of the value, but the number of the significant digits is large at 10.
There is a way to store larger values, and that's to store the two components of the formula x × 10^y: the significant digits and a signed power of ten. So, to store 123.98, you need to store 01239800-2, which I can decode as 012398 × 10^-2 = 123.98 (the remaining positions hold the signed exponent). This means I can store much bigger numbers by changing 'y', but the number of significant digits is basically fixed at 6. This is basically how a double works.
The answer lies in the way that doubles are encoded. Rather than just being a direct binary representation of a number, they have 3 parts: sign, exponent, and fraction.
The sign is obvious, it controls + or -.
The fraction part is also obvious. It's a binary fraction that represents a number between 0 and 1.
The exponent is where the magic happens. It signifies a scaling factor.
The final floating point value comes out to (-1)^sign * (1 + fraction) * 2^exponent
This allows much higher values than a straight decimal number because of the exponent. There's a lot of reading out there on why this works and how to do addition and multiplication with these encoded numbers. Google around for "IEEE floating point format" or whatever topic you need. Hope that helps!
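You can pull those three fields out of a C# double directly. A small sketch (the layout is 1 sign bit, 11 exponent bits, and 52 fraction bits; the stored exponent carries a bias of 1023):
double value = 123.98;
long bits = BitConverter.DoubleToInt64Bits(value);
int sign      = (int)((bits >> 63) & 1);
int exponent  = (int)((bits >> 52) & 0x7FF) - 1023;   // remove the bias
long fraction = bits & 0xFFFFFFFFFFFFFL;              // 52 fraction bits
// For normal numbers: value == (-1)^sign * (1 + fraction / 2^52) * 2^exponent
Console.WriteLine($"sign={sign} exponent={exponent} fraction={fraction}");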
The range has nothing to do with the precision. Double has a binary representation (base 2). Not all numbers can be represented exactly as we humans write them in decimal format, not to mention the accumulated rounding errors of addition and division. A larger range means a greater MAX VALUE and a smaller MIN VALUE than decimal.
Decimal, on the other hand, is base 10. It has a smaller range (smaller MAX VALUE and larger MIN VALUE). This has nothing to do with precision; since it is not represented using a binary floating point representation, it can represent decimal values exactly within its precision, and is thus recommended for human-made numbers and calculations.

How does decimal work?

I looked at decimal in C# but I wasn't 100% sure what it did.
Is it lossy? In C#, writing 1.0000000000001f + 1.0000000000001f results in 2 when using float (double gets you 2.0000000000002, which is correct). Is it possible to add two things with decimal and not get the correct answer?
How many decimal places can I use? I see the MaxValue is 79228162514264337593543950335, but if I subtract 1, how many decimal places can I use?
Are there quirks I should know of? In C# it's 128 bits; in other languages, how many bits is it, and will it work the same way as C#'s decimal does (when adding, dividing, multiplying)?
What you're showing isn't decimal - it's float. They're very different types. f is the suffix for float, aka System.Single. m is the suffix for decimal, aka System.Decimal. It's not clear from your question whether you thought this was actually using decimal, or whether you were just using float to demonstrate your fears.
If you use 1.0000000000001m + 1.0000000000001m you'll get exactly the right value. Note that the double version wasn't able to express either of the individual values exactly, by the way.
I have articles on both kinds of floating point in .NET, and you should read them thoroughly, along other resources:
Binary floating point (float/double)
Decimal floating point (decimal)
All floating point types have their limits of course, but in particular you should not expect binary floating point to accurately represent decimal values such as 0.1. Decimal still can't represent anything that isn't exactly representable in 28/29 decimal digits though - so if you divide 1 by 3, you won't get the exact answer of course.
You should also note that the range of decimal is considerably smaller than that of double. So while it can have 28-29 decimal digits of precision, you can't represent truly huge numbers (e.g. 10^200) or minuscule numbers (e.g. 10^-200).
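To make the difference concrete, here is a quick sketch comparing the three types on the values from the question:
Console.WriteLine(1.0000000000001f + 1.0000000000001f);  // 2 - float has too few digits to hold the operands
Console.WriteLine(1.0000000000001 + 1.0000000000001);    // 2.0000000000002 - double is close, but each operand was already rounded
Console.WriteLine(1.0000000000001m + 1.0000000000001m);  // 2.0000000000002 - decimal stores and adds the values exactly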
Decimal values in programming are (almost) never 100% accurate. Sometimes it's even better to multiply the decimal value by a very large number and then calculate, but that's only if you're sure, for example, that the value is always between 0 and 100 (so it won't go out of range of the max value).
Floating point is inherently imprecise. Some numbers can't be represented faithfully. Decimal is a large floating point type with high precision. If you look at the MSDN page you can see there are "28-29 significant digits." The .NET Framework classes are language agnostic; they will work the same in every language that uses .NET.
Edit (in response to Jon Skeet): If you initialize the Decimal class with the numbers above, which have fewer than 28 digits each after the decimal point, the number will be stored faithfully as long as the binary representation is exact. Since it works in a 64-bit format, I assume the 128-bit format will handle it perfectly fine. Some numbers, such as 0.1, will never be exactly representable because they are a repeating sequence in binary.

Find min/max of a float/double that has the same internal representation

Refreshing on floating points (also PDF), IEEE-754 and taking part in this discussion on floating point rounding when converting to strings, brought me to tinker: how can I get the maximum and minimum value for a given floating point number whose binary representations are equal.
Disclaimer: for this discussion, I like to stick to 32 bit and 64 bit floating point as described by IEEE-754. I'm not interested in extended floating point (80-bits) or quads (128 bits IEEE-754-2008) or any other standard (IEEE-854).
Background: Computers are bad at representing 0.1 in binary representation. In C#, a float represents this as 3DCCCCCD internally (C# uses round-to-nearest) and a double as 3FB999999999999A. The same bit patterns are used for decimal 0.100000005 (float) and 0.1000000000000000124 (double), but not for 0.1000000000000000144 (double).
For convenience, the following C# code gives these internal representations:
string GetHex(float f)
{
    return BitConverter.ToUInt32(BitConverter.GetBytes(f), 0).ToString("X");
}
string GetHex(double d)
{
    return BitConverter.ToUInt64(BitConverter.GetBytes(d), 0).ToString("X");
}
// float
Console.WriteLine(GetHex(0.1F));
// double
Console.WriteLine(GetHex(0.1));
In the case of 0.1, there is no lower decimal number that is represented with the same bit pattern, any 0.99...99 will yield a different bit representation (i.e., float for 0.999999937 yields 3F7FFFFF internally).
My question is simple: how can I find the lowest and highest decimal value for a given float (or double) that is internally stored in the same binary representation.
Why: (I know you'll ask) to find the error in rounding in .NET when it converts to a string and when it converts from a string, to find the internal exact value and to understand my own rounding errors better.
My guess is something like: take the mantissa, remove the rest, get its exact value, get one (mantissa-bit) higher, and calculate the mean: anything below that will yield the same bit pattern. My main problem is: how to get the fractional part as an integer (bit manipulation is not my strongest asset). Jon Skeet's DoubleConverter class may be helpful.
One way to get at your question is to find the size of an ULP, or Unit in the Last Place, of your floating-point number. Simplifying a little bit, this is the distance between a given floating-point number and the next larger number. Again, simplifying a little bit, given a representable floating-point value x, any decimal string whose value is between (x - 1/2 ulp) and (x + 1/2 ulp) will be rounded to x when converted to a floating-point value.
The trick is that (x +/- 1/2 ulp) is not a representable floating-point number, so actually calculating its value requires that you use a wider floating-point type (if one is available) or an arbitrary width big decimal or similar type to do the computation.
How do you find the size of an ulp? One relatively easy way is roughly what you suggested, written here in C-ish pseudocode because I don't know C#:
float absX = absoluteValue(x);
uint32_t bitPattern = getRepresentationOfFloat(absX);
bitPattern++;
float nextFloatNumber = getFloatFromRepresentation(bitPattern);
float ulpOfX = (nextFloatNumber - absX);
This works because adding one to the bit pattern of x exactly corresponds to adding one ulp to the value of x. No floating-point rounding occurs in the subtraction because the values involved are so close (in particular, there is a theorem of IEEE-754 floating-point arithmetic that if two numbers x and y satisfy y/2 <= x <= 2y, then x - y is computed exactly). The only caveats here are:
if x happens to be the largest finite floating point number, this won't work (it will return inf, which is clearly wrong).
if your platform does not correctly support gradual underflow (say an embedded device running in flush-to-zero mode), this won't work for very small values of x.
It sounds like you're not likely to be in either of those situations, so this should work just fine for your purposes.
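In C# that sketch might look roughly like this, reusing the BitConverter round-trip from the question (an illustration under the same caveats, not hardened code):
static float UlpOf(float x)
{
    float absX = Math.Abs(x);
    uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(absX), 0);
    bits++;                                                   // bit pattern of the next float above absX
    float next = BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    return next - absX;                                       // exact, per the theorem above
}
Half of that value, added to and subtracted from x in a wider type, then gives the rounding interval described next.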
Now that you know what an ulp of x is, you can find the interval of values that rounds to x. You can compute ulp(x)/2 exactly in floating-point, because floating-point division by 2 is exact (again, barring underflow). Then you need only compute the value of x +/- ulp(x)/2 in a suitably larger floating-point type (double will work if you're interested in float) or in a big-decimal type, and you have your interval.
I made a few simplifying assumptions through this explanation. If you need this to really be spelled out exactly, leave a comment and I'll expand on the sections that are a bit fuzzy when I get the chance.
One other note: the following statement in your question:
In the case of 0.1, there is no lower decimal number that is represented with the same bit pattern
is incorrect. You just happened to be looking at the wrong values (0.999999... instead of 0.099999... -- an easy typo to make).
Python 3.1 just implemented something like this: see the changelog (scroll down a bit), bug report.

Weird outcome when subtracting doubles [duplicate]

Possible Duplicate:
Why is floating point arithmetic in C# imprecise?
I have been dealing with some numbers and C#, and the following line of code results in a different number than one would expect:
double num = (3600.2 - 3600.0);
I expected num to be 0.2, however, it turned out to be 0.1999999999998181. Is there any reason why it is producing a close, but still different decimal?
This is because double is a binary floating point datatype, which cannot represent every decimal value exactly.
If you want greater accuracy you could switch to using decimal instead.
The literal suffix for decimal is m, so to use decimal arithmetic (and produce a decimal result) you could write your code as
var num = (3600.2m - 3600.0m);
Note that there are disadvantages to using a decimal. It is a 128 bit datatype as opposed to 64 bit which is the size of a double. This makes it more expensive both in terms of memory and processing. It also has a much smaller range than double.
There is a reason.
The reason is that the way the number is stored in memory, in the case of the double data type, doesn't allow for an exact representation of the number 3600.2. It also doesn't allow for an exact representation of the number 0.2.
0.2 has an infinite representation in binary. If you want to store it in memory or in processor registers to perform some calculations, some number close to 0.2 with a finite representation is stored instead. It may not be apparent if you run code like this:
double num = (0.2 - 0.0);
This is because, in this case, all binary digits available for representing numbers in the double data type are used to represent the fractional part of the number (there is only a fractional part), so the precision is higher. If you store the number 3600.2 in an object of type double, some digits are used to represent the integer part (3600) and there are fewer digits left for the fractional part. The precision is lower, and the fractional part that is actually stored in memory differs from 0.2 enough that it becomes apparent after conversion from double to string.
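A quick way to see that contrast (a sketch; the exact digits printed depend on the runtime's default formatting):
Console.WriteLine(0.2 - 0.0);        // 0.2
Console.WriteLine(3600.2 - 3600.0);  // 0.1999999999998181 or similar - the stored 3600.2 has fewer bits left for the fraction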
Change your type to decimal:
decimal num = (3600.2m - 3600.0m);
You should also read this.
See Wikipedia
Can't explain it better. I can also suggest reading What Every Computer Scientist Should Know About Floating-Point Arithmetic. Or see related questions on StackOverflow.
