Math.round bug - what to do? - c#

Math.Round(8.075, 2, MidpointRounding.AwayFromZero) returns 8.07, though it should return 8.08. Mysteriously enough, 7.075 is working fine, but 9.075 also returns 9.07!
What to do? Does anybody know a rounding method without such bugs?

If you count with 10 fingers, like humans do, you don't have any trouble expressing the decimal value 8.075 precisely:
8.075 = 8 x 10^1 + 0 x 10^0 + 7 x 10^-1 + 5 x 10^-2
But computers count with 2 fingers, they need to express that value in powers of 2:
8.075 = 1 x 2^3 + 0 x 2^2 + 0 x 2^1 + 0 x 2^0 + 0 x 2^-1 + 0 x 2^-2 + 0 x 2^-3 +
1 x 2^-4 + 0 x 2^-5 + 0 x 2^-6 + 1 x 2^-7 + 1 x 2^-8 + 0 x 2^-9 + 0 x 2^-10 +
1 x 2^-11 + ...
I gave up with finger cramp typing the terms but the point is that no matter how many powers of 2 you add, you'll never get exactly 8.075m. A similar problem to how humans can never write the result of 10 / 3 precisely, it has an infinite number of digits in the fraction. You can only write the result of that expression accurately when you count with 6 fingers.
A processor of course doesn't have enough storage to store an infinite number of bits to represent a value. So they must truncate the digit sequence, a value of type double can store 53 bits.
As a result, the decimal value 8.075 gets rounded when it is stored in the processor. The sequence of 53 bits, converted back to decimal, is the value ~8.074999999999999289. Which then, as expected, gets rounded to 8.07 by your code.
If you want 10 finger math results, you'll need to use a data type that stores numbers in base 10. That's the System.Decimal type in .NET. Fix:
decimal result = Math.Round(8.075m, 2, MidpointRounding.AwayFromZero)
Note the usage of the letter m in the 8.075m literal in the snippet, a literal of type decimal. Which selects the Math.Round() overload that counts with 10 fingers, previously you used the overload that uses System.Double, the 2 finger version.
Do note that there's a significant disadvantage to calculating with System.Decimal, it is slow. Much, much slower than calculating with System.Double, a value type that's directly supported by the processor. Decimal math is done in software and is not hardware accelerated.

I am not a .net specialist, but these numbers can't be exactly represented as double, so the rounding is accurate if you take into account the real value of those 3 numbers:
7.075 ==> 7.07500000000000017763568394002504646778106689453125
8.075 ==> 8.074999999999999289457264239899814128875732421875
9.075 ==> 9.074999999999999289457264239899814128875732421875
More about floating-point precision: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

The possible solution might be:
(double)Math.Round((decimal)8.075, 2, MidpointRounding.AwayFromZero);

Related

Rounding a number away from zero in c#

In my database, I have three point floating numbers columns and a process that multiplies the two first ones to get the third one value
item.ValorTotal = Math.Round(item.Qtde * item.ValorUnitario, 2, MidpointRounding.AwayFromZero);
but if my item.Qtde is like 0.03 and my item.ValorUnitario is 0.02, item.ValorTotal the result is 0.0006 and the variable receives zero because of the round, how do I can round to get 0.01 and continue using two numbers after decimal point?
In short I do like to round to the first possible number (0.01) when I receive a lower number like 0.006 or 0.0006
Instead of Math.Round() you can use Math.Ceiling().
The Math.Ceiling() method in C# is used to return the smallest integral value greater than or equal to the specified number.
So in your code example it will be something like:
item.ValorTotal = (Math.Ceiling((item.Qtde * item.ValorUnitario) * 100) / 100);
Output:
0,0006 => 0,01
0,0106 => 0,02
AwayFromZero doesn't mean that you always round upward. In real it works at most values like usual rounding. As far as I understood it only has an effect if you would round 0.005x.
Therefore write
item.ValorTotal = Math.Round(item.Qtde * item.ValorUnitario + 0.005, 2, MidpointRounding.AwayFromZero);
when you want to round upward and both values are positive.

How is an integer stored in memory?

This is most probably the dumbest question anyone would ask, but regardless I hope I will find a clear answer for this.
My question is - How is an integer stored in computer memory?
In c# an integer is of size 32 bit. MSDN says we can store numbers from -2,147,483,648 to 2,147,483,647 inside an integer variable.
As per my understanding a bit can store only 2 values i.e 0 & 1. If I can store only 0 or 1 in a bit, how will I be able to store numbers 2 to 9 inside a bit?
More precisely, say I have this code int x = 5; How will this be represented in memory or in other words how is 5 converted into 0's and 1's, and what is the convention behind it?
It's represented in binary (base 2). Read more about number bases. In base 2 you only need 2 different symbols to represent a number. We usually use the symbols 0 and 1. In our usual base we use 10 different symbols to represent all the numbers, 0, 1, 2, ... 8, and 9.
For comparison, think about a number that doesn't fit in our usual system. Like 14. We don't have a symbol for 14, so how to we represent it? Easy, we just combine two of our symbols 1 and 4. 14 in base 10 means 1*10^1 + 4*10^0.
1110 in base 2 (binary) means 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0 = 8 + 4 + 2 + 0 = 14. So despite not having enough symbols in either base to represent 14 with a single symbol, we can still represent it in both bases.
In another commonly used base, base 16, which is also known as hexadecimal, we have enough symbols to represent 14 using only one of them. You'll usually see 14 written using the symbol e in hexadecimal.
For negative integers we use a convenient representation called twos-complement which is the complement (all 1s flipped to 0 and all 0s flipped to 1s) with one added to it.
There are two main reasons this is so convenient:
We know immediately if a number is positive of negative by looking at a single bit, the most significant bit out of the 32 we use.
It's mathematically correct in that x - y = x + -y using regular addition the same way you learnt in grade school. This means that processors don't need to do anything special to implement subtraction if they already have addition. They can simply find the twos-complement of y (recall, flip the bits and add one) and then add x and y using the addition circuit they already have, rather than having a special circuit for subtraction.
This is not a dumb question at all.
Let's start with uint because it's slightly easier. The convention is:
You have 32 bits in a uint. Each bit is assigned a number ranging from 0 to 31. By convention the rightmost bit is 0 and the leftmost bit is 31.
Take each bit number and raise 2 to that power, and then multiply it by the value of the bit. So if bit number three is one, that's 1 x 23. If bit number twelve is zero, that's 0 x 212.
Add up all those numbers. That's the value.
So five would be 00000000000000000000000000000101, because 5 = 1 x 20 + 0 x 21 + 1 x 22 + ... the rest are all zero.
That's a uint. The convention for ints is:
Compute the value as a uint.
If the value is greater than or equal to 0 and strictly less than 231 then you're done. The int and uint values are the same.
Otherwise, subtract 232 from the uint value and that's the int value.
This might seem like an odd convention. We use it because it turns out that it is easy to build chips that perform arithmetic in this format extremely quickly.
Binary works as follows (as your 32 bits).
1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1
2^ 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16......................................0
x
x = sign bit (if 1 then negative number if 0 then positive)
So the highest number is 0111111111............1 (all ones except the negative bit), which is 2^30 + 2 ^29 + 2^28 +........+2^1 + 2^0 or 2,147,483,647.
The lowest is 1000000.........0, meaning -2^31 or -2147483648.
Is this what high level languages lead to!? Eeek!
As other people have said it's a base 2 counting system. Humans are naturally base 10 counters mostly, though time for some reason is base 60, and 6 x 9 = 42 in base 13. Alan Turing was apparently adept at base 17 mental arithmetic.
Computers operate in base 2 because it's easy for the electronics to be either on or off - representing 1 and 0 which is all you need for base 2. You could build the electronics in such a way that it was on, off or somewhere in between. That'd be 3 states, allowing you to do tertiary math (as opposed to binary math). However the reliability is reduced because it's harder to tell the difference between those three states, and the electronics is much more complicated. Even more levels leads to worse reliability.
Despite that it is done in multi level cell flash memory. In these each memory cell represents on, off and a number of intermediate values. This improves the capacity (each cell can store several bits), but it is bad news for reliability. This sort of chip is used in solid state drives, and these operate on the very edge of total unreliability in order to maximise capacity.

Why does a float variable stop incrementing at 16777216 in C#?

float a = 0;
while (true)
{
a++;
if (a > 16777216)
break; // Will never break... a stops at 16777216
}
Can anyone explain this to me why a float value stops incrementing at 16777216 in this code?
Edit:
Or even more simple:
float a = 16777217; // a becomes 16777216
Short roundup of IEEE-754 floating point numbers (32-bit) off the top of my head:
1 bit sign (0 means positive number, 1 means negative number)
8 bit exponent (with -127 bias, not important here)
23 bits "mantissa"
With exceptions for the exponent values 0 and 255, you can calculate the value as: (sign ? -1 : +1) * 2^exponent * (1.0 + mantissa)
The mantissa bits represent binary digits after the decimal separator, e.g. 1001 0000 0000 0000 0000 000 = 2^-1 + 2^-4 = .5 + .0625 = .5625 and the value in front of the decimal separator is not stored but implicitly assumed as 1 (if exponent is 255, 0 is assumed but that's not important here), so for an exponent of 30, for instance, this mantissa example represents the value 1.5625
Now to your example:
16777216 is exactly 224, and would be represented as 32-bit float like so:
sign = 0 (positive number)
exponent = 24 (stored as 24+127=151=10010111)
mantissa = .0
As 32 bits floating-point representation: 0 10010111 00000000000000000000000
Therefore: Value = (+1) * 2^24 * (1.0 + .0) = 2^24 = 16777216
Now let's look at the number 16777217, or exactly 224+1:
sign and exponent are the same
mantissa would have to be exactly 2-24 so that (+1) * 2^24 * (1.0 + 2^-24) = 2^24 + 1 = 16777217
And here's the problem. The mantissa cannot have the value 2-24 because it only has 23 bits, so the number 16777217 just cannot be represented with the accuracy of 32-bit floating points numbers!
16777217 cannot be represented exactly with a float. The next highest number that a float can represent exactly is 16777218.
So, you try to increment the float value 16777216 to 16777217, which cannot be represented in a float.
When you look at that value in its binary representation, you'll see that it's a one and many zeroes, namely 1 0000 0000 0000 0000 0000 0000, or exactly 2^24. That means, at 16777216, the number has just grown by one digit.
As it's a floating point number, this can mean that the last digit at its end that is still stored (i.e. within its precision) is shifted to the left, as well.
Probably, what you're seeing is that the last digit of precision has just shifted to something bigger than one, so adding one doesn't make any difference any more.
Imagine this in decimal form. Suppose you had the number:
1.000000 * 10^6
or 1,000,000. If all you had were six digits of accuracy, adding 0.5 to this number would yield
1.0000005 * 10^6
However, current thinking with fp rounding modes is to use "Round to Even," rather than "Round to Nearest." In this instance, every time you increment this value, it will round back down in the floating point unit back to 16,777,216, or 2^24. Singles in IEE 754 is represented as:
+/- exponent (1.) fraction
where the "1." is implied and the fraction is another 23 bits, all zeros, in this case. The extra binary 1 will spill into the guard digit, carry down to the rounding step, and be deleted each time, no matter how many times you increment it. The ulp or unit in the last place will always be zero. The last successful increment is from:
+2^23 * (+1.) 11111111111111111111111 -> +2^24 * (1.) 00000000000000000000000

Why does -2 % 360 give -2 instead of 358 in c#

Microsoft Mathematics and Google's calculator give me 358 for -2 % 360, but C# and windows calculator are outputting -2 ... which is the right answer ?
The C# compiler is doing the right thing according to the C# specification, which states that for integers:
The result of x % y is the value produced by x – (x / y) * y.
Note that (x/y) always rounds towards zero.
For the details of how remainder is computed for binary and decimal floating point numbers, see section 7.8.3 of the specification.
Whether this is the "right answer" for you depends on how you view the remainder operation. The remainder must satisfy the identity that:
dividend = quotient * divisor + remainder
I say that clearly -2 % 360 is -2. Why? Well, first ask yourself what the quotient is. How many times does 360 go into -2? Clearly zero times! 360 doesn't go into -2 at all. If the quotient is zero then the remainder must be -2 in order to satisfy the identity. It would be strange to say that 360 goes into -2 a total of -1 times, with a remainder of 358, don't you think?
Which is the right answer?
Both answers are correct. It's merely a matter of convention which value is returned.
Both, see Modulo operation on Wikipedia.
I found this very easy to understand explanation at http://mathforum.org/library/drmath/view/52343.html
There are different ways of thinking about remainders when you deal
with negative numbers, and he is probably confusing two of them. The
mod function is defined as the amount by which a number exceeds the
largest integer multiple of the divisor that is not greater than that
number. In this case, -340 lies between -360 and -300, so -360 is the
greatest multiple LESS than -340; we subtract 60 * -6 = -360 from -340
and get 20:
-420 -360 -300 -240 -180 -120 -60 0 60 120 180 240 300 360
--+----+----+----+----+----+----+----+----+----+----+----+----+----+--
| | | |
-360| |-340 300| |340
|=| |==|
20 40
Working with a positive number like 340, the multiple we subtract is
smaller in absolute value, giving us 40; but with negative numbers, we
subtract a number with a LARGER absolute value, so that the mod
function returns a positive value. This is not always what people
expect, but it is consistent.
If you want the remainder, ignoring the sign, you have to take the
absolute value before using the mod function.
Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/
IMO, -2 is much easier to understand and code with. If you divide -2 by 360, your answer is 0 remainder -2 ... just as dividing 2 by 360 is 0 remainder 2. It's not as natural to consider that 358 is also the remainder of -2 mod 360.
From wikipedia:
if the remainder is nonzero, there are two possible choices for the
remainder, one negative and the other positive, and there are also two
possible choices for the quotient. Usually, in number theory, the
positive remainder is always chosen, but programming languages choose
depending on the language and the signs of a and n.[2] However, Pascal
and Algol68 do not satisfy these conditions for negative divisors, and
some programming languages, such as C89, don't even define a result if
either of n or a is negative.

Why arithmetic shift halfs a number only in SOME incidents?

Hey, I'm self-learning about bitwise, and I saw somewhere in the internet that arithmetic shift (>>) by one halfs a number. I wanted to test it:
44 >> 1 returns 22, ok
22 >> 1 returns 11, ok
11 >> 1 returns 5, and not 5.5, why?
Another Example:
255 >> 1 returns 127
127 >> 1 returns 63 and not 63.5, why?
Thanks.
The bit shift operator doesn't actually divide by 2. Instead, it moves the bits of the number to the right by the number of positions given on the right hand side. For example:
00101100 = 44
00010110 = 44 >> 1 = 22
Notice how the bits in the second line are the same as the line above, merely
shifted one place to the right. Now look at the second example:
00001011 = 11
00000101 = 11 >> 1 = 5
This is exactly the same operation as before. However, the result of 5 is due to the fact that the last bit is shifted to the right and disappears, creating the result 5. Because of this behavior, the right-shift operator will generally be equivalent to dividing by two and then throwing away any remainder or decimal portion.
11 in binary is 1011
11 >> 1
means you shift your binary representation to the right by one step.
1011 >> 1 = 101
Then you have 101 in binary which is 1*1 + 0*2 + 1*4 = 5.
If you had done 11 >> 2 you would have as a result 10 in binary i.e. 2 (1*2 + 0*1).
Shifting by 1 to the right transforms sum(A_i*2^i) [i=0..n] in sum(A_(i+1)*2^i) [i=0..n-1]
that's why if your number is even (i.e. A_0 = 0) it is divided by two. (sorry for the customised LateX syntax... :))
Binary has no concept of decimal numbers. It's returning the truncated (int) value.
11 = 1011 in binary. Shift to the right and you have 101, which is 5 in decimal.
Bit shifting is the same as division or multiplication by 2^n. In integer arithmetics the result gets rounded towards zero to an integer. In floating-point arithmetics bit shifting is not permitted.
Internally bit shifting, well, shifts bits, and the rounding simply means bits that fall off an edge simply getting removed (not that it would actually calculate the precise value and then round it). The new bits that appear on the opposite edge are always zeroes for the right hand side and for positive values. For negative values, one bits are appended on the left hand side, so that the value stays negative (see how two's complement works) and the arithmetic definition that I used still holds true.
In most statically-typed languages, the return type of the operation is e.g. "int". This precludes a fractional result, much like integer division.
(There are better answers about what's 'under the hood', but you don't need to understand those to grok the basics of the type system.)

Categories