Background:
I was "dragged" into seeing this question:
Fibonacci's Closed-form expression in Haskell
when the author initially tagged it with many other languages but later narrowed it to a Haskell-only question. Unfortunately I have no experience whatsoever with Haskell, so I couldn't really participate in the question. However, one of the answers caught my eye: the answerer turned it into a pure integer math problem. That sounded awesome to me, so I had to figure out how it worked and compare it to a recursive Fibonacci implementation to see how accurate it was. I have a feeling that if I just remembered the relevant math involving irrational numbers, I might be able to work everything out myself (but I don't). So the first step for me was to port it to a language I am familiar with. In this case, I am using C#.
I am not completely in the dark, fortunately. I have plenty of experience in another functional language (OCaml), so a lot of it looked somewhat familiar to me. Starting out with the conversion, everything seemed straightforward, since it basically defines a new numeric type to help with the calculations. However, I've hit a couple of roadblocks in the translation and am having trouble finishing it: I'm getting completely wrong results.
Analysis:
Here's the code that I'm translating:
data Ext = Ext !Integer !Integer
    deriving (Eq, Show)

instance Num Ext where
    fromInteger a = Ext a 0
    negate (Ext a b) = Ext (-a) (-b)
    (Ext a b) + (Ext c d) = Ext (a+c) (b+d)
    (Ext a b) * (Ext c d) = Ext (a*c + 5*b*d) (a*d + b*c)  -- easy to work out on paper
    -- remaining instance methods are not needed

fib n = divide $ twoPhi^n - (2-twoPhi)^n
  where twoPhi = Ext 1 1
        divide (Ext 0 b) = b `div` 2^n  -- effectively divides by 2^n * sqrt 5
So based on my research and what I can deduce (correct me if I'm wrong anywhere), the first part declares type Ext with a constructor that will have two Integer parameters (and I guess will inherit the Eq and Show types/modules).
Next is the implementation of Ext which "derives" from Num. fromInteger performs a conversion from an Integer. negate is the unary negation and then there's the binary addition and multiplication operators.
The last part is the actual Fibonacci implementation.
Questions:
In the answer, hammar (the answerer) mentions that exponentiation is handled by the default implementation in Num. But what does that mean and how is that actually applied to this type? Is there an implicit number "field" that I'm missing? Does it just apply the exponentiation to each corresponding number it contains? I assume it does the latter and ended up with this C# code:
public static Ext operator ^(Ext x, int p) // "exponent"
{
// just apply across both parts of Ext?
return new Ext(BigInt.Pow(x.a, p), BigInt.Pow(x.b, p));
// Ext (a^p) (b^p)
}
However, this conflicts with my understanding of why negate is needed; if exponentiation really worked that way, negate wouldn't be needed at all.
Now the meat of the code. I read the first part divide $ twoPhi^n - (2-twoPhi)^n as:
divide the result of the following expression: twoPhi^n - (2-twoPhi)^n.
Pretty simple. Raise twoPhi to the nth power. Subtract from that the result of the rest. Here we're doing binary subtraction but we only implemented unary negation. Or did we not? Or can binary subtraction be implied because it could be made up by combining addition and negation (which we have)? I assume the latter. And this eases my uncertainty about the negation.
The last part is the actual division: divide (Ext 0 b) = b `div` 2^n. Two concerns here. From what I've found, there is no division operator, only a `div` function. So I would just have to divide the numbers here. Is this correct? Or is there a division operator but a separate `div` function that does something else special?
I'm not sure how to interpret the beginning of the line. Is it just a simple pattern match? In other words, would this only apply if the first parameter was a 0? What would the result be if it didn't match (the first was non-zero)? Or should I be interpreting it as we don't care about the first parameter and apply the function unconditionally? This seems to be the biggest hurdle and using either interpretation still yields the incorrect results.
Did I make any wrong assumptions anywhere? Or is it all right and I just implemented the C# incorrectly?
Code:
Here's the (non-working) translation and the full source (including tests) so far just in case anyone is interested.
// code removed to keep post size down
// full source still available through link above
Progress:
Ok so looking at the answers and comments so far, I think I know where to go from here and why.
The exponentiation just needed to do what it normally does: multiply the value by itself p times, using the multiplication we've already implemented. It never crossed my mind that we should simply do what math class has always told us to do. The implied subtraction from having addition and negation is a pretty handy feature too.
Also spotted a typo in my implementation. I added when I should have multiplied.
// (Ext a b) * (Ext c d) = Ext (a*c + 5*b*d) (a*d + b*c)
public static Ext operator *(Ext x, Ext y)
{
return new Ext(x.a * y.a + 5*x.b*y.b, x.a*y.b + x.b*y.a);
// ^ oops!
}
Conclusion:
So now it's complete. I only implemented the essential operators and renamed things a bit, naming the type in a similar manner to complex numbers. So far it's consistent with the recursive implementation, even at really large inputs. Here's the final code.
static readonly Complicated TWO_PHI = new Complicated(1, 1);
static BigInt Fib_x(int n)
{
var x = Complicated.Pow(TWO_PHI, n) - Complicated.Pow(2 - TWO_PHI, n);
System.Diagnostics.Debug.Assert(x.Real == 0);
return x.Bogus / BigInt.Pow(2, n);
}
struct Complicated
{
private BigInt real;
private BigInt bogus;
public Complicated(BigInt real, BigInt bogus)
{
this.real = real;
this.bogus = bogus;
}
public BigInt Real { get { return real; } }
public BigInt Bogus { get { return bogus; } }
public static Complicated Pow(Complicated value, int exponent)
{
if (exponent < 0)
throw new ArgumentException(
"only non-negative exponents supported",
"exponent");
Complicated result = 1;
Complicated factor = value;
for (int mask = exponent; mask != 0; mask >>= 1)
{
if ((mask & 0x1) != 0)
result *= factor;
factor *= factor;
}
return result;
}
public static implicit operator Complicated(int real)
{
return new Complicated(real, 0);
}
public static Complicated operator -(Complicated l, Complicated r)
{
var real = l.real - r.real;
var bogus = l.bogus - r.bogus;
return new Complicated(real, bogus);
}
public static Complicated operator *(Complicated l, Complicated r)
{
var real = l.real * r.real + 5 * l.bogus * r.bogus;
var bogus = l.real * r.bogus + l.bogus * r.real;
return new Complicated(real, bogus);
}
}
And here's the fully updated source.
[...], the first part declares type Ext with a constructor that will have two Integer parameters (and I guess will inherit the Eq and Show types/modules).
Eq and Show are type classes. You can think of them as similar to interfaces in C#, only more powerful. deriving is a construct that can be used to automatically generate implementations for a handful of standard type classes, including Eq, Show, Ord and others. This reduces the amount of boilerplate you have to write.
The instance Num Ext part provides an explicit implementation of the Num type class. You got most of this part right.
[the answerer] mentions that exponentiation is handled by the default implementation in Num. But what does that mean and how is that actually applied to this type? Is there an implicit number "field" that I'm missing? Does it just apply the exponentiation to each corresponding number it contains?
This was a bit unclear on my part. ^ is not in the type class Num, but it is an auxiliary function defined entirely in terms of the Num methods, sort of like an extension method. It implements exponentiation to non-negative integral powers through binary exponentiation. This is the main "trick" of the code.
[...] we're doing binary subtraction but we only implemented unary negation. Or did we not? Or can binary subtraction be implied because it could be made up by combining addition and negation (which we have)?
Correct. The default implementation of binary minus is x - y = x + (negate y).
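The C# port can mirror that derivation directly. A minimal sketch against the question's Ext type (assuming the + and unary - operators are already defined on it):
// x - y = x + (negate y), just like the Haskell default implementation
public static Ext operator -(Ext x, Ext y)
{
    return x + (-y);
}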
The last part is the actual division: divide (Ext 0 b) = b `div` 2^n. Two concerns here. From what I've found, there is no division operator, only a div function. So I would just have to divide the numbers here. Is this correct? Or is there a division operator but a separate div function that does something else special?
There is only a syntactic difference between operators and functions in Haskell. One can treat an operator as a function by writing it in parentheses, (+), or treat a function as a binary operator by writing it in `backticks`.
div is integer division and belongs to the type class Integral, so it is defined for all integer-like types, including Int (machine-sized integers) and Integer (arbitrary-size integers).
I'm not sure how to interpret the beginning of the line. Is it just a simple pattern match? In other words, would this only apply if the first parameter was a 0? What would the result be if it didn't match (the first was non-zero)? Or should I be interpreting it as we don't care about the first parameter and apply the function unconditionally?
It is indeed just a simple pattern match to extract the coefficient of √5. The integral part is matched against a zero to express to readers that we indeed expect it to always be zero, and to make the program crash if some bug in the code was causing it not to be.
A small improvement
Replacing Integer with Rational in the original code, you can write fib n even closer to Binet's formula:
fib n = divSq5 $ phi^n - (1-phi)^n
  where divSq5 (Ext 0 b) = numerator b
        phi = Ext (1/2) (1/2)
This performs the divisions throughout the computation, instead of saving them all up for the end. This results in smaller intermediate numbers and about a 20% speedup when calculating fib (10^6).
First, Num, Show, Eq are type classes, not types nor modules. They are a bit similar to interfaces in C#, but are resolved statically rather than dynamically.
Second, exponentiation is performed via multiplication with the implementation of ^, which is not a member of the Num typeclass, but a separate function.
The implementation is the following:
(^) :: (Num a, Integral b) => a -> b -> a
x0 ^ y0 | y0 < 0    = error "Negative exponent"
        | y0 == 0   = 1
        | otherwise = f x0 y0
    where -- f : x0 ^ y0 = x ^ y
          f x y | even y    = f (x * x) (y `quot` 2)
                | y == 1    = x
                | otherwise = g (x * x) ((y - 1) `quot` 2) x
          -- g : x0 ^ y0 = (x ^ y) * z
          g x y z | even y    = g (x * x) (y `quot` 2) z
                  | y == 1    = x * z
                  | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)
This seems to be the missing part of the solution.
You are right about subtraction. It is implemented via addition and negation.
Now, the divide function divides only if a equals 0. Otherwise we get a pattern match failure, indicating a bug in the program.
The div function is plain integer division, equivalent to / applied to integral types in C#. There is also a / operator in Haskell, but it denotes real-number (fractional) division.
A quick implementation in C#. I implemented exponentiation using the square-and-multiply algorithm.
It is enlightening to compare this type, which has the form a + b*Sqrt(5), with the complex numbers, which take the form a + b*Sqrt(-1). Addition and subtraction work just the same. Multiplication is slightly different, because the square of the adjoined unit isn't -1 but +5 here. Division is slightly more complicated, but shouldn't be too hard either.
Exponentiation is defined as multiplying a number by itself n times. But of course that's slow. So we use the fact that ((a*a)*a)*a is identical to (a*a)*(a*a) and rewrite it using the square-and-multiply algorithm. That way we only need about log(n) multiplications instead of n.
Just raising the individual components to the power separately doesn't work. That's because the matrix underlying your type isn't diagonal. Compare this to complex numbers: you can't calculate the power of the real and imaginary parts separately there either.
struct MyNumber
{
public readonly BigInteger Real;
public readonly BigInteger Sqrt5;
public MyNumber(BigInteger real,BigInteger sqrt5)
{
Real=real;
Sqrt5=sqrt5;
}
public static MyNumber operator -(MyNumber left,MyNumber right)
{
return new MyNumber(left.Real-right.Real, left.Sqrt5-right.Sqrt5);
}
public static MyNumber operator*(MyNumber left,MyNumber right)
{
BigInteger real=left.Real*right.Real + left.Sqrt5*right.Sqrt5*5;
BigInteger sqrt5=left.Real*right.Sqrt5 + right.Real*left.Sqrt5;
return new MyNumber(real,sqrt5);
}
public static MyNumber Power(MyNumber b,int exponent)
{
if(!(exponent>=0))
throw new ArgumentException();
MyNumber result=new MyNumber(1,0);
MyNumber multiplier=b;
while(exponent!=0)
{
if((exponent&1)==1)//exponent is odd
result*=multiplier;
multiplier=multiplier*multiplier;
exponent/=2;
}
return result;
}
public override string ToString()
{
return Real.ToString()+"+"+Sqrt5.ToString()+"*Sqrt(5)";
}
}
BigInteger Fibo(int n)
{
MyNumber num = MyNumber.Power(new MyNumber(1,1),n)-MyNumber.Power(new MyNumber(1,-1),n);
num.Dump();
if(num.Real!=0)
throw new Exception("Asser failed");
return num.Sqrt5/BigInteger.Pow(2,n);
}
void Main()
{
MyNumber num=new MyNumber(1,2);
MyNumber.Power(num,2).Dump();
Fibo(5).Dump();
}
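To see concretely why raising the components separately doesn't work, here is a quick check using the MyNumber struct above: squaring 1 + 1*Sqrt(5) with the ring multiplication gives 6 + 2*Sqrt(5), whereas componentwise squaring would have left it at 1 + 1*Sqrt(5).
var x = new MyNumber(1, 1);
var squared = x * x;        // real: 1*1 + 5*1*1 = 6,  sqrt5: 1*1 + 1*1 = 2
Console.WriteLine(squared); // prints "6+2*Sqrt(5)"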
Related
I'm currently writing some code where I have something along the lines of:
double a = SomeCalculation1();
double b = SomeCalculation2();
if (a < b)
DoSomething2();
else if (a > b)
DoSomething3();
And then in other places I may need to do equality:
double a = SomeCalculation3();
double b = SomeCalculation4();
if (a != 0.0)
DoSomethingUseful(1 / a);
if (b == 0.0)
return 0; // or something else here
In short, I have lots of floating point math going on and I need to do various comparisons for conditions. I can't convert it to integer math because such a thing is meaningless in this context.
I've read before that floating point comparisons can be unreliable, since you can have things like this going on:
double a = 1.0 / 3.0;
double b = a + a + a;
if ((3 * a) != b)
Console.WriteLine("Oh no!");
In short, I'd like to know: How can I reliably compare floating point numbers (less than, greater than, equality)?
The number range I am using is roughly from 10E-14 to 10E6, so I do need to work with small numbers as well as large.
I've tagged this as language agnostic because I'm interested in how I can accomplish this no matter what language I'm using.
TL;DR
Use the following function instead of the currently accepted solution to avoid some undesirable results in certain limit cases, while being potentially more efficient.
Know the expected imprecision you have on your numbers and feed them accordingly in the comparison function.
#include <algorithm>  // std::max, std::min
#include <cassert>
#include <cfloat>     // FLT_EPSILON, FLT_MIN
#include <cmath>      // std::abs
#include <limits>     // std::numeric_limits

bool nearly_equal(
float a, float b,
float epsilon = 128 * FLT_EPSILON, float abs_th = FLT_MIN)
// those defaults are arbitrary and could be removed
{
assert(std::numeric_limits<float>::epsilon() <= epsilon);
assert(epsilon < 1.f);
if (a == b) return true;
auto diff = std::abs(a-b);
auto norm = std::min((std::abs(a) + std::abs(b)), std::numeric_limits<float>::max());
// or even faster: std::min(std::abs(a + b), std::numeric_limits<float>::max());
// keeping this commented out until I update figures below
return diff < std::max(abs_th, epsilon * norm);
}
Graphics, please?
When comparing floating point numbers, there are two "modes".
The first one is the relative mode, where the difference between x and y is considered relative to their amplitude |x| + |y|. When plotted in 2D, it gives the following profile, where green means equality of x and y. (I took an epsilon of 0.5 for illustration purposes.)
The relative mode is what is used for "normal" or "large enough" floating point values. (More on that later.)
The second one is an absolute mode, where we simply compare their difference to a fixed number. It gives the following profile (again with an epsilon of 0.5 and an abs_th of 1 for illustration).
This absolute mode of comparison is what is used for "tiny" floating point values.
Now the question is, how do we stitch together those two response patterns?
In Michael Borgwardt's answer, the switch is based on the value of diff, which should be below abs_th (Float.MIN_NORMAL in his answer). This switch zone is shown as hatched in the graph below.
Because abs_th * epsilon is smaller than abs_th, the green patches do not stick together, which in turn gives the solution a bad property: we can find triplets of numbers such that x < y1 < y2 and yet x == y2 but x != y1.
Take this striking example:
x = 4.9303807e-32
y1 = 4.930381e-32
y2 = 4.9309825e-32
We have x < y1 < y2, and in fact y2 - x is more than 2000 times larger than y1 - x. And yet with the current solution,
nearlyEqual(x, y1, 1e-4) == False
nearlyEqual(x, y2, 1e-4) == True
By contrast, in the solution proposed above, the switch zone is based on the value of |x| + |y|, which is represented by the hatched square below. It ensures that both zones connect gracefully.
Also, the code above does not have branching, which could be more efficient. Consider that operations such as max and abs, which a priori need branching, often have dedicated assembly instructions. For this reason, I think this approach is superior to the alternative of fixing Michael's nearlyEqual by changing the switch from diff < abs_th to diff < eps * abs_th, which would then produce essentially the same response pattern.
Where to switch between relative and absolute comparison?
The switch between those modes is made around abs_th, which is taken as FLT_MIN in the accepted answer. This choice means that the representation of float32 is what limits the precision of our floating point numbers.
This does not always make sense. For example, if the numbers you compare are the results of a subtraction, perhaps something in the range of FLT_EPSILON makes more sense. If they are square roots of subtracted numbers, the numerical imprecision could be even higher.
It is rather obvious when you consider comparing a floating point with 0. Here, any relative comparison will fail, because |x - 0| / (|x| + 0) = 1. So the comparison needs to switch to absolute mode when x is on the order of the imprecision of your computation -- and rarely is it as low as FLT_MIN.
This is the reason for the introduction of the abs_th parameter above.
Also, by not multiplying abs_th by epsilon, the interpretation of this parameter is simple and corresponds to the level of numerical precision that we expect on those numbers.
Mathematical rambling
(kept here mostly for my own pleasure)
More generally I assume that a well-behaved floating point comparison operator =~ should have some basic properties.
The following are rather obvious:
self-equality: a =~ a
symmetry: a =~ b implies b =~ a
invariance by opposition: a =~ b implies -a =~ -b
(We don't have a =~ b and b =~ c implies a =~ c; =~ is not an equivalence relation.)
I would add the following properties that are more specific to floating point comparisons
if a < b < c, then a =~ c implies a =~ b (closer values should also be equal)
if a, b, m >= 0 then a =~ b implies a + m =~ b + m (larger values with the same difference should also be equal)
if 0 <= λ < 1 then a =~ b implies λa =~ λb (perhaps less obvious to argue for).
Those properties already give strong constraints on possible near-equality functions. The function proposed above verifies them. Perhaps one or several otherwise obvious properties are missing.
When one thinks of =~ as a family of equality relations =~[Ɛ,t] parameterized by Ɛ and abs_th, one could also add
if Ɛ1 < Ɛ2 then a =~[Ɛ1,t] b implies a =~[Ɛ2,t] b (equality for a given tolerance implies equality at a higher tolerance)
if t1 < t2 then a =~[Ɛ,t1] b implies a =~[Ɛ,t2] b (equality for a given imprecision implies equality at a higher imprecision)
The proposed solution also verifies these.
Comparing for greater/smaller is not really a problem unless you're working right at the edge of the float/double precision limit.
For a "fuzzy equals" comparison, this (Java code, should be easy to adapt) is what I came up with for The Floating-Point Guide after a lot of work and taking into account lots of criticism:
public static boolean nearlyEqual(float a, float b, float epsilon) {
final float absA = Math.abs(a);
final float absB = Math.abs(b);
final float diff = Math.abs(a - b);
if (a == b) { // shortcut, handles infinities
return true;
} else if (a == 0 || b == 0 || diff < Float.MIN_NORMAL) {
// a or b is zero or both are extremely close to it
// relative error is less meaningful here
return diff < (epsilon * Float.MIN_NORMAL);
} else { // use relative error
return diff / (absA + absB) < epsilon;
}
}
It comes with a test suite. You should immediately dismiss any solution that doesn't, because it is virtually guaranteed to fail in some edge cases like having one value 0, two very small values opposite of zero, or infinities.
An alternative (see link above for more details) is to convert the floats' bit patterns to integer and accept everything within a fixed integer distance.
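A rough C# sketch of that bit-pattern idea (my own adaptation, not the guide's exact code; it assumes a .NET version with BitConverter.SingleToInt32Bits, and NaNs would need extra handling before use):
static bool NearlyEqualUlps(float a, float b, int maxUlps)
{
    int ia = BitConverter.SingleToInt32Bits(a);
    int ib = BitConverter.SingleToInt32Bits(b);
    // Remap the sign-magnitude bit patterns onto one monotonically ordered scale,
    // so adjacent representable floats always differ by exactly 1.
    if (ia < 0) ia = int.MinValue - ia;
    if (ib < 0) ib = int.MinValue - ib;
    return Math.Abs((long)ia - (long)ib) <= maxUlps;
}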
In any case, there probably isn't any solution that is perfect for all applications. Ideally, you'd develop/adapt your own with a test suite covering your actual use cases.
I had the problem of Comparing floating point numbers A < B and A > B
Here is what seems to work:
if ((A - B < Epsilon) && (fabs(A - B) > Epsilon))
{
printf("A is less than B");
}
if ((A - B > Epsilon) && (fabs(A - B) > Epsilon))
{
printf("A is greater than B");
}
The fabs (absolute value) takes care of the case where they are essentially equal.
We have to choose a tolerance level to compare float numbers. For example,
const float TOLERANCE = 0.00001f;
if (Math.Abs(f1 - f2) < TOLERANCE)
Console.WriteLine("Oh yes!");
One note. Your example is rather funny.
double a = 1.0 / 3.0;
double b = a + a + a;
if (a != b)
Console.WriteLine("Oh no!");
Some maths here
a = 1/3
b = 1/3 + 1/3 + 1/3 = 1.
1/3 != 1
Oh, yes..
Do you mean
if (b != 1)
Console.WriteLine("Oh no!")
Idea I had for floating point comparison in Swift
infix operator ~= {}
func ~= (a: Float, b: Float) -> Bool {
return fabsf(a - b) < Float(FLT_EPSILON)
}
func ~= (a: CGFloat, b: CGFloat) -> Bool {
return fabs(a - b) < CGFloat(FLT_EPSILON)
}
func ~= (a: Double, b: Double) -> Bool {
return fabs(a - b) < Double(FLT_EPSILON)
}
Adaptation to PHP from Michael Borgwardt & bosonix's answer:
class Comparison
{
const MIN_NORMAL = 1.17549435E-38; //from Java Specs
// from http://floating-point-gui.de/errors/comparison/
public function nearlyEqual($a, $b, $epsilon = 0.000001)
{
$absA = abs($a);
$absB = abs($b);
$diff = abs($a - $b);
if ($a == $b) {
return true;
} else {
if ($a == 0 || $b == 0 || $diff < self::MIN_NORMAL) {
return $diff < ($epsilon * self::MIN_NORMAL);
} else {
return $diff / ($absA + $absB) < $epsilon;
}
}
}
}
You should ask yourself why you are comparing the numbers. If you know the purpose of the comparison then you should also know the required accuracy of your numbers. That is different in each situation and each application context. But in pretty much all practical cases there is a required absolute accuracy. It is only very seldom that a relative accuracy is applicable.
To give an example: if your goal is to draw a graph on the screen, then you likely want floating point values to compare equal if they map to the same pixel on the screen. If the size of your screen is 1000 pixels, and your numbers are in the 1e6 range, then you likely will want 100 to compare equal to 200.
Given the required absolute accuracy, then the algorithm becomes:
public enum ComparisonResult { Unordered, Equal, Smaller, Larger }

public static ComparisonResult Compare(float a, float b, float accuracy)
{
if (float.IsNaN(a) || float.IsNaN(b)) // if NaN needs to be supported
return ComparisonResult.Unordered;
if (a == b) // short-cut and takes care of infinities
return ComparisonResult.Equal;
if (Math.Abs(a - b) < accuracy) // comparison wrt. the accuracy
return ComparisonResult.Equal;
if (a < b) // larger / smaller
return ComparisonResult.Smaller;
else
return ComparisonResult.Larger;
}
The standard advice is to use some small "epsilon" value (chosen depending on your application, probably), and consider floats that are within epsilon of each other to be equal. e.g. something like
#define EPSILON 0.00000001
if ((a - b) < EPSILON && (b - a) < EPSILON) {
printf("a and b are about equal\n");
}
A more complete answer is complicated, because floating point error is extremely subtle and confusing to reason about. If you really care about equality in any precise sense, you're probably seeking a solution that doesn't involve floating point.
I tried writing an equality function with the above comments in mind. Here's what I came up with:
Edit: Change from Math.Max(a, b) to Math.Max(Math.Abs(a), Math.Abs(b))
static bool fpEqual(double a, double b)
{
double diff = Math.Abs(a - b);
double epsilon = Math.Max(Math.Abs(a), Math.Abs(b)) * Double.Epsilon;
return (diff < epsilon);
}
Thoughts? I still need to work out a greater than, and a less than as well.
I came up with a simple approach to adjusting the size of epsilon to the size of the numbers being compared. So, instead of using:
iif(abs(a - b) < 1e-6, "equal", "not")
if a and b can be large, I changed that to:
iif(abs(a - b) < (10 ^ -abs(7 - log(a))), "equal", "not")
I suppose that doesn't satisfy all the theoretical issues discussed in the other answers, but it has the advantage of being one line of code, so it can be used in an Excel formula or an Access query without needing a VBA function.
I did a search to see if others have used this method and I didn't find anything. I tested it in my application and it seems to be working well. So it seems to be a method that is adequate for contexts that don't require the complexity of the other answers. But I wonder if it has a problem I haven't thought of since no one else seems to be using it.
If there's a reason the test with the log is not valid for simple comparisons of numbers of various sizes, please say why in a comment.
You need to take into account that the truncation error is a relative one. Two numbers are about equal if their difference is about as large as their ulp (Unit in the last place).
However, if you do floating point calculations, your error potential goes up with every operation (esp. careful with subtractions!), so your error tolerance needs to increase accordingly.
The best way to compare doubles for equality/inequality is by taking the absolute value of their difference and comparing it to a small enough (depending on your context) value.
double eps = 0.000000001; //for instance
double a = someCalc1();
double b = someCalc2();
double diff = Math.abs(a - b);
if (diff < eps) {
//equal
}
I want to ensure that a division of integers is always rounded up if necessary. Is there a better way than this? There is a lot of casting going on. :-)
(int)Math.Ceiling((double)myInt1 / myInt2)
UPDATE: This question was the subject of my blog in January 2013. Thanks for the great question!
Getting integer arithmetic correct is hard. As has been demonstrated amply thus far, the moment you try to do a "clever" trick, odds are good that you've made a mistake. And when a flaw is found, changing the code to fix the flaw without considering whether the fix breaks something else is not a good problem-solving technique. So far we've had I think five different incorrect integer arithmetic solutions to this completely not-particularly-difficult problem posted.
The right way to approach integer arithmetic problems -- that is, the way that increases the likelihood of getting the answer right the first time - is to approach the problem carefully, solve it one step at a time, and use good engineering principles in doing so.
Start by reading the specification for what you're trying to replace. The specification for integer division clearly states:
The division rounds the result towards zero
The result is zero or positive when the two operands have the same sign and zero or negative when the two operands have opposite signs
If the left operand is the smallest representable int and the right operand is –1, an overflow occurs. [...] it is implementation-defined as to whether [an ArithmeticException] is thrown or the overflow goes unreported with the resulting value being that of the left operand.
If the value of the right operand is zero, a System.DivideByZeroException is thrown.
What we want is an integer division function which computes the quotient but rounds the result always upwards, not always towards zero.
So write a specification for that function. Our function int DivRoundUp(int dividend, int divisor) must have behaviour defined for every possible input. That undefined behaviour is deeply worrying, so let's eliminate it. We'll say that our operation has this specification:
operation throws if divisor is zero
operation throws if dividend is int.minval and divisor is -1
if there is no remainder -- division is 'even' -- then the return value is the integral quotient
Otherwise it returns the smallest integer that is greater than the quotient, that is, it always rounds up.
Now we have a specification, so we know we can come up with a testable design. Suppose we add an additional design criterion that the problem be solved solely with integer arithmetic, rather than computing the quotient as a double, since the "double" solution has been explicitly rejected in the problem statement.
So what must we compute? Clearly, to meet our spec while remaining solely in integer arithmetic, we need to know three facts. First, what was the integer quotient? Second, was the division free of remainder? And third, if not, was the integer quotient computed by rounding up or down?
Now that we have a specification and a design, we can start writing code.
public static int DivRoundUp(int dividend, int divisor)
{
if (divisor == 0 ) throw ...
if (divisor == -1 && dividend == Int32.MinValue) throw ...
int roundedTowardsZeroQuotient = dividend / divisor;
bool dividedEvenly = (dividend % divisor) == 0;
if (dividedEvenly)
return roundedTowardsZeroQuotient;
// At this point we know that divisor was not zero
// (because we would have thrown) and we know that
// dividend was not zero (because there would have been no remainder)
// Therefore both are non-zero. Either they are of the same sign,
// or opposite signs. If they're of opposite sign then we rounded
// UP towards zero so we're done. If they're of the same sign then
// we rounded DOWN towards zero, so we need to add one.
bool wasRoundedDown = ((divisor > 0) == (dividend > 0));
if (wasRoundedDown)
return roundedTowardsZeroQuotient + 1;
else
return roundedTowardsZeroQuotient;
}
Is this clever? No. Beautiful? No. Short? No. Correct according to the specification? I believe so, but I have not fully tested it. It looks pretty good though.
We're professionals here; use good engineering practices. Research your tools, specify the desired behaviour, consider error cases first, and write the code to emphasize its obvious correctness. And when you find a bug, consider whether your algorithm is deeply flawed to begin with before you just randomly start swapping the directions of comparisons around and break stuff that already works.
All the answers here so far seem rather over-complicated.
In C# and Java, for positive dividend and divisor, you simply need to do:
( dividend + divisor - 1 ) / divisor
Source: Number Conversion, Roland Backhouse, 2001
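For instance (a quick check with made-up values), 7 / 4 rounds up to 2:
int dividend = 7, divisor = 4;                    // both positive, as required above
int result = (dividend + divisor - 1) / divisor;  // (7 + 4 - 1) / 4 == 10 / 4 == 2
Note that dividend + divisor - 1 can overflow when the dividend is close to int.MaxValue.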
The final int-based answer
For signed integers:
int div = a / b;
if (((a ^ b) >= 0) && (a % b != 0))
div++;
For unsigned integers:
int div = a / b;
if (a % b != 0)
div++;
The reasoning for this answer
Integer division '/' is defined to round towards zero (7.7.2 of the spec), but we want to round up. This means that negative answers are already rounded correctly, but positive answers need to be adjusted.
Non-zero positive answers are easy to detect, but answer zero is a little trickier, since that can be either the rounding up of a negative value or the rounding down of a positive one.
The safest bet is to detect when the answer should be positive by checking that the signs of both integers are identical. Integer xor operator '^' on the two values will result in a 0 sign-bit when this is the case, meaning a non-negative result, so the check (a ^ b) >= 0 determines that the result should have been positive before rounding. Also note that for unsigned integers, every answer is obviously positive, so this check can be omitted.
The only check remaining is then whether any rounding has occurred, for which a % b != 0 will do the job.
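As a quick sanity sketch (my own examples, not part of the original answer), wrapping the signed version in a hypothetical helper and trying the sign combinations discussed:
static int DivUp(int a, int b)
{
    int div = a / b;
    if (((a ^ b) >= 0) && (a % b != 0)) // same signs and a remainder: adjust upwards
        div++;
    return div;
}
// DivUp(7, 2)   ==  4   (3.5 rounds up to 4)
// DivUp(-7, 2)  == -3   (-3.5 rounds up to -3; truncation already did the right thing)
// DivUp(-7, -2) ==  4   (3.5 rounds up to 4)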
Lessons learned
Arithmetic (integer or otherwise) isn't nearly as simple as it seems. Thinking carefully required at all times.
Also, although my final answer is perhaps not as 'simple' or 'obvious' or perhaps even 'fast' as the floating point answers, it has one very strong redeeming quality for me; I have now reasoned through the answer, so I am actually certain it is correct (until someone smarter tells me otherwise -furtive glance in Eric's direction-).
To get the same feeling of certainty about the floating point answer, I'd have to do more (and possibly more complicated) thinking about whether there are any conditions under which the floating-point precision might get in the way, and whether Math.Ceiling perhaps does something undesirable on 'just the right' inputs.
The path travelled
Replace (note I replaced the second myInt1 with myInt2, assuming that was what you meant):
(int)Math.Ceiling((double)myInt1 / myInt2)
with:
(myInt1 - 1 + myInt2) / myInt2
The only caveat being that if myInt1 - 1 + myInt2 overflows the integer type you are using, you might not get what you expect.
Reason this is wrong: -1000000 and 3999 should give -250, this gives -249
EDIT:
Considering this has the same error as the other integer solution for negative myInt1 values, it might be easier to do something like:
int rem;
int div = Math.DivRem(myInt1, myInt2, out rem);
if (rem > 0)
div++;
That should give the correct result in div using only integer operations.
Reason this is wrong: -1 and -5 should give 1, this gives 0
EDIT (once more, with feeling):
The division operator rounds towards zero; for negative results this is exactly right, so only non-negative results need adjustment. Also considering that DivRem just does a / and a % anyway, let's skip the call (and start with the easy comparison to avoid modulo calculation when it is not needed):
int div = myInt1 / myInt2;
if ((div >= 0) && (myInt1 % myInt2 != 0))
div++;
Reason this is wrong: -1 and 5 should give 0, this gives 1
(In my own defence of the last attempt I should never have attempted a reasoned answer while my mind was telling me I was 2 hours late for sleep)
Perfect chance to use an extension method:
public static class Int32Methods
{
public static int DivideByAndRoundUp(this int number, int divideBy)
{
return (int)Math.Ceiling((float)number / (float)divideBy);
}
}
This makes your code uber readable too:
int result = myInt.DivideByAndRoundUp(4);
You could write a helper.
static int DivideRoundUp(int p1, int p2) {
return (int)Math.Ceiling((double)p1 / p2);
}
You could use something like the following.
a / b + ((Math.Sign(a) * Math.Sign(b) > 0 && a % b != 0) ? 1 : 0)
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation).
Cannot overflow.
Elegant and correct.
The key to understanding the behavior is to recognize the difference between truncated, floored and ceilinged division. C#/C++ is natively truncated. When the quotient is negative (i.e. the operands' signs are different), then truncation is a ceiling (less negative). Otherwise truncation is a floor (less positive).
So, if there is a remainder, add 1 if the result is positive. Modulo is the same, but you instead add the divisor. Flooring is the same, but you subtract under the reversed conditions.
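As an illustration of those rules (my own sketch, assuming a non-zero divisor and ignoring overflow), here are floored division and floored modulo built from C#'s truncated operators:
static int FloorDiv(int x, int y)
{
    int q = x / y;
    if (x % y != 0 && (x ^ y) < 0) // remainder and opposite signs: truncation rounded up, so step down
        q--;
    return q;
}

static int FloorMod(int x, int y)
{
    int r = x % y;
    if (r != 0 && (x ^ y) < 0)     // give the remainder the divisor's sign by adding the divisor
        r += y;
    return r;
}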
By round up, I take it you mean always away from zero. Without any casting, use the Math.DivRem() function.
/// <summary>
/// Divide a/b but always round up
/// </summary>
/// <param name="a">The numerator.</param>
/// <param name="b">The denominator.</param>
int DivRndUp(int a, int b)
{
// remove sign
int s = Math.Sign(a) * Math.Sign(b);
a = Math.Abs(a);
b = Math.Abs(b);
var c = Math.DivRem(a, b, out int r);
// if remainder >0 round up
if (r > 0)
{
c++;
}
return s * c;
}
If roundup means always up regardless of sign, then
/// <summary>
/// Divide a/b but always round up
/// </summary>
/// <param name="a">The numerator.</param>
/// <param name="b">The denominator.</param>
int DivRndUp(int a, int b)
{
// remove sign
int s = Math.Sign(a) * Math.Sign(b);
a = Math.Abs(a);
b = Math.Abs(b);
var c = Math.DivRem(a, b, out int r);
// if remainder >0 round up
if (r > 0)
{
c+=s;
}
return s * c;
}
Some of the above answers use floats; this is inefficient and really not necessary. For unsigned ints this is an efficient answer for int1/int2:
(int1 == 0) ? 0 : (int1 - 1) / int2 + 1;
For signed ints this will not be correct
The problem with all the solutions here is either that they need a cast or they have a numerical problem. Casting to float or double is always an option, but we can do better.
When you use the code of the answer from #jerryjvl
int div = myInt1 / myInt2;
if ((div >= 0) && (myInt1 % myInt2 != 0))
div++;
there is a rounding error. 1 / 5 would round up, because 1 % 5 != 0. But this is wrong, because rounding up would only occur if you replaced the 1 with a 3, making the result 0.6. We need to find a way to round up when the calculation gives us a value greater than or equal to 0.5. The result of the modulo operator in the example above has a range from 0 to myInt2 - 1. The rounding will only occur if the remainder is greater than 50% of the divisor. So the adjusted code looks like this:
int div = myInt1 / myInt2;
if (myInt1 % myInt2 >= myInt2 / 2)
div++;
Of course we have a rounding problem at myInt2 / 2 too, but this result will give you a better rounding solution than the other ones on this site.
This one is for the binary and primitive experts. I am implementing a float R3 vector struct and my definition for "equality" is actually "mostly equal." Specifically, for all coords of the compared vectors Abs( (a[i] - b[i]) / (a[i] + b[i]) ) < .00001 returns true.
private static bool FloatEquality(float a, float b)
{
if (a == b)
{
return true;
}
else
{
float e;
try
{
e = (b - a) / (b + a);
}
catch (DivideByZeroException)
{
float g = float.Epsilon;
e = (b - a) / g;
}
//AppConsole.AppConsole.Instance.WriteLine(e);
if (e < .00001f && e > -.00001f)
{
return true;
}
else
{
return false;
}
}
}
My problem is in determining if there's a way to get the hash values to come out the same on vectors that meet this requirement due to the fact that I want to be able to use these vectors as "keys" for a Dictionary.
As you can see, the above code is used to check for equality on 3 different coordinates.
I was thinking of extracting the bytes from the three float coordinates and using the middle two from each.
(the following isn't code but Stack Overflow won't let me post it unless I indent it)
Vector(x,y,z):
x's float byte[] = [ x1 x2 x3 x4 ]
y's float byte[] = [ y1 y2 y3 y4 ]
z's float byte[] = [ z1 z2 z3 z4 ]
Hash code: byte[] {x2^x3 , y2^y3, z2 ^ z3, x2 ^ z3}
Or something like that... In short - I'm curious how to ensure that the hashcodes of vectors which fit my equals method will always come out the same... If someone has a great idea with very low cost computation, I'd love to hear it. Or if you could direct me to a place that discusses more in depth how floats are stored and which bytes will always be the same if the above comparison method returns equal.
I may need a new comparison method rather than a hash function because there's really no way that I can be sure that any of the bytes will match I guess...
Well, the basic idea is simple - you have to artificially reduce the precision of your floats. How to do this efficiently depends a lot on the kind of data you're expecting to see.
For example, if you're mostly using small values, you could simply use something like this:
(int)Math.Round(x1 * 1000)
^ (int)Math.Round(x2 * 1000)
^ (int)Math.Round(x3 * 1000)
Note that while I'm not actually fulfilling your if (e < .00001f && e > -.00001f) condition, it doesn't matter - the idea is to reduce collisions and ensure that values that are equal will have equal hash codes. It's not necessary (or possible) to also ensure that values that are not equal will not have equal hash codes. The rest should be handled in the overrides of Equals, == etc. - that's where strict equality checks must be present. Unlike Equals and company, GetHashCode() only has data about a single vector, so you don't even have the option of using data from more than that single vector in there.
Hash codes are only there to make key collisions infrequent. So Dictionary will still work if each of your vectors will return 0 in GetHashCode() - it's just that the performance will suffer. As long as equal vectors end up with equal hash codes, the hash code can be anything that suits your needs :)
Of course, the best way would simply be not to use vectors as keys in the dictionary. Find the part of the vector that interests you (helps you the most), and use that as a key. Maybe you'll find out Dictionary isn't actually what you want anyway (for example, in a game, there's tons of different space partitioning methods that can be used with vectors - from simple grid-like layouts, through manual space partitioning, up to things like BSP).
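For instance, on a hypothetical Vector3F struct with float fields X, Y and Z, the snippet above could be wired into GetHashCode roughly like this (the scale factor and the shifts are arbitrary illustration choices):
public override int GetHashCode()
{
    // Quantize each coordinate so nearby values tend to land in the same bucket,
    // then combine the three quantized values.
    int hx = (int)Math.Round(X * 1000);
    int hy = (int)Math.Round(Y * 1000);
    int hz = (int)Math.Round(Z * 1000);
    return hx ^ (hy << 10) ^ (hz << 20);
}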
In one of my projects, I use the following smoothstep() function :
float smoothstep(float a, float b, float m, int n)
{
for(int i = 0 ; i < n ; i++)
{
m = m * m * (3 - 2 * m);
}
return a + (b - a) * m;
}
It works great; however, it has two disadvantages:
It's slow (especially for big values of n)
It doesn't work for non-integer values (e.g. n = 1.5)
Is there an alternative (excluding precalculating points and then interpolating) providing better performance (and the same behavior), or another function giving a good approximation?
You should be able to precompute the "m" term, since it doesn't rely on a or b, and assuming you're doing this over an entire interpolation, this should speed up your code significantly.
Alternatively, you could use the built-in MathHelper.SmoothStep method, which provides a cubic interpolation rather than the linear interpolation you get out of your version. There are other, more advanced interpolators in that class as well.
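For the first suggestion, a minimal C# sketch of what precomputing the m term could look like (my own illustration, mirroring the loop in the question): compute the repeatedly smoothed m once, then reuse it for every (a, b) pair that shares the same m and n.
static float SmoothFactor(float m, int n)
{
    // Iterated smoothstep factor, hoisted out of the per-pair interpolation
    for (int i = 0; i < n; i++)
        m = m * m * (3 - 2 * m);
    return m;
}

static float Lerp(float a, float b, float m)
{
    return a + (b - a) * m;
}
// Usage: float m = SmoothFactor(t, n); then Lerp(a1, b1, m), Lerp(a2, b2, m), ...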
This code works (C# 3)
double d;
if(d == (double)(int)d) ...;
Is there a better way to do this?
For extraneous reasons I want to avoid the double cast, so what nice ways exist other than this? (Even if they aren't as good.)
Note: Several people have pointed out the (important) fact that == is often problematic regarding floating point. In this case I expect values in the range of 0 to a few hundred, and they are supposed to be integers (non-integers are errors), so those points "shouldn't" be an issue for me.
d == Math.Floor(d)
does the same thing in other words.
NB: Hopefully you're aware that you have to be very careful when doing this kind of thing; floats/doubles will very easily accumulate miniscule errors that make exact comparisons (like this one) fail for no obvious reason.
This would work I think:
if (d % 1 == 0) {
//...
}
If your double is the result of another calculation, you probably want something like:
d == Math.Floor(d + 0.00001);
That way, if there's been a slight rounding error, it'll still match.
I cannot answer the C#-specific part of the question, but I must point out you are probably missing a generic problem with floating point numbers.
Generally, integerness is not well defined on floats. For the same reason that equality is not well defined on floats. Floating point calculations normally include both rounding and representation errors.
For example, 1.1 + 0.6 != 1.7.
Yup, that's just the way floating point numbers work.
Here, 1.1 + 0.6 - 1.7 == 2.2204460492503131e-16.
Strictly speaking, the closest thing to equality comparison you can do with floats is comparing them up to a chosen precision.
If this is not sufficient, you must work with a decimal number representation, with a floating point number representation with built-in error range, or with symbolic computations.
A simple test such as 'x == floor(x)' is mathematically assured to work correctly, for any fixed-precision FP number.
All legal fixed-precision FP encodings represent distinct real numbers, and so for every integer x, there is at most one fixed-precision FP encoding that matches it exactly.
Therefore, for every integer x that CAN be represented in such way, we have x == floor(x) necessarily, since floor(x) by definition returns the largest FP number y such that y <= x and y represents an integer; so floor(x) must return x.
If you are just going to convert it, Mike F / Khoth's answer is good, but doesn't quite answer your question. If you are going to actually test, and it's actually important, I recommend you implement something that includes a margin of error.
For instance, if you are considering money and you want to test for even dollar amounts, you might say (following Khoth's pattern):
if (Math.Abs(d - Math.Floor(d + 0.001)) < 0.001)
In other words, take the absolute value of the difference between the value and its integer representation, and ensure that it is small.
You don't need the extra (double) in there. This works:
if (d == (int)d) {
//...
}
Use Math.Truncate()
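For example, a minimal check along those lines (with the same floating point caveats the other answers mention):
bool isWholeNumber = d == Math.Truncate(d);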
This will let you choose what precision you're looking for, plus or minus half a tick, to account for floating point drift. The comparison is also integral, which is nice.
static void Main(string[] args)
{
const int precision = 10000;
foreach (var d in new[] { 2, 2.9, 2.001, 1.999, 1.99999999, 2.00000001 })
{
if ((int) (d*precision + .5)%precision == 0)
{
Console.WriteLine("{0} is an int", d);
}
}
}
and the output is
2 is an int
1.99999999 is an int
2.00000001 is an int
Something like this
double d = 4.0;
int i = 4;
bool equal = d.CompareTo(i) == 0; // true
Could you use this
bool IsInt(double x)
{
try
{
int y = Int16.Parse(x.ToString());
return true;
}
catch
{
return false;
}
}
To handle the precision of the double...
Math.Abs(d - Math.Floor(d)) <= double.Epsilon
Consider the following case, where a value less than double.Epsilon fails to compare as zero.
// number of possible rounds
const int rounds = 1;
// precision causes rounding up to double.Epsilon
double d = double.Epsilon*.75;
// due to the rounding this comparison fails
Console.WriteLine(d == Math.Floor(d));
// this comparison succeeds by accounting for the rounding
Console.WriteLine(Math.Abs(d - Math.Floor(d)) <= rounds*double.Epsilon);
// The difference is double.Epsilon, 4.940656458412465E-324
Console.WriteLine(Math.Abs(d - Math.Floor(d)).ToString("E15"));