How to make my extended range floating point multiply more efficient?

I am doing a calculation which frequently involves values like 3.47493E+17298. This is way beyond what a double can handle, and I don't need extra precision, just extra range of exponents, so I created my own little struct in C#.
My struct uses a long for significand and sign, and an int for exponent, so I effectively have:
1 sign bit
32 exponent bits (regular 2's complement exponent)
63 significand bits
I am curious what steps could be made to make my multiplication routine more efficient. I am running an enormous number of multiplications of these extended range values, and it is pretty fast, but I was looking for hints as to making it faster.
My multiplication routine:
public static BigFloat Multiply(BigFloat left, BigFloat right)
{
long shsign1;
long shsign2;
if (left.significand == 0)
{
return bigZero;
}
if (right.significand == 0)
{
return bigZero;
}
shsign1 = left.significand;
shsign2 = right.significand;
// scaling down significand to prevent overflow multiply
// s1 and s2 indicate how much the left and right
// significands need shifting.
// The multLimit is a long constant indicating the
// max value I want either significand to be
int s1 = qshift(shsign1, multLimit);
int s2 = qshift(shsign2, multLimit);
shsign1 >>= s1;
shsign2 >>= s2;
BigFloat r;
r.significand = shsign1 * shsign2;
r.exponent = left.exponent + right.exponent + s1 + s2;
return r;
}
And the qshift:
It just finds out how much to shift the val to make it smaller in absolute value than the limit.
public static int qshift(long val, long limit)
{
long q = val;
long c = limit;
long nc = -limit;
int counter = 0;
while (q > c || q < nc)
{
q >>= 1;
counter++;
}
return counter;
}

Here is a completely different idea...
Use the hardware's floating-point machinery, but augment it with your own integer exponents. Put another way, make BigFloat.significand be a floating-point number instead of an integer.
Then you can use ldexp and frexp to keep the actual exponent on the float equal to zero. These should be single machine instructions.
So BigFloat multiply becomes:
r.significand = left.significand * right.significand
r.exponent = left.exponent + right.exponent
tmp = (actual exponent of r.significand from frexp)
r.exponent += tmp
(use ldexp to subtract tmp from actual exponent of r.significand)
Unfortunately, the last two steps require frexp and ldexp, which searches suggest are not available in C#. So you might have to write this bit in C.
...
Or, actually...
Use floating-point numbers for the significands, but just keep them normalized between 1 and 2. So again, use floats for the significands, and multiply like this:
r.significand = left.significand * right.significand;
r.exponent = left.exponent + right.exponent;
if (r.significand >= 2) {
r.significand /= 2;
r.exponent += 1;
}
assert (r.significand >= 1 && r.significand < 2); // for debugging...
This should work as long as you maintain the invariant mentioned in the assert(). (Because if x is between 1 and 2 and y is between 1 and 2 then x*y is between 1 and 4, so the normalization step just has to check for when the significand product is between 2 and 4.)
You will also need to normalize the results of additions etc., but I suspect you are already doing that.
Although you will need to special-case zero after all :-).
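A minimal C# sketch of the normalized-significand idea above. The struct name BigFloat2 and its members are hypothetical (this is not the asker's BigFloat): the significand is a double whose magnitude is kept in [1, 2), the exponent is an int, and zero is special-cased as a significand of 0. Exponent overflow is not checked here.

```csharp
using System;

// Hypothetical type for illustration: value = Significand * 2^Exponent,
// with |Significand| in [1, 2), or Significand == 0 for zero.
public struct BigFloat2
{
    public readonly double Significand;
    public readonly int Exponent;

    public BigFloat2(double significand, int exponent)
    {
        Significand = significand;
        Exponent = exponent;
    }

    public static BigFloat2 Multiply(BigFloat2 left, BigFloat2 right)
    {
        double s = left.Significand * right.Significand;
        if (s == 0.0)
        {
            return new BigFloat2(0.0, 0);   // zero special case
        }

        int e = left.Exponent + right.Exponent;   // no overflow check in this sketch

        // |s| is in [1, 4); halving is exact, so one step restores [1, 2).
        if (Math.Abs(s) >= 2.0)
        {
            s *= 0.5;
            e += 1;
        }
        return new BigFloat2(s, e);
    }
}
```

The only data-dependent branch left is the single renormalization test, which replaces the shift loop in qshift.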
[edit, to flesh out the frexp version]
BigFloat BigFloat::normalize(BigFloat b)
{
    double temp = b.significand;
    long long tempexp = b.exponent;
    int tempexp2;   // frexp() writes the exponent through an int pointer
    double temp2 = frexp(temp, &tempexp2);
    // Need to test temp2 for infinity and NaN here
    tempexp += tempexp2;
    if (tempexp < MIN_EXP) {
        // underflow!
    }
    if (tempexp > MAX_EXP) {
        // overflow!
    }
    BigFloat r;
    r.exponent = (int)tempexp;
    r.significand = temp2;
    return r;
}
In other words, I would suggest factoring this out as a "normalize" routine, since presumably you want to use it following additions, subtractions, multiplications, and divisions.
And then there are all the corner cases to worry about...
You probably want to handle underflow by returning zero. Overflow depends on your tastes; should either be an error or +-infinity. Finally, if the result of frexp() is infinity or NaN, the value of tempexp2 is undefined, so you might want to check those cases, too.
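If writing that piece in C is not an option, the frexp step can be approximated in C# by editing the bits of the double directly. The following is only a sketch: the class and method names are made up for illustration, and zero, subnormals, infinity and NaN are passed through unchanged with an exponent of 0, so the caller still has to deal with those corner cases.

```csharp
using System;

public static class FloatParts
{
    // Splits a normal, finite, nonzero double into a fraction with magnitude
    // in [0.5, 1) and a power-of-two exponent, in the spirit of C's frexp.
    // Zero, subnormals, infinity and NaN are returned unchanged with
    // exponent 0 in this sketch.
    public static double Frexp(double value, out int exponent)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        int biased = (int)((bits >> 52) & 0x7FF);
        if (biased == 0 || biased == 0x7FF)
        {
            exponent = 0;
            return value;
        }

        exponent = biased - 1022;

        // Keep sign and mantissa, overwrite the exponent field with 1022,
        // which yields a magnitude in [0.5, 1).
        long keepMask = unchecked((long)0x800FFFFFFFFFFFFFUL);
        long fracBits = (bits & keepMask) | (1022L << 52);
        return BitConverter.Int64BitsToDouble(fracBits);
    }
}
```

A normalize routine would add the returned exponent to the BigFloat exponent and store the returned fraction as the new significand.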

I am not much of a C# programmer, but here are some general ideas.
First, are there any profiling tools for C#? If so, start with those...
The time is very likely being spent in your qshift() function; in particular, the loop. Mispredicted branches are nasty.
I would rewrite it as:
long q = abs(val);
long x = q / limit;
(find next power of 2 bigger than x)
For that last step, see this question and answer.
Then instead of shifting by qshift, just divide by this power of 2. (Does C# have "find first set" (aka. ffs)? If so, you can use it to get the shift count from the power of 2; it should be one instruction.)
Definitely inline this sequence if the compiler will not do it for you.
Also, I would ditch the special cases for zero, unless you are multiplying by zero a lot. Linear code good; conditionals bad.
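One way to get the shift count without a loop is to count leading zeros. The sketch below makes two assumptions that are not in the question: that multLimit is 2^31 - 1 (so significands should fit in roughly 31 bits of magnitude), and that System.Numerics.BitOperations is available (.NET Core 3.0 or later). The names ShiftCount and QShift are illustrative.

```csharp
using System;
using System.Numerics;

public static class ShiftCount
{
    // Assumption: significands should fit in about 31 bits of magnitude.
    private const int MaxBits = 31;

    public static int QShift(long val)
    {
        // Magnitude as ulong; this form is safe even for long.MinValue.
        ulong mag = val < 0 ? (ulong)(-(val + 1)) + 1UL : (ulong)val;

        // Number of significant bits in the magnitude (0 for zero).
        int bitLength = 64 - BitOperations.LeadingZeroCount(mag);

        return Math.Max(0, bitLength - MaxBits);
    }
}
```

For non-negative inputs this gives the same count as the original loop under the stated limit. For negative inputs the arithmetic right shift rounds toward negative infinity, so the shifted magnitude can reach 2^31 rather than 2^31 - 1; the product of two such values still fits in a long.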

If you're sure there won't be an overflow, you can use an unchecked block.
That will remove the overflow checks, and give you a bit more performance.
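A small sketch of the difference, with illustrative names. Note that C# arithmetic on non-constant operands is already unchecked unless the project is compiled with overflow checking enabled, so an explicit unchecked block only changes anything in that case.

```csharp
using System;

public static class OverflowDemo
{
    // Wraps silently on overflow, regardless of compiler settings.
    public static long MultiplyUnchecked(long a, long b)
    {
        unchecked
        {
            return a * b;
        }
    }

    // Returns false instead of a wrapped result when the product overflows.
    public static bool TryMultiplyChecked(long a, long b, out long product)
    {
        try
        {
            product = checked(a * b);
            return true;
        }
        catch (OverflowException)
        {
            product = 0;
            return false;
        }
    }
}
```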

Related

Algorithm to round two numbers to the nearest evenly divisible ones

The title is not really well phrased, I'm aware - can't think of a better way of writing it though.
Here's the scenario - I have two input boxes, both representing integer quantities. One is represented in our units, the other in the vendor's units. There is a multiplier defining how to convert from ours to theirs. In the below example, I'm saying that two of our units is equal to five of theirs. So, for example,
decimal multiplier = 0.4m; // Two of our units equals five of theirs
int requestedQuantity = 11; // Our units
int suppliedQuantity = 37; // Their units
// Should return 12, since that is the next highest whole number that results in both of us having whole numbers (12 of ours = 30 of theirs)
int correctedFromRequestedQuantity = GetCorrectedRequestedQuantity(requestedQuantity, null, multiplier);
// Should return 16, since that is the next highest whole number that results in both of us having whole numbers (16 of ours = 40 of theirs);
int correctedFromSuppliedQuantity = GetCorrectedRequestedQuantity(suppliedQuantity, multiplier, null);
Here's the function I've written to handle this. I'm not doing a divide by zero check on the multiplier / rounder since I've already checked for that elsewhere. It seems crazy to do all that converting, but is there a better way of doing it?
public int GetCorrectedRequestedQuantity(int? input, decimal? multiplier, decimal? rounder)
{
if (multiplier == null)
{
if (rounder == null)
return input.GetValueOrDefault();
else
return (int)Math.Ceiling((decimal)((decimal)Math.Ceiling(input.GetValueOrDefault() / rounder.Value) * rounder.Value));
}
else if (input.HasValue)
{
// This is insane...
return (int)Math.Ceiling((decimal)((decimal)Math.Ceiling((int)Math.Ceiling((decimal)input * multiplier.Value) / multiplier.Value) * multiplier.Value));
}
else
return 0;
}

Represent the multiplier as a fraction in lowest terms. I don't know if .NET has a fractions class but if not you can probably find a C# implementation, or just write your own. So assume the multiplier is given by two integers a / b in lowest terms, with a ≠ 0 and b ≠ 0. That also means that conversion in the other direction is given by multiplying by b / a. In your example, a = 2 and b = 5, and a / b = 0.4.
Now suppose you want to convert an integer X. If you think about it a bit you'll see what you really want is to nudge X up until b divides X. The number you need to add to X is simply (b - (X%b)) % b. So to convert on one direction is just
return (a * (X + (b - (X % b)) % b)) / b;
and to convert Y going in the other direction is just
return (b * (Y + (a - (Y % a)) % a)) / a;
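Put together as a small C# sketch (the class and method names are illustrative, and non-negative inputs with positive a and b are assumed):

```csharp
using System;

public static class UnitRounding
{
    // Rounds x up to the next multiple of step (assumes x >= 0, step > 0).
    public static int RoundUpToMultiple(int x, int step)
    {
        return x + (step - (x % step)) % step;
    }

    // The multiplier is a / b in lowest terms. Rounds x up to a multiple
    // of b, after which scaling by a / b is exact.
    public static int ConvertByFraction(int x, int a, int b)
    {
        return a * RoundUpToMultiple(x, b) / b;
    }
}
```

With the numbers from the question (a = 2, b = 5): 37 of their units rounds up to 40, which converts to 16 of ours, and 11 of our units rounds up to 12, which converts to 30 of theirs via ConvertByFraction(11, 5, 2).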

My best idea off the top of my head is to semi brute-force it. It does sound like it is basically fraction mathematics, so there might be a much easier solution for this.
First we need to find in what sort of "Batch" the multiplier becomes whole. That way, we can stop working with floats/doubles altogether. Ideally this should be supplied with the multiplier (as float math is messy).
double currentMultiple = multiplier;
int currentCount = 1;
// This is the best check for "is an integer" I could think of.
while (currentMultiple % 1 != 0) {
    // The Framework can detect arithmetic overflow. Let us turn that on.
    // If we ever get there, the math is likely not solvable.
    checked {
        currentMultiple += multiplier;
        currentCount += 1;
    }
}
//You get here either via exception or because you got a multiple that solves it.
//Store the value of currentCount into a variable "OurBatchSize"
//Also store the value of currentMultiple in "TheirBatchSize"
Getting the closest Multiple of OurBatchSize:
int requestedQuantity = 11; // Our units
int result = OurBatchSize;
int batchCount = 1;
while (result < requestedQuantity) {
    result += OurBatchSize;
    batchCount++;
}
// result contains the answer here. Return it.
// batchCount * TheirBatchSize will also tell you how much they get.
Edit: Credit for this goes mostly to James Reinstate Monica Polk. He had the math idea to use Modulo for this. Here is what I got with explanation:
int result;
int rest = requestedAmount % BatchSize;
if (rest != 0) {
    // Correct upwards to the next multiple
    int distanceToNextMultiple = BatchSize - rest;
    result = requestedAmount + distanceToNextMultiple;
}
else {
    // It already is right
    result = requestedAmount;
}
For the BatchSize of 4, you will get:
13; 13%4=1; 4-1=3; 13+3=16;
14; 14%4=2; 4-2=2; 14+2=16;
15; 15%4=3; 4-3=1; 15+1=16;
16; 16%4=0; Else is used. 16 is already right.

Using an overflow for modulus with a signed integer

I'm implementing a bunch of different kinds of pseudo random number generators to play around with. I noticed that linear congruential generators can have periods the size of an int, and thought I could just use overflow instead of modulus and see if it's faster.
The only snag is that overflows overflow into the sign bit, and I need them all to be positive values.
EDIT: I was cloudy on a couple concepts, so I'm cleaning up this question so it makes more sense.
Basically it all boils down to me trying to lop off the sign bit of an integer. I've found that XORing the number with int.MinValue does the trick. But only when it has overflowed, if it hasn't that does the opposite. I'd like to avoid the extra if statement though.
If someone could show me some nifty trick to snag the first 31 bits and stuff them into an integer, that would be delightful. Or some way to just set the sign bit to zero would probably be better?

If you want overflow to start back at zero, you should just mask off the sign bit.
int y;
unchecked {
    int x = int.MaxValue + 5;
    y = x & 0x7fffffff;
}
Console.WriteLine(y);
This outputs the number 4.
I don't think absolute value of the overflowed value will give you what you want (you would go up to maxint, and then descend back down, plus, you'll have to specially handle int.MaxValue + 1 because it equals int.MinValue, which Math.Abs() will throw an exception on).
int y;
unchecked {
    int x = int.MaxValue + 5;
    y = Math.Abs(x);
}
Console.WriteLine(y);
This outputs the number 2147483644.
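Applied to the generator from the question, the masking approach might look like the sketch below. The class name is made up, and the multiplier 1103515245 and increment 12345 are the constants from the sample rand() in the C standard, chosen here purely for illustration.

```csharp
using System;

public sealed class Lcg31
{
    private int state;

    public Lcg31(int seed)
    {
        state = seed & 0x7FFFFFFF;
    }

    // Returns the next value in [0, 2^31 - 1].
    public int Next()
    {
        unchecked
        {
            // The multiply-add wraps modulo 2^32; masking off the sign bit
            // then reduces the result modulo 2^31, with no % and no branch.
            state = (1103515245 * state + 12345) & 0x7FFFFFFF;
        }
        return state;
    }
}
```

Because only the low 31 bits are kept, the result is the same as computing the recurrence modulo 2^31 directly.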

You mean like:
int x = -100;
int mask = (x >> 31);
Trace.WriteLine((x + mask) ^ mask);
output: 100

Calculate the unit in the last place (ULP) for doubles

Does .NET have a built-in method to calculate the ULP of a given double or float?
If not, what is the most efficient way to do so?

It seems the function is pretty trivial; this is based on the pseudocode in the accepted answer to the question linked by vulkanino:
double value = whatever;
long bits = BitConverter.DoubleToInt64Bits(value);
double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
double result = nextValue - value;
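For floats, you'd need to provide your own implementation of SingleToInt32Bits and Int32BitsToSingle, since BitConverter in the .NET Framework doesn't have those functions (newer .NET versions do provide BitConverter.SingleToInt32Bits and BitConverter.Int32BitsToSingle).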
This page shows the special cases in the java implementation of the function; handling those should be fairly trivial, too.
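For runtimes whose BitConverter lacks the Single overloads, the two helpers can be built from GetBytes, ToInt32 and ToSingle. This is a sketch with illustrative names; like the double version above, it ignores negative values, float.MaxValue, infinity and NaN.

```csharp
using System;

public static class FloatUlp
{
    // Hand-rolled equivalent of BitConverter.SingleToInt32Bits.
    public static int SingleToInt32Bits(float value)
    {
        return BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
    }

    // Hand-rolled equivalent of BitConverter.Int32BitsToSingle.
    public static float Int32BitsToSingle(int bits)
    {
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    // ULP of a positive, finite float below float.MaxValue.
    // Special cases are deliberately not handled in this sketch.
    public static float Ulp(float value)
    {
        int bits = SingleToInt32Bits(value);
        float next = Int32BitsToSingle(bits + 1);
        return next - value;
    }
}
```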

phoog's answer is good but has weaknesses with negative numbers, max_double, infinity and NaN.
phoog_ULP(positive x) --> a positive number. Good.
phoog_ULP(negative x) --> a negative number. I would expect positive number.
To fix this I recommend instead:
long bits = BitConverter.DoubleToInt64Bits(value) & 0x7FFFFFFFFFFFFFFFL;
Below are fringe cases that need resolution should you care...
phoog_ULP(x = +/- Max_double 1.797...e+308) returns an infinite result. (+1.996...e+292) expected.
phoog_ULP(x = +/- Infinity) results in a NaN. +Infinity expected.
phoog_ULP(x = +/- NaN) may unexpectedly change from a sNaN to a qNaN. No change expected. One could argue either way on whether the sign should become + in this case.
To solve these, I only see a short series of brutish if() tests, possibly on the "bits" value for expediency. Example:
static double ulpc(double value) {
    long bits = BitConverter.DoubleToInt64Bits(value);
    if ((bits & 0x7FF0000000000000L) == 0x7FF0000000000000L) { // if x is not finite
        if ((bits & 0x000FFFFFFFFFFFFFL) != 0) { // if x is a NaN
            return value; // I did not force the sign bit here with NaNs.
        }
        return BitConverter.Int64BitsToDouble(0x7FF0000000000000L); // Positive Infinity;
    }
    bits &= 0x7FFFFFFFFFFFFFFFL; // make positive
    if (bits == 0x7FEFFFFFFFFFFFFFL) { // if x == max_double (notice the _E_)
        return BitConverter.Int64BitsToDouble(bits) - BitConverter.Int64BitsToDouble(bits - 1);
    }
    double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
    double result = nextValue - Math.Abs(value);
    return result;
}

Fractional Counting Via Integers

I receive an integer that represents a dollar amount in fractional denominations. I would like an algorithm that can add those numbers without parsing and converting them into doubles or decimals.
For example, I receive the integer 50155, which means 50 and 15.5/32 dollars. I then receive 10210 which is 10 and 21/32 dollars. So 50 15.5/32 + 10 21/32 = 61 4.5/32, thus:
50155 + 10210 = 61045
Again, I want to avoid this:
int a = 50155;
int b = a / 1000;
float c = a % 1000;
float d = b;
d += c / 320f;
// d = 50.484375
I would much prefer this:
int a = 50155;
int b = 10210;
int c = MyClass.Add(a, b); // c = 61045
...
public int Add(int a, int b)
{
// ?????
}
Thanks in advance for the help!

Well I don't think you need to use floating point...
public static int Add(int a, int b)
{
int firstWhole = a / 1000;
int secondWhole = b / 1000;
int firstFraction = a % 1000;
int secondFraction = b % 1000;
int totalFraction = firstFraction + secondFraction;
int totalWhole = firstWhole + secondWhole + (totalFraction / 320);
return totalWhole * 1000 + (totalFraction % 320);
}
Alternatively, you might want to create a custom struct that can convert to and from your integer format, and overloads the + operator. That would allow you to write more readable code which didn't accidentally lead to other integers being treated as this slightly odd format.
EDIT: If you're forced to stick with a "single integer" format but get to adjust it somewhat you may want to consider using 512 instead of 1000. That way you can use simple mask and shift:
public static int Add(int a, int b)
{
int firstWhole = a >> 9;
int secondWhole = b >> 9;
int firstFraction = a & 0x1ff;
int secondFraction = b & 0x1ff;
int totalFraction = firstFraction + secondFraction;
int totalWhole = firstWhole + secondWhole + (totalFraction / 320);
return (totalWhole << 9) + (totalFraction % 320);
}
There's still the messing around with 320, but it's at least somewhat better.

Break the string up into the part that represents whole dollars, and the part that represents fractions of dollars. For the latter, instead of treating it as 10.5 thirty-seconds of a dollar, it's probably easier to treat it as 105 three hundred and twentieths of a dollar (i.e. multiply both by ten so the numerator is always an integer).
From there, doing math is fairly simple (if somewhat tedious to write): add the fractions. If that exceeds a whole dollar, carry a dollar (and subtract 320 from the fraction part). Then add the whole dollars. Subtraction likewise -- though in this case you need to take borrowing into account instead of carrying.

Edit:
This answer suggests that one "stays away" from float arithmetic. Surprisingly, the OP indicated that his float-based logic (not shown for proprietary reasons) was twice as fast as the integer-modulo solution below! It goes to show that FPUs are not that bad after all...
Definitively, stay away from floats (for this particular problem). Integer arithmetic is both more efficient and doesn't introduce rounding error issues.
Something like the following should do the trick
Note: As written, assumes A and B are positive.
int AddMyOddlyEncodedDollars(int A, int B) {
    int sum = A + B;
    if (sum % 1000 < 320)
        return sum;
    else
        return sum + 1000 - 320;
}
Edit: On the efficiency of the modulo operator in C
It depends very much on the compiler... Since the modulo value is known at compile time, I'd expect most modern compilers to use the "multiply [by reciprocal] and shift" approach, and this is fast.
This concern about performance (with this rather contrived format) looks like premature optimization, but then again, I've seen software in the financial industry mightily optimized (to put it politely), and justifiably so.

As a point for learning, this representation is called "fixed point". There are a number of implementations that you can look at. I would strongly suggest that you do NOT use int as your top level data type, but instead create a type called Fixed that encapsulates the operations. It will keep your bug count down when you mistakenly add a plain int to a fixed point number without scaling it first, or scale a number and forget to unscale it.
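A sketch of what such a wrapper type could look like for the 1/320ths format in the question. The type name Dollars320 and its members are hypothetical, and only non-negative amounts and addition are covered.

```csharp
using System;

// Hypothetical wrapper: stores an amount as a whole number of 1/320ths of a
// dollar, so ordinary integer addition is correct and carries are automatic.
public struct Dollars320
{
    private readonly int ticks;   // total number of 1/320ths

    private Dollars320(int ticks)
    {
        this.ticks = ticks;
    }

    // Decodes the wire format, e.g. 50155 means 50 dollars and 155/320.
    public static Dollars320 FromEncoded(int encoded)
    {
        return new Dollars320(encoded / 1000 * 320 + encoded % 1000);
    }

    // Encodes back into the dollars * 1000 + 320ths wire format.
    public int ToEncoded()
    {
        return ticks / 320 * 1000 + ticks % 320;
    }

    public static Dollars320 operator +(Dollars320 a, Dollars320 b)
    {
        return new Dollars320(a.ticks + b.ticks);
    }
}
```

With this, (Dollars320.FromEncoded(50155) + Dollars320.FromEncoded(10210)).ToEncoded() gives 61045, and a plain int can no longer be added to an encoded amount by accident.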

Looks like a strange encoding to me.
Anyway, if the format is in 10-base Nxxx where N is an integer denoting whole dollars and xxx is interpreted as
(xxx / 320)
and you want to add them together, the only thing you need to handle is the carry when xxx reaches 320 or more:
int a = ..., b = ...; // dollar amounts
int c = (a + b); // add together
// Calculate carry
int carry = (c % 1000) / 320; // integer division
c += carry * 1000;
c -= carry * 320;
// done
Note: this works because if a and b are encoded correctly, the fractional parts add together to 638 at most and thus there is no "overflow" to the whole dollars part.

BEWARE: This post is wrong, wrong, wrong. I will remove it as soon as I stop feeling a fool for trying it.
Here is my go: You can trade space for time.
Construct a mapping for the first 10 bits to a tuple: count of dollars, count of piecesof32.
Then use bit manipulation on your integer:
ignore bits 11 and above, apply map.
shift the whole number 10 times, add small change dollars from mapping above
you now have the dollar amount and the piecesof32 amount
add both
move overflow to dollar amount
Next, to convert back to "canonical" notation, you need a reverse lookup map for your piecesof32 and "borrow" dollars to fill up the bits. Unshift the dollars 10 times and add the piecesof32.
EDIT: I should remove this, but I am too ashamed. Of course, it cannot work. I'm so stupid :(
The reason being: shifting by 10 to the right is the same as dividing by 1024 - it's not as if some of the lower bits have a dollar amount and some a piecesof32 amount. Decimal and binary notation just don't split up nicely. That's why we use hexadecimal notation (grouping of 4 bits). Bummer.

If you insist on working in ints you can't solve your problem without parsing -- after all your data is not integer. I call into evidence the (so far) 3 answers which all parse your ints into their components before performing arithmetic.
An alternative would be to use rational numbers with 2 (integer) components, one for the whole part, and one for the number of 320ths in the fractional part. Then implement the appropriate rational arithmetic. As ever, choose your representations of data carefully and your algorithms become much easier to implement.
I can't say that I think this alternative is particularly better on any axis of comparison but it might satisfy your urge not to parse.

Division to the nearest 1 decimal place without floating point math?

I am having some speed issues with my C# program and identified that this percentage calculation is causing a slow down. The calculation is simply n/d * 100. Both the numerator and denominator can be any integer number. The numerator can never be greater than the denominator and is never negative. Therefore, the result is always from 0-100. Right now, this is done by simply using floating point math and is somewhat slow, since it's being calculated tens of millions of times. I really don't need anything more accurate than to the nearest 0.1 percent. And, I just use this calculated value to see if it's bigger than a fixed constant value. I am thinking that everything should be kept as an integer, so the range with 0.1 accuracy would be 0-1000. Is there some way to calculate this percentage without floating point math?
Here is the loop that I am using with calculation:
for (int i = 0; i < simulationList.Count; i++)
{
for (int j = i + 1; j < simulationList.Count; j++)
{
int matches = GetMatchCount(simulationList[i], simulationList[j]);
if ((float)matches / (float)simulationList[j].Catchments.Count > thresPercent)
{
simulationList[j].IsOverThreshold = true;
}
}
}

Instead of n/d > c, you can use n > d * c (supposing that d > 0).
(c is the constant value you are comparing to.)
This way you don't need division at all.
However, watch out for the overflows.
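A sketch of that comparison in C#, assuming the threshold is supplied as an integer number of tenths of a percent (0 to 1000). The class, method and parameter names are illustrative. Widening to long before multiplying avoids the overflow mentioned above.

```csharp
using System;

public static class ThresholdCheck
{
    // thresholdTenths is the threshold in tenths of a percent (0..1000).
    // Tests whether n / d * 100 exceeds thresholdTenths / 10 percent
    // without dividing. Requires d > 0.
    public static bool IsOverThreshold(int n, int d, int thresholdTenths)
    {
        return (long)n * 1000 > (long)d * thresholdTenths;
    }
}
```

In the loop from the question this would be called as IsOverThreshold(matches, simulationList[j].Catchments.Count, thresholdTenths), with thresholdTenths computed once outside the loop.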

If your units are in tenths instead of ones, then you can get your 0.1 accuracy using integer arithmetic:
Instead of:
for (...)
{
float n = ...;
float d = ...;
if (n / d > 1.4) // greater than 140% ?
...do something like:
for (...)
{
int n = 10 * ...;
int d = ...;
if (n / d > 14) // greater than 140% ?

Instead of writing
if ((float)matches / (float)simulationList[j].Catchments.Count > thresPercent)
write this:
if (matches * thresPercent_Denominator > simulationList[j].Catchments.Count * thresPercent_Numerator)
In this way, you get rid of the floating points.
Note: thresPercent can be expressed as thresPercent_Numerator / thresPercent_Denominator, as long as the number is a rational number. I think this is the optimal way on a PC. On some other platforms, you may optimize it further with a left or right shift if thresPercent_Denominator and/or thresPercent_Numerator are powers of 2. (Normally a left shift is enough, but you may need a right shift, rearranging the equation into a division, to prevent overflow.)
