How to multiply a BigInteger with a decimal in C#?
var bi = new BigInteger(1000);
var d = 0.9m;
// HowTo:
var res = BigInteger.Multiply(bi, d); // desired: res = 900 (no such overload exists)
Of course, the result should be floored down to the previous full integer value.
There is a practical background, but in regards to the posted "duplicate", I'm also interested in an answer to the question from a theoretical point of view. I'm not looking for a workaround that uses double instead.
You can represent a decimal as a fraction with a BigInteger numerator and denominator:
(BigInteger numerator, BigInteger denominator) Fraction(decimal d)
{
    int[] bits = decimal.GetBits(d);

    BigInteger numerator = (1 - ((bits[3] >> 30) & 2)) *
                           unchecked(((BigInteger)(uint)bits[2] << 64) |
                                     ((BigInteger)(uint)bits[1] << 32) |
                                      (BigInteger)(uint)bits[0]);

    BigInteger denominator = BigInteger.Pow(10, (bits[3] >> 16) & 0xff);

    return (numerator, denominator);
}
Then you can multiply the BigInteger by the numerator and divide by the denominator:
var bi = new BigInteger(1000);
var d = 0.9m;
var f = Fraction(d);
var res = bi * f.numerator / f.denominator;
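One caveat worth adding (my note, not part of the original answer): BigInteger division truncates toward zero, which matches flooring only when the result is non-negative. If negative inputs need true floor semantics, something along these lines should work (FloorDivide is a hypothetical helper name):

public static BigInteger FloorDivide(BigInteger n, BigInteger d)
{
    // truncating division plus a correction when remainder and divisor signs differ
    BigInteger q = BigInteger.DivRem(n, d, out BigInteger r);
    return (r != 0 && (r < 0) != (d < 0)) ? q - 1 : q;
}

Since the denominator 10^scale is always positive here, the correction only triggers for negative numerators.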
Related
I've written an IEEE 754 "quarter" 8-bit minifloat in a 1.3.4.-3 format (1 sign bit, 3 exponent bits, 4 mantissa bits, exponent bias of -3) in C#.
It was mostly a fun little side-project, testing whether or not I understand floats.
Actually, though, I find myself using it more than I'd like to admit :) (bandwidth > clock ticks)
Here's my code for converting the minifloat to a 32-bit float:
public static implicit operator float(quarter q)
{
    int sign = (q.value & 0b1000_0000) << 24;
    int fusedExponentMantissa = (q.value & 0b0111_1111) << (23 - MANTISSA_BITS);

    if ((q.value & 0b0111_0000) == 0b0111_0000) // NaN/Infinity
    {
        return asfloat(sign | (255 << 23) | fusedExponentMantissa);
    }
    else // normal and subnormal
    {
        float magic = asfloat((255 - 1 + EXPONENT_BIAS) << 23);

        return magic * asfloat(sign | fusedExponentMantissa);
    }
}
where quarter.value is the stored byte and "asfloat" is simply *(float*)&myUInt. The "magic" number makes use of mantissa overflow in the subnormal case, which affects the f_32 exponent (an integer multiplication plus mask-and-add is slower than an FPU switch and a float multiplication). I guess one could optimize away the branch, too.
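As a sanity check of that magic multiplier (my own illustration, not part of the original code; it uses BitConverter.Int32BitsToSingle in place of the unsafe pointer cast and assumes MANTISSA_BITS = 4 and EXPONENT_BIAS = -3):

static float Asfloat(int i) => BitConverter.Int32BitsToSingle(i);

int fused = (0b0000_0001 & 0b0111_1111) << (23 - 4); // quarter.Epsilon, i.e. 1 << 19
float magic = Asfloat((255 - 1 + (-3)) << 23);       // 2^124
Console.WriteLine(magic * Asfloat(fused));           // prints 0.015625 = 2^-6, as expected

Asfloat(fused) is the float subnormal 2^-130; multiplying by 2^124 lands exactly on quarter.Epsilon's value of 2^-6.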
But here comes the problematic code - float_32 to float_8:
public static explicit operator quarter(float f)
{
    byte f8_sign = (byte)((asuint(f) & 0x8000_0000u) >> 24);
    uint f32_exponent = asuint(f) & 0x7F80_0000u;
    uint f32_mantissa = asuint(f) & 0x007F_FFFFu;

    if (f32_exponent < (120 << 23)) // underflow => preserve +/- 0
    {
        return new quarter { value = f8_sign };
    }
    else if (f32_exponent > (130 << 23)) // overflow => +/- infinity or preserve NaN
    {
        return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
    }
    else
    {
        switch (f32_exponent)
        {
            case 120 << 23: // 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0
            {
                return new quarter { value = (byte)(f8_sign | touint8(f32_mantissa != 0)) };
            }
            case 121 << 23: // 2^(-6) * (1 + mantissa): return +/- quarter.epsilon = 2^(-2) * (0 + 2^(-4)); if the mantissa is > 0.5 i.e. 2^(-6) * max(mantissa, 1.75), return 2^(-2) * 2^(-3)
            {
                return new quarter { value = (byte)(f8_sign | (Epsilon.value + touint8(f32_mantissa > 0x0040_0000))) };
            }
            case 122 << 23:
            {
                return new quarter { value = (byte)(f8_sign | 0b0000_0010u | (f32_mantissa >> 22)) };
            }
            case 123 << 23:
            {
                return new quarter { value = (byte)(f8_sign | 0b0000_0100u | (f32_mantissa >> 21)) };
            }
            case 124 << 23:
            {
                return new quarter { value = (byte)(f8_sign | 0b0000_1000u | (f32_mantissa >> 20)) };
            }
            default:
            {
                const uint exponentDelta = (127 + EXPONENT_BIAS) << 23;
                return new quarter { value = (byte)(f8_sign | (((f32_exponent - exponentDelta) | f32_mantissa) >> 19)) };
            }
        }
    }
}
... where the function "asuint" is simply *(uint*)&myFloat and "touint8" is simply *(byte*)&myBoolean, i.e. myBoolean ? 1 : 0.
The first five cases deal with numbers that can only be represented as subnormals in a "quarter".
I want to get rid of the switch at the very least. There's obviously a pattern (the same as with float8_to_float32), but I haven't been able to figure out how to unify the entire switch for days... I tried to google how hardware converts doubles to floats, but that yielded no results either.
My requirements are to hold on to the IEEE-754 standard, meaning:
NaN and infinity preservation, clamping to infinity/zero in case of over-/underflow, as well as rounding to epsilon when the larger type's value is closer to epsilon than to 0 (the first switch case, as well as the underflow limit in the first if statement).
Can anyone at least push me in the right direction please?
This may not be optimal, but it uses strictly conforming C code except as noted in the first comment, so no pointer aliasing or other manipulation of the bits of a floating-point object. A thorough test program is included.
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
/* Notes on portability:
uint8_t is an optional type. Its use here is easily replaced by
unsigned char.
Round-to-nearest is required in FloatToMini.
Floating-point must be base two, and the constant in the
Dekker-Veltkamp split is hardcoded for IEEE-754 binary64 but could be
adopted to other formats. (Change the exponent in 0x1p48 to the number
of bits in the significand minus five.)
*/
/* Convert a double to a 1-3-4 floating-point format. Round-to-nearest is
required.
*/
static uint8_t FloatToMini(double x)
{
    // Extract the sign bit of x, moved into its position in a mini-float.
    uint8_t s = !!signbit(x) << 7;

    x = fabs(x);

    /* If x is a NaN, return a quiet NaN with the copied sign. Significand
       bits are not preserved.
    */
    if (x != x)
        return s | 0x78;

    /* If |x| is greater than or equal to the rounding point between the
       maximum finite value and infinity, return infinity with the copied sign.
       (0x1.fp0 is the largest representable significand, 0x1.f8 is that plus
       half an ULP, and the largest exponent is 3, so 0x1.f8p3 is that
       rounding point.)
    */
    if (0x1.f8p3 <= x)
        return s | 0x70;

    // If x is subnormal, encode with zero exponent.
    if (x < 0x1p-2 - 0x1p-7)
        return s | (uint8_t) nearbyint(x * 0x1p6);

    /* Round to five significand bits using the Dekker-Veltkamp Split. (The
       cast eliminates the excess precision that the C standard allows.)
    */
    double d = x * (0x1p48 + 1);
    x = d - (double) (d-x);
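    /* Illustration (an added example, not from the original code): x = 1.046875
       (binary 1.000011) carries more than five significand bits; the two lines
       above round it to 1.0625 (binary 1.0001), i.e. five significand bits.
    */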
    /* Separate the significand and exponent. C's frexp scales the exponent
       so the significand is in [.5, 1), hence the e-1 below.
    */
    int e;
    x = frexp(x, &e) - .5;

    return s | (e-1+3) << 4 | (uint8_t) (x*0x1p5);
}
static void Show(double x)
{
    printf("%g -> 0x%02" PRIx8 ".\n", x, FloatToMini(x));
}

static void Test(double x, uint8_t expected)
{
    uint8_t observed = FloatToMini(x);
    if (expected != observed)
    {
        printf("Error, %.9g (%a) produced 0x%02" PRIx8
               " but expected 0x%02" PRIx8 ".\n",
               x, x, observed, expected);
        exit(EXIT_FAILURE);
    }
}
int main(void)
{
    // Set the value of an ULP in [1, 2).
    static const double ULP = 0x1p-4;

    // Test all even significands with normal exponents.
    for (double s = 1; s < 2; s += 2*ULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = -ULP / (s == 1 ? 4 : 2); t <= +ULP/2; t += ULP/16)
            // Test with all normal exponents.
            for (int e = 1-3; e < 7-3; ++e)
                // Test with both signs.
                for (int sign = -1; sign <= +1; sign += 2)
                {
                    // Prepare the expected encoding.
                    uint8_t expected =
                          (0 < sign ? 0 : 1) << 7
                        | (e+3) << 4
                        | (uint8_t) ((s-1) * 0x1p4);
                    Test(sign * ldexp(s+t, e), expected);
                }

    // Test all odd significands with normal exponents.
    for (double s = 1 + 1*ULP; s < 2; s += 2*ULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = -ULP/2 + ULP/16; t < +ULP/2; t += ULP/16)
            // Test with all normal exponents.
            for (int e = 1-3; e < 7-3; ++e)
                // Test with both signs.
                for (int sign = -1; sign <= +1; sign += 2)
                {
                    // Prepare the expected encoding.
                    uint8_t expected =
                          (0 < sign ? 0 : 1) << 7
                        | (e+3) << 4
                        | (uint8_t) ((s-1) * 0x1p4);
                    Test(sign * ldexp(s+t, e), expected);
                }

    // Set the value of an ULP in the subnormal range.
    static const double subULP = ULP * 0x1p-2;

    // Test all even significands with the subnormal exponent.
    for (double s = 0; s < 0x1p-2; s += 2*subULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = s == 0 ? 0 : -subULP/2; t <= +subULP/2; t += subULP/16)
        {
            // Test with both signs.
            for (int sign = -1; sign <= +1; sign += 2)
            {
                // Prepare the expected encoding.
                uint8_t expected =
                      (0 < sign ? 0 : 1) << 7
                    | (uint8_t) (s/subULP);
                Test(sign * (s+t), expected);
            }
        }

    // Test all odd significands with the subnormal exponent.
    for (double s = 0 + 1*subULP; s < 0x1p-2; s += 2*subULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = -subULP/2 + subULP/16; t < +subULP/2; t += subULP/16)
        {
            // Test with both signs.
            for (int sign = -1; sign <= +1; sign += 2)
            {
                // Prepare the expected encoding.
                uint8_t expected =
                      (0 < sign ? 0 : 1) << 7
                    | (uint8_t) (s/subULP);
                Test(sign * (s+t), expected);
            }
        }

    // Test at and slightly under the point of rounding to infinity.
    Test(+15.75, 0x70);
    Test(-15.75, 0xf0);
    Test(nexttoward(+15.75, 0), 0x6f);
    Test(nexttoward(-15.75, 0), 0xef);

    // Test infinities and NaNs.
    Test(+INFINITY, 0x70);
    Test(-INFINITY, 0xf0);
    Test(+NAN, 0x78);
    Test(-NAN, 0xf8);

    Show(0);
    Show(0x1p-6);
    Show(0x1p-2);
    Show(0x1.1p-2);
    Show(0x1.2p-2);
    Show(0x1.4p-2);
    Show(0x1.8p-2);
    Show(0x1p-1);
    Show(15.5);
    Show(15.75);
    Show(16);
    Show(NAN);
    Show(1./6);
    Show(1./3);
    Show(2./3);
}
I hate to answer my own question... But this may still not be the optimal solution.
Although @Eric Postpischil's solution uses an established algorithm, it is not very well suited for minifloats, since there are so few denormals with only 4 mantissa bits. Additionally, there is the overhead of multiple floating-point arithmetic operations - and, particularly because of the actual code behind frexp, it has only one branch fewer (or two when inlined and optimized) than my original solution, and it is also not that great in regards to instruction-level parallelism.
So here's my current solution:
public static explicit operator quarter(float f)
{
    byte f8_sign = (byte)((asuint(f) >> 31) << 7);
    uint f32_exponent = (asuint(f) >> 23) & 0x00FFu;
    uint f32_mantissa = asuint(f) & 0x007F_FFFFu;

    if (f32_exponent < 120) // underflow => preserve +/- 0
    {
        return new quarter { value = f8_sign };
    }
    else if (f32_exponent > 130) // overflow => +/- infinity or preserve NaN
    {
        return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
    }
    else
    {
        int cmp = 125 - (int)f32_exponent;
        int cmpIsZeroOrNegativeMask = (cmp - 1) >> 31;

        int denormalExponent = andnot(0b0001_0000 >> cmp, cmpIsZeroOrNegativeMask); // special case 121: sets it to quarter.Epsilon
        denormalExponent += touint8((f32_exponent == 121) & (f32_mantissa >= 0x0040_0000)); // case 121: 2^(-6) * (1 + mantissa): return +/- quarter.Epsilon = 2^(-2) * 2^(-4); if the mantissa is >= 0.5 return 2^(-2) * 2^(-3)
        denormalExponent |= touint8((f32_exponent == 120) & (f32_mantissa != 0)); // case 120: 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0

        int normalExponent = (cmpIsZeroOrNegativeMask & ((int)f32_exponent - (127 + EXPONENT_BIAS))) << 4;

        int mantissaShift = 19 + andnot(cmp, cmpIsZeroOrNegativeMask);

        return new quarter { value = (byte)((f8_sign | normalExponent) | (denormalExponent | (f32_mantissa >> mantissaShift))) };
    }
}
But note that the particular andnot(int a, int b) function I use returns a & ~b and...not ~a & b.
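In other words, the helper is just (spelled out exactly as described):

static int andnot(int a, int b) => a & ~b;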
Thanks for your help :) I'm keeping this open since, as mentioned, this may very well not be the best solution - but at least it's my own...
PS: This is probably a good example of why PREMATURE optimization is bad: your code becomes much less readable. Make sure you have the functionality backed up by unit tests, and make sure you even need the optimization in the first place.
...And after some time, in the spirit of transparent progression, I want to show the final version, since I believe I have found the optimal implementation; more on that later.
First off, here it is (the code should speak for itself, which is why there is this "much" of it):
unsafe struct quarter
{
    const bool IEEE_754_STANDARD = true;                                                                          //standard: true
    const bool SIGN_BIT = IEEE_754_STANDARD || true;                                                              //standard: true
    const int BITS = 8 * sizeof(byte);                                                                            //standard: 8
    const int EXPONENT_BITS = 3 + (SIGN_BIT ? 0 : 1);                                                             //standard: 3
    const int MANTISSA_BITS = BITS - EXPONENT_BITS - (SIGN_BIT ? 1 : 0);                                          //standard: 4
    const int EXPONENT_BIAS = -(((1 << BITS) - 1) >> (BITS - (EXPONENT_BITS - 1)));                               //standard: -3
    const int MAX_EXPONENT = EXPONENT_BIAS + ((1 << EXPONENT_BITS) - 1) - (IEEE_754_STANDARD ? 1 : 0);            //standard: 3
    const int SIGNALING_EXPONENT = (MAX_EXPONENT - EXPONENT_BIAS + (IEEE_754_STANDARD ? 1 : 0)) << MANTISSA_BITS; //standard: 0b0111_0000

    const int F32_BITS = 8 * sizeof(float);
    const int F32_EXPONENT_BITS = 8;
    const int F32_MANTISSA_BITS = 23;
    const int F32_EXPONENT_BIAS = -(int)(((1L << F32_BITS) - 1) >> (F32_BITS - (F32_EXPONENT_BITS - 1)));
    const int F32_MAX_EXPONENT = F32_EXPONENT_BIAS + ((1 << F32_EXPONENT_BITS) - 1 - 1);
    const int F32_SIGNALING_EXPONENT = (F32_MAX_EXPONENT - F32_EXPONENT_BIAS + 1) << F32_MANTISSA_BITS;
    const int F32_SHL_LOSE_SIGN = (F32_BITS - (MANTISSA_BITS + EXPONENT_BITS));
    const int F32_SHR_PLACE_MANTISSA = MANTISSA_BITS + ((1 + F32_EXPONENT_BITS) - (MANTISSA_BITS + EXPONENT_BITS));
    const int F32_MAGIC = (((1 << F32_EXPONENT_BITS) - 1) - (1 + EXPONENT_BITS)) << F32_MANTISSA_BITS;

    byte _value;

    static quarter Epsilon => new quarter { _value = 1 };
    static quarter MaxValue => new quarter { _value = (byte)(SIGNALING_EXPONENT - 1) };
    static quarter NaN => new quarter { _value = (byte)(SIGNALING_EXPONENT | 1) };
    static quarter PositiveInfinity => new quarter { _value = (byte)SIGNALING_EXPONENT };

    static uint asuint(float f) => *(uint*)&f;
    static float asfloat(uint u) => *(float*)&u;
    static byte tobyte(bool b) => *(byte*)&b;

    static float ToFloat(quarter q, bool promiseInRange)
    {
        uint fusedExponentMantissa = ((uint)q._value << F32_SHL_LOSE_SIGN) >> F32_SHR_PLACE_MANTISSA;
        uint sign = ((uint)q._value >> (BITS - 1)) << (F32_BITS - 1);

        if (!promiseInRange)
        {
            bool nanInf = (q._value & SIGNALING_EXPONENT) == SIGNALING_EXPONENT;
            uint ifNanInf = asuint(float.PositiveInfinity) & (uint)(-tobyte(nanInf));

            return (nanInf ? 1f : asfloat(F32_MAGIC)) * asfloat(sign | fusedExponentMantissa | ifNanInf);
        }
        else
        {
            return asfloat(F32_MAGIC) * asfloat(sign | fusedExponentMantissa);
        }
    }

    static quarter ToQuarter(float f, bool promiseInRange)
    {
        float inRange = f * (1f / asfloat(F32_MAGIC));

        uint q = asuint(inRange) >> (F32_MANTISSA_BITS - (1 + EXPONENT_BITS));
        uint f8_sign = asuint(f) >> (F32_BITS - 1);

        if (!promiseInRange)
        {
            uint f32_exponent = asuint(f) & F32_SIGNALING_EXPONENT;

            bool overflow = f32_exponent > (uint)(-F32_EXPONENT_BIAS + MAX_EXPONENT << F32_MANTISSA_BITS);
            bool notNaNInf = f32_exponent != F32_SIGNALING_EXPONENT;

            f8_sign ^= tobyte(!notNaNInf);
            if (overflow & notNaNInf)
            {
                q = PositiveInfinity._value;
            }
        }

        f8_sign <<= (BITS - 1);
        return new quarter { _value = (byte)(q ^ f8_sign) };
    }
}
It turns out that the inverse of converting the mini-float to a 32-bit float by multiplying with a magic constant is, in fact, the inverse of that multiplication (wow...): a floating-point division by the same constant.
Luckily it is a division "by that constant" and not the other way around; we can calculate the reciprocal at compile time and multiply by it instead. As with the reverse operation, this only fails when converting to and from 'INF' and 'NaN': absolute overflow with any biased float32 exponent where exponent % (MAX_EXPONENT + 1) != 0 is not translated into 'INF', and positive 'INF' is translated into negative 'INF'.
Although this enables some optimizations through the bool parameter, this mostly just reduces code size and, more importantly (especially for SIMD versions, where small data types really shine), reduces the need for constants. Speaking of SIMD: this scalar version can be optimized a little by using SSE/SSE2 intrinsics.
The (disabled) optimizations (would) run completely in parallel to the floating-point multiplication followed by a shift, taking a total of 5 to 6+ clock cycles (very CPU dependent), which is astonishingly close to native hardware instructions (~4 to 5 clock cycles).
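For illustration, a round trip through the two converters (my own usage sketch; it assumes ToQuarter/ToFloat are made accessible, since they are private in the struct above):

quarter q = quarter.ToQuarter(0.375f, promiseInRange: true); // bit pattern 0b0_001_1000
float f = quarter.ToFloat(q, promiseInRange: true);          // exactly 0.375f again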
I am trying to parse the C# decimal type into my own Decimal type, which has a significand and an exponent only. I tried this:
public static MyType.Decimal FromDecimal(decimal decimalValue)
{
    decimal tempValue = decimalValue;
    int exponent = 0;

    while ((long)tempValue < decimalValue)
    {
        tempValue *= 10;
        decimalValue *= 10;
        exponent--;
    }

    return new MyType.Decimal()
    {
        Significand = (long)tempValue,
        Exponent = exponent
    };
}

public class Decimal
{
    public long Significand;
    public int Exponent;
}
but in my opinion, it will be slow.
You can try decimal.GetBits(), e.g.
public static MyType.Decimal FromDecimal(decimal decimalValue) {
    int[] parts = decimal.GetBits(decimalValue);

    int sign = 1 - 2 * ((parts[3] >> 31) & 1);
    // cast parts[0] through uint, otherwise a set bit 31 sign-extends into the high word
    long mantissa = ((((long)parts[1]) << 32) | (long)(uint)parts[0]) * sign;
    int exponent = -((parts[3] >> 16) & 0xFF);

    return new MyType.Decimal() {
        Significand = mantissa,
        Exponent = exponent
    };
}
But be careful: decimal uses a 96-bit mantissa (note the absence of parts[2], which we ignored), which is why long Significand can be too short (long is only 64 bits).
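If silent truncation is not acceptable, you can reject such values up front (a hedged sketch, my addition rather than part of the answer):

if (parts[2] != 0 || parts[1] < 0) // the 96-bit mantissa exceeds a signed 64-bit long
    throw new OverflowException("decimal mantissa does not fit a 64-bit significand");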
Demo:
decimal[] tests = new decimal[] {
    0m,
    10m,
    100m,
    0.01m,
    123e-1m,
    -3m,
    123456789e4m,
    123456789e-4m,
    1234567890123456m,
    1234567890123456e4m // 12345678901234560000 > long.MaxValue (9223372036854775807)
};

string report = string.Join(Environment.NewLine, tests
    .Select(test => {
        int[] parts = decimal.GetBits(test);

        int sign = 1 - 2 * ((parts[3] >> 31) & 1);
        long mantissa = ((((long)parts[1]) << 32) | (long)(uint)parts[0]) * sign;
        int exponent = -((parts[3] >> 16) & 0xFF);

        return $"{test,20} == {mantissa}e{exponent}";
    }));
Outcome:
0 == 0e0
10 == 10e0
100 == 100e0
0.01 == 1e-2
12.3 == 123e-1
-3 == -3e0
1234567890000 == 1234567890000e0
12345.6789 == 123456789e-4
1234567890123456 == 1234567890123456e0
12345678901234560000 == -6101065172474991616e0 -- overflows into the sign bit, doesn't fit a signed 64-bit long
I am trying to use AForge.Net Genetics library to create a simple application for optimization purposes. I have a scenario where I have four input parameters, therefore I tried to modify the "OptimizationFunction2D.cs" class located in the AForge.Genetic project to handle four parameters.
While converting the binary chromosomes into 4 parameters (type = double) I am not sure if my approach is correct as I don't know how to verify the extracted values. Below is the code segment where my code differs from the original AForge code:
public double[] Translate( IChromosome chromosome )
{
    // get chromosome's value
    ulong val = ((BinaryChromosome) chromosome).Value;
    // chromosome's length
    int length = ((BinaryChromosome) chromosome).Length;
    // length of W component
    int wLength = length / 4;
    // length of X component
    int xLength = length / 4;
    // length of Y component
    int yLength = length / 4;
    // length of Z component
    int zLength = length / 4;
    // W maximum value - equal to W mask
    ulong wMax = 0xFFFFFFFFFFFFFFFF >> (64 - wLength);
    // X maximum value - equal to X mask
    ulong xMax = 0xFFFFFFFFFFFFFFFF >> (64 - xLength);
    // Y maximum value - equal to Y mask
    ulong yMax = 0xFFFFFFFFFFFFFFFF >> (64 - yLength);
    // Z maximum value - equal to Z mask
    ulong zMax = 0xFFFFFFFFFFFFFFFF >> (64 - zLength);
    // W component
    double wPart = val & wMax;
    // X component
    double xPart = (val >> wLength) & xMax;
    // Y component
    double yPart = (val >> (wLength + xLength)) & yMax;
    // Z component
    double zPart = val >> (wLength + xLength + yLength);

    // translate to optimization's function space
    double[] ret = new double[4];

    ret[0] = wPart * _rangeW.Length / wMax + _rangeW.Min;
    ret[1] = xPart * _rangeX.Length / xMax + _rangeX.Min;
    ret[2] = yPart * _rangeY.Length / yMax + _rangeY.Min;
    ret[3] = zPart * _rangeZ.Length / zMax + _rangeZ.Min;

    return ret;
}
I am not sure if I am correctly separating the chromosome value into four parts (wPart/xPart/yPart/zPart). The original function in the AForge.Genetic library looks like this:
public double[] Translate( IChromosome chromosome )
{
    // get chromosome's value
    ulong val = ( (BinaryChromosome) chromosome ).Value;
    // chromosome's length
    int length = ( (BinaryChromosome) chromosome ).Length;
    // length of X component
    int xLength = length / 2;
    // length of Y component
    int yLength = length - xLength;
    // X maximum value - equal to X mask
    ulong xMax = 0xFFFFFFFFFFFFFFFF >> ( 64 - xLength );
    // Y maximum value
    ulong yMax = 0xFFFFFFFFFFFFFFFF >> ( 64 - yLength );
    // X component
    double xPart = val & xMax;
    // Y component
    double yPart = val >> xLength;

    // translate to optimization's function space
    double[] ret = new double[2];

    ret[0] = xPart * rangeX.Length / xMax + rangeX.Min;
    ret[1] = yPart * rangeY.Length / yMax + rangeY.Min;

    return ret;
}
Can someone please confirm whether my conversion process is correct, or is there a better way of doing it?
It works, but you don't need it to be so complicated.
ulong wMax = 0xFFFFFFFFFFFFFFFF >> (64 - wLength);
This returns the same value for all four results (wMax, xMax, yMax, zMax), so just compute it once and call it componentMask:
part = (val >> (wLength * pos)) & componentMask;
where pos is the 0-based position of the component: 0 for w, 1 for x, and so on.
The rest is OK.
EDIT:
If the length is not divisible by 4, you can make the last part just val >> (wLength * pos) so that it takes all the remaining bits. A full sketch of the simplified method follows.
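Putting it together, the simplified method might look like this (a sketch, under the assumption that _rangeW/_rangeX/_rangeY/_rangeZ are the range fields from your version):

public double[] Translate(IChromosome chromosome)
{
    ulong val = ((BinaryChromosome)chromosome).Value;
    int length = ((BinaryChromosome)chromosome).Length;

    int componentLength = length / 4;
    ulong componentMask = 0xFFFFFFFFFFFFFFFF >> (64 - componentLength);

    var ranges = new[] { _rangeW, _rangeX, _rangeY, _rangeZ };
    double[] ret = new double[4];

    for (int pos = 0; pos < 4; pos++)
    {
        // shift the component into place, then mask off the rest
        double part = (val >> (componentLength * pos)) & componentMask;
        ret[pos] = part * ranges[pos].Length / componentMask + ranges[pos].Min;
    }

    return ret;
}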
I was wondering if it would be possible to get a vector with an X and a Y value as a single number, knowing that both X and Y can range from -65000 to +65000.
Is this possible in any way?
Code examples on how to convert to and from this kind of number would be nice.
Store it in a ulong:
ulong rslt = (uint)x;
rslt = rslt << 32;
rslt |= ((uint)y);
To get it out:
int x = (int)(rslt >> 32);
int y = (int)(rslt & 0xFFFFFFFF);
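Wrapped up as reusable helpers (the method names are mine), the same idea looks like this:

static ulong Pack(int x, int y) => ((ulong)(uint)x << 32) | (uint)y;
static (int x, int y) Unpack(ulong v) => ((int)(v >> 32), (int)v);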
Assuming X and Y are both integer values and there is no overflow (32-bit values are not enough), you can use e.g. (pseudocode)
V = fromXY(X, Y) = (y+65000)*130001+(x+65000)
(X,Y) = toXY(V) = (V%130001-65000,V/130001-65000) // <= / is integer division
(130001 is the number of distinct values for X or Y)
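In C#, the same scheme could look like this (my transcription of the pseudocode; a long comfortably holds 130001 squared):

const long N = 130001; // distinct values per axis

static long FromXY(int x, int y) => (y + 65000L) * N + (x + 65000);
static (int x, int y) ToXY(long v) => ((int)(v % N) - 65000, (int)(v / N) - 65000);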
To combine:
var limit = 65000;
var x = 1;
var y = 2;
var single = x * (limit + 1) + y;
And then:
y = single % (limit + 1);
x = (single - y) / (limit + 1);
Of course, you have to assume that the maximum value for single fits within the size of the data type that stores it (which in this case it does).
A union does what you want very easily.
See also: http://www.cplusplus.com/doc/tutorial/other_data_types/
#include <iostream>
using namespace std;

typedef long long int64; // long long: plain long is only 32 bits on some platforms
typedef int int32;

union {
    struct { int32 a, b; }; // anonymous struct: a widely supported extension
    int64 a_and_b;
} stacker;

int main()
{
    stacker.a = -1000;
    stacker.b = 2000;
    cout << stacker.a << ", " << stacker.b << endl;
    cout << stacker.a_and_b << endl;
}
this will output:
-1000, 2000 <-- a and b read as two int32
8594229558296 <-- a and b interpreted as a single int64 (on a little-endian machine)
I would like to use RNGCryptoServiceProvider as my source of random numbers. As it can only output them as an array of byte values, how can I convert them to a 0 to 1 double value while preserving the uniformity of the results?
byte[] result = new byte[8];
rng.GetBytes(result);
// note: because the quotient is rounded, this can occasionally return exactly 1.0
return (double)BitConverter.ToUInt64(result, 0) / ulong.MaxValue;
This is how I would do this.
private static readonly System.Security.Cryptography.RNGCryptoServiceProvider _secureRng =
    new System.Security.Cryptography.RNGCryptoServiceProvider(); // initialized here so the sample is self-contained

public static double NextSecureDouble()
{
    var bytes = new byte[8];
    _secureRng.GetBytes(bytes);
    var v = BitConverter.ToUInt64(bytes, 0);

    // We only use the 53 bits of integer precision available in an IEEE 754 64-bit double.
    // The result is a fraction,
    // r = (0, 9007199254740991) / 9007199254740992 where 0 <= r && r < 1.
    v &= ((1UL << 53) - 1);
    var r = (double)v / (double)(1UL << 53);

    return r;
}
Coincidentally 9007199254740991 / 9007199254740992 is ~= 0.99999999999999988897769753748436, which is what the Random.NextDouble method will return as its maximum value (see https://msdn.microsoft.com/en-us/library/system.random.nextdouble(v=vs.110).aspx).
In general, the standard deviation of a continuous uniform distribution is (max - min) / sqrt(12).
With a sample size of 1000 I'm reliably getting within a 2% error margin.
With a sample size of 10000 I'm reliably getting within a 1% error margin.
Here's how I verified these results.
[Test]
public void Randomness_SecureDoubleTest()
{
    RunTrials(1000, 0.02);
    RunTrials(10000, 0.01);
}

private static void RunTrials(int sampleSize, double errorMargin)
{
    var q = new Queue<double>();
    while (q.Count < sampleSize)
    {
        q.Enqueue(Randomness.NextSecureDouble());
    }

    for (int k = 0; k < 1000; k++)
    {
        // rotate
        q.Dequeue();
        q.Enqueue(Randomness.NextSecureDouble());

        var avg = q.Average();

        // Dividing by n-1 gives a better estimate of the population standard
        // deviation for the larger parent population than dividing by n,
        // which gives a result which is correct for the sample only.
        var actual = Math.Sqrt(q.Sum(x => (x - avg) * (x - avg)) / (q.Count - 1));

        // see http://stats.stackexchange.com/a/1014/4576
        var expected = (q.Max() - q.Min()) / Math.Sqrt(12);

        Assert.AreEqual(expected, actual, errorMargin);
    }
}
You can use the BitConverter.ToDouble(...) method. It takes in a byte array and will return a Double. There are corresponding methods for most of the other primitive types, as well as a method to go from the primitives to a byte array.
Use BitConverter to convert a sequence of random bytes to a Double:
byte[] random_bytes = new byte[8]; // BitConverter will expect an 8-byte array
new RNGCryptoServiceProvider().GetBytes(random_bytes);
double my_random_double = BitConverter.ToDouble(random_bytes, 0);