I have the code below to convert a 32-bit BCD value (supplied in two uint halves) to a uint binary value.
Each supplied half can be up to 0x9999, forming a maximum value of 0x99999999.
Is there a better (i.e. quicker) way to achieve this?
/// <summary>
/// Convert two PLC words in BCD format (forming 8 digit number) into single binary integer.
/// e.g. If Lower = 0x5678 and Upper = 0x1234, then Return is 12345678 decimal, or 0xbc614e.
/// </summary>
/// <param name="lower">Least significant 16 bits.</param>
/// <param name="upper">Most significant 16 bits.</param>
/// <returns>32 bit unsigned integer.</returns>
/// <remarks>If the parameters supplied are invalid, returns zero.</remarks>
private static uint BCD2ToBin(uint lower, uint upper)
{
    uint binVal = 0;
    if ((lower | upper) != 0)
    {
        int shift = 0;
        uint multiplier = 1;
        uint bcdVal = (upper << 16) | lower;
        for (int i = 0; i < 8; i++)
        {
            uint digit = (bcdVal >> shift) & 0xf;
            if (digit > 9)
            {
                binVal = 0;
                break;
            }
            else
            {
                binVal += digit * multiplier;
                shift += 4;
                multiplier *= 10;
            }
        }
    }
    return binVal;
}
If you've space to spare for a 39,322-element array (one entry for every 16-bit BCD word up to 0x9999), you could always just look each half up.
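That would look something like this (a sketch of my own, assuming both halves are valid BCD words no greater than 0x9999):
// One entry per 16-bit BCD code from 0x0000 to 0x9999 (39,322 entries);
// invalid codes simply stay 0.
static readonly ushort[] BcdWord = BuildBcdTable();

static ushort[] BuildBcdTable()
{
    var table = new ushort[0x9999 + 1];
    for (int value = 0; value < 10000; value++)
    {
        int bcd = (value / 1000 << 12) | (value / 100 % 10 << 8)
                | (value / 10 % 10 << 4) | (value % 10);
        table[bcd] = (ushort)value;
    }
    return table;
}

static uint BCD2ToBinLookup(uint lower, uint upper) =>
    10000u * BcdWord[upper] + BcdWord[lower];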
If you unroll the loop, remember to keep the bit shift.
value = ( lo & 0xF);
value += ((lo >> 4 ) & 0xF) * 10;
value += ((lo >> 8 ) & 0xF) * 100;
value += ((lo >> 12) & 0xF) * 1000;
value += ( hi & 0xF) * 10000;
value += ((hi >> 4 ) & 0xF) * 100000;
value += ((hi >> 8 ) & 0xF) * 1000000;
value += ((hi >> 12) & 0xF) * 10000000;
Your code seems rather complicated; do you require the specific error checking?
Otherwise, you could just use the following code, which shouldn't be slower; in fact, it's mostly the same:
uint result = 0;
uint multiplier = 1;
uint value = lo | hi << 0x10;
while (value > 0) {
    uint digit = value & 0xF;
    value >>= 4;
    result += multiplier * digit;
    multiplier *= 10;
}
return result;
I suppose you could unroll the loop:
value  = ( lo        & 0xF);
value += ((lo >> 4)  & 0xF) * 10;
value += ((lo >> 8)  & 0xF) * 100;
value += ((lo >> 12) & 0xF) * 1000;
value += ( hi        & 0xF) * 10000;
value += ((hi >> 4)  & 0xF) * 100000;
value += ((hi >> 8)  & 0xF) * 1000000;
value += ((hi >> 12) & 0xF) * 10000000;
And you can check for invalid BCD digits like this:
invalid = lo & (((lo & 0x8888) >> 2) * 3);
This sets invalid to a non-zero value if any single hex digit is greater than 9.
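Applied to the combined 32-bit value, a sketch (variable names mine):
uint bcd = (upper << 16) | lower;
// A nibble exceeds 9 exactly when bit 3 is set together with bit 2 or bit 1,
// so a non-zero result means at least one invalid BCD digit.
uint invalid = bcd & (((bcd & 0x88888888u) >> 2) * 3u);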
Try this:
public static int bcd2int(int bcd) {
    return int.Parse(bcd.ToString("X"));
}
public static uint BCDToNum(int num)
{
    return uint.Parse(num.ToString("X"));
}
Of course, there are more efficient methods; this is just an example, so you can tune it as a lesson ^^
function bcd_to_bin($bcd) {
    $mask_sbb = 0x33333333;
    $mask_msb = 0x88888888;
    $mask_opp = 0xF;
    for ($i = 28; $i; --$i) {
        $mask_msb <<= 1;
        $mask_opp <<= 1;
        $mask_sbb <<= 1;
        for ($j = 0; $j < $i; $j += 4) {
            $mask_opp_j = $mask_opp << $j;
            if ($bcd & $mask_msb & $mask_opp_j) {
                $bcd -= $mask_sbb & $mask_opp_j;
            }
        }
    }
    return $bcd;
}
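For what it's worth, a well-known branch-free alternative converts all eight digits in three fixed multiply-and-add steps by merging digit pairs, then byte pairs, then halfwords. A C# sketch of my own (assuming every input nibble is already a valid digit 0..9):
static uint Bcd8ToBin(uint bcd)
{
    uint x = (bcd & 0x0F0F0F0Fu) + 10 * ((bcd >> 4) & 0x0F0F0F0Fu);  // digit pairs -> 0..99 per byte
    x = (x & 0x00FF00FFu) + 100 * ((x >> 8) & 0x00FF00FFu);          // byte pairs -> 0..9999 per halfword
    return (x & 0x0000FFFFu) + 10000 * (x >> 16);                    // halfwords -> 0..99999999
}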
I've written an IEEE 754 "quarter" 8-bit minifloat in C#, in a 1.3.4 format (1 sign bit, 3 exponent bits, 4 mantissa bits) with an exponent bias of −3.
It was mostly a fun little side-project, testing whether or not I understand floats.
Actually, though, I find myself using it more than I'd like to admit :) (bandwidth > clock ticks)
Here's my code for converting the minifloat to a 32-bit float:
public static implicit operator float(quarter q)
{
    int sign = (q.value & 0b1000_0000) << 24;
    int fusedExponentMantissa = (q.value & 0b0111_1111) << (23 - MANTISSA_BITS);

    if ((q.value & 0b0111_0000) == 0b0111_0000) // NaN/Infinity
    {
        return asfloat(sign | (255 << 23) | fusedExponentMantissa);
    }
    else // normal and subnormal
    {
        float magic = asfloat((255 - 1 + EXPONENT_BIAS) << 23);
        return magic * asfloat(sign | fusedExponentMantissa);
    }
}
where quarter.value is the stored byte and "asfloat" is simply *(float*)&myUInt. The "magic" number makes use of mantissa overflow in the subnormal case, which affects the f_32 exponent (integer multiplication plus mask-and-add is slower than the FPU switch and a float multiplication). I guess one could optimize away the branch, too.
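As a worked example of that magic multiply (my own trace, assuming MANTISSA_BITS = 4 and EXPONENT_BIAS = -3), following quarter.Epsilon (value 0x01) through the subnormal branch:
// q.value = 0x01 (quarter.Epsilon)
// fusedExponentMantissa = 1 << (23 - 4) = 1 << 19
// asfloat(1 << 19) is the subnormal float 2^19 * 2^-149 = 2^-130
// magic = asfloat((255 - 1 + (-3)) << 23) = asfloat(251 << 23) = 2^(251 - 127) = 2^124
// magic * 2^-130 = 2^-6 = 2^-2 * 2^-4, i.e. quarter.Epsilon's value, as intended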
But here comes the problematic code - float_32 to float_8:
public static explicit operator quarter(float f)
{
    byte f8_sign = (byte)((asuint(f) & 0x8000_0000u) >> 24);
    uint f32_exponent = asuint(f) & 0x7F80_0000u;
    uint f32_mantissa = asuint(f) & 0x007F_FFFFu;

    if (f32_exponent < (120 << 23)) // underflow => preserve +/- 0
    {
        return new quarter { value = f8_sign };
    }
    else if (f32_exponent > (130 << 23)) // overflow => +/- infinity or preserve NaN
    {
        return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
    }
    else
    {
        switch (f32_exponent)
        {
            case 120 << 23: // 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0
            {
                return new quarter { value = (byte)(f8_sign | touint8(f32_mantissa != 0)) };
            }
            case 121 << 23: // 2^(-6) * (1 + mantissa): return +/- quarter.epsilon = 2^(-2) * (0 + 2^(-4)); if the mantissa is > 0.5 i.e. 2^(-6) * max(mantissa, 1.75), return 2^(-2) * 2^(-3)
            {
                return new quarter { value = (byte)(f8_sign | (Epsilon.value + touint8(f32_mantissa > 0x0040_0000))) };
            }
            case 122 << 23:
            {
                return new quarter { value = (byte)(f8_sign | 0b0000_0010u | (f32_mantissa >> 22)) };
            }
            case 123 << 23:
            {
                return new quarter { value = (byte)(f8_sign | 0b0000_0100u | (f32_mantissa >> 21)) };
            }
            case 124 << 23:
            {
                return new quarter { value = (byte)(f8_sign | 0b0000_1000u | (f32_mantissa >> 20)) };
            }
            default:
            {
                const uint exponentDelta = (127 + EXPONENT_BIAS) << 23;
                return new quarter { value = (byte)(f8_sign | (((f32_exponent - exponentDelta) | f32_mantissa) >> 19)) };
            }
        }
    }
}
... where the function
"asuint" is simply *(uint*)&myFloat and
"touint8" is simply *(byte*)&myBoolean i.e. myBoolean ? 1 : 0.
The first five cases deal with numbers that can only be represented as subnormals in a "quarter".
I want to get rid of the switch at the very least. There's obviously a pattern (same as with float8_to_float32) but I haven't been able to figure out how I could unify the entire switch for days... I tried to google how hardware converts doubles to floats but that yielded no results either.
My requirements are to hold on to the IEEE-754 standard, meaning:
NaN and infinity preservation, clamping to infinity/zero in case of over-/underflow, as well as rounding to epsilon when the larger type's value is closer to epsilon than to 0 (the first switch case, as well as the underflow limit in the first if statement).
Can anyone at least push me in the right direction please?
This may not be optimal, but it uses strictly conforming C code except as noted in the first comment, so no pointer aliasing or other manipulation of the bits of a floating-point object. A thorough test program is included.
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
/* Notes on portability:

    uint8_t is an optional type. Its use here is easily replaced by
    unsigned char.

    Round-to-nearest is required in FloatToMini.

    Floating-point must be base two, and the constant in the
    Dekker-Veltkamp split is hardcoded for IEEE-754 binary64 but could be
    adapted to other formats. (Change the exponent in 0x1p48 to the number
    of bits in the significand minus five.)
*/

/* Convert a double to a 1-3-4 floating-point format. Round-to-nearest is
   required.
*/
static uint8_t FloatToMini(double x)
{
    // Extract the sign bit of x, moved into its position in a mini-float.
    uint8_t s = !!signbit(x) << 7;
    x = fabs(x);

    /* If x is a NaN, return a quiet NaN with the copied sign. Significand
       bits are not preserved.
    */
    if (x != x)
        return s | 0x78;

    /* If |x| is greater than or equal to the rounding point between the
       maximum finite value and infinity, return infinity with the copied sign.
       (0x1.fp0 is the largest representable significand, 0x1.f8 is that plus
       half an ULP, and the largest exponent is 3, so 0x1.f8p3 is that
       rounding point.)
    */
    if (0x1.f8p3 <= x)
        return s | 0x70;

    // If x is subnormal, encode with zero exponent.
    if (x < 0x1p-2 - 0x1p-7)
        return s | (uint8_t) nearbyint(x * 0x1p6);

    /* Round to five significand bits using the Dekker-Veltkamp Split. (The
       cast eliminates the excess precision that the C standard allows.)
    */
    double d = x * (0x1p48 + 1);
    x = d - (double) (d-x);

    /* Separate the significand and exponent. C's frexp scales the exponent
       so the significand is in [.5, 1), hence the e-1 below.
    */
    int e;
    x = frexp(x, &e) - .5;
    return s | (e-1+3) << 4 | (uint8_t) (x*0x1p5);
}
static void Show(double x)
{
    printf("%g -> 0x%02" PRIx8 ".\n", x, FloatToMini(x));
}

static void Test(double x, uint8_t expected)
{
    uint8_t observed = FloatToMini(x);
    if (expected != observed)
    {
        printf("Error, %.9g (%a) produced 0x%02" PRIx8
               " but expected 0x%02" PRIx8 ".\n",
               x, x, observed, expected);
        exit(EXIT_FAILURE);
    }
}
int main(void)
{
    // Set the value of an ULP in [1, 2).
    static const double ULP = 0x1p-4;

    // Test all even significands with normal exponents.
    for (double s = 1; s < 2; s += 2*ULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = -ULP / (s == 1 ? 4 : 2); t <= +ULP/2; t += ULP/16)
            // Test with all normal exponents.
            for (int e = 1-3; e < 7-3; ++e)
                // Test with both signs.
                for (int sign = -1; sign <= +1; sign += 2)
                {
                    // Prepare the expected encoding.
                    uint8_t expected =
                          (0 < sign ? 0 : 1) << 7
                        | (e+3) << 4
                        | (uint8_t) ((s-1) * 0x1p4);
                    Test(sign * ldexp(s+t, e), expected);
                }

    // Test all odd significands with normal exponents.
    for (double s = 1 + 1*ULP; s < 2; s += 2*ULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = -ULP/2 + ULP/16; t < +ULP/2; t += ULP/16)
            // Test with all normal exponents.
            for (int e = 1-3; e < 7-3; ++e)
                // Test with both signs.
                for (int sign = -1; sign <= +1; sign += 2)
                {
                    // Prepare the expected encoding.
                    uint8_t expected =
                          (0 < sign ? 0 : 1) << 7
                        | (e+3) << 4
                        | (uint8_t) ((s-1) * 0x1p4);
                    Test(sign * ldexp(s+t, e), expected);
                }

    // Set the value of an ULP in the subnormal range.
    static const double subULP = ULP * 0x1p-2;

    // Test all even significands with the subnormal exponent.
    for (double s = 0; s < 0x1p-2; s += 2*subULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = s == 0 ? 0 : -subULP/2; t <= +subULP/2; t += subULP/16)
        {
            // Test with both signs.
            for (int sign = -1; sign <= +1; sign += 2)
            {
                // Prepare the expected encoding.
                uint8_t expected =
                      (0 < sign ? 0 : 1) << 7
                    | (uint8_t) (s/subULP);
                Test(sign * (s+t), expected);
            }
        }

    // Test all odd significands with the subnormal exponent.
    for (double s = 0 + 1*subULP; s < 0x1p-2; s += 2*subULP)
        // Test with trailing bits less than or equal to 1/2 ULP in magnitude.
        for (double t = -subULP/2 + subULP/16; t < +subULP/2; t += subULP/16)
        {
            // Test with both signs.
            for (int sign = -1; sign <= +1; sign += 2)
            {
                // Prepare the expected encoding.
                uint8_t expected =
                      (0 < sign ? 0 : 1) << 7
                    | (uint8_t) (s/subULP);
                Test(sign * (s+t), expected);
            }
        }

    // Test at and slightly under the point of rounding to infinity.
    Test(+15.75, 0x70);
    Test(-15.75, 0xf0);
    Test(nexttoward(+15.75, 0), 0x6f);
    Test(nexttoward(-15.75, 0), 0xef);

    // Test infinities and NaNs.
    Test(+INFINITY, 0x70);
    Test(-INFINITY, 0xf0);
    Test(+NAN, 0x78);
    Test(-NAN, 0xf8);

    Show(0);
    Show(0x1p-6);
    Show(0x1p-2);
    Show(0x1.1p-2);
    Show(0x1.2p-2);
    Show(0x1.4p-2);
    Show(0x1.8p-2);
    Show(0x1p-1);
    Show(15.5);
    Show(15.75);
    Show(16);
    Show(NAN);
    Show(1./6);
    Show(1./3);
    Show(2./3);
}
I hate to answer my own question... But this may still not be the optimal solution.
Although @Eric Postpischil's solution uses an established algorithm, it is not very well suited to minifloats, since there are so few denormals in 4 mantissa bits. Additionally, there's the overhead of multiple float arithmetic operations; because of the actual code behind frexp in particular, it only has one branch less (or two, when inlined and optimized) than my original solution, and it is also not that great in regards to instruction-level parallelism.
So here's my current solution:
public static explicit operator quarter(float f)
{
    byte f8_sign = (byte)((asuint(f) >> 31) << 7);
    uint f32_exponent = (asuint(f) >> 23) & 0x00FFu;
    uint f32_mantissa = asuint(f) & 0x007F_FFFFu;

    if (f32_exponent < 120) // underflow => preserve +/- 0
    {
        return new quarter { value = f8_sign };
    }
    else if (f32_exponent > 130) // overflow => +/- infinity or preserve NaN
    {
        return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
    }
    else
    {
        int cmp = 125 - (int)f32_exponent;
        int cmpIsZeroOrNegativeMask = (cmp - 1) >> 31;

        int denormalExponent = andnot(0b0001_0000 >> cmp, cmpIsZeroOrNegativeMask); // special case 121: sets it to quarter.Epsilon
        denormalExponent += touint8((f32_exponent == 121) & (f32_mantissa >= 0x0040_0000)); // case 121: 2^(-6) * (1 + mantissa): return +/- quarter.Epsilon = 2^(-2) * 2^(-4); if the mantissa is >= 0.5 return 2^(-2) * 2^(-3)
        denormalExponent |= touint8((f32_exponent == 120) & (f32_mantissa != 0)); // case 120: 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0

        int normalExponent = (cmpIsZeroOrNegativeMask & ((int)f32_exponent - (127 + EXPONENT_BIAS))) << 4;

        int mantissaShift = 19 + andnot(cmp, cmpIsZeroOrNegativeMask);

        return new quarter { value = (byte)((f8_sign | normalExponent) | (denormalExponent | (f32_mantissa >> mantissaShift))) };
    }
}
But note that the particular andnot(int a, int b) function I use returns a & ~b and...not ~a & b.
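Spelled out as code, that is:
static int andnot(int a, int b) => a & ~b; // clears b's set bits from a, not the other way around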
Thanks for your help :) I'm keeping this open since, as mentioned, this may very well not be the best solution - but at least it's my own...
PS: This is probably a good example of why PREMATURE optimization is bad; your code becomes much less readable. Make sure you have the functionality backed up by unit tests, and make sure you even need the optimization in the first place.
...And after some time, in the spirit of transparent progression, I want to show the final version, since I believe I have found the optimal implementation; more on that later.
First off, here it is (the code should speak for itself, which is why it is this "much"):
unsafe struct quarter
{
    const bool IEEE_754_STANDARD = true; //standard: true
    const bool SIGN_BIT = IEEE_754_STANDARD || true; //standard: true
    const int BITS = 8 * sizeof(byte); //standard: 8
    const int EXPONENT_BITS = 3 + (SIGN_BIT ? 0 : 1); //standard: 3
    const int MANTISSA_BITS = BITS - EXPONENT_BITS - (SIGN_BIT ? 1 : 0); //standard: 4
    const int EXPONENT_BIAS = -(((1 << BITS) - 1) >> (BITS - (EXPONENT_BITS - 1))); //standard: -3
    const int MAX_EXPONENT = EXPONENT_BIAS + ((1 << EXPONENT_BITS) - 1) - (IEEE_754_STANDARD ? 1 : 0); //standard: 3
    const int SIGNALING_EXPONENT = (MAX_EXPONENT - EXPONENT_BIAS + (IEEE_754_STANDARD ? 1 : 0)) << MANTISSA_BITS; //standard: 0b0111_0000

    const int F32_BITS = 8 * sizeof(float);
    const int F32_EXPONENT_BITS = 8;
    const int F32_MANTISSA_BITS = 23;
    const int F32_EXPONENT_BIAS = -(int)(((1L << F32_BITS) - 1) >> (F32_BITS - (F32_EXPONENT_BITS - 1)));
    const int F32_MAX_EXPONENT = F32_EXPONENT_BIAS + ((1 << F32_EXPONENT_BITS) - 1 - 1);
    const int F32_SIGNALING_EXPONENT = (F32_MAX_EXPONENT - F32_EXPONENT_BIAS + 1) << F32_MANTISSA_BITS;
    const int F32_SHL_LOSE_SIGN = (F32_BITS - (MANTISSA_BITS + EXPONENT_BITS));
    const int F32_SHR_PLACE_MANTISSA = MANTISSA_BITS + ((1 + F32_EXPONENT_BITS) - (MANTISSA_BITS + EXPONENT_BITS));
    const int F32_MAGIC = (((1 << F32_EXPONENT_BITS) - 1) - (1 + EXPONENT_BITS)) << F32_MANTISSA_BITS;

    byte _value;

    static quarter Epsilon => new quarter { _value = 1 };
    static quarter MaxValue => new quarter { _value = (byte)(SIGNALING_EXPONENT - 1) };
    static quarter NaN => new quarter { _value = (byte)(SIGNALING_EXPONENT | 1) };
    static quarter PositiveInfinity => new quarter { _value = (byte)SIGNALING_EXPONENT };

    static uint asuint(float f) => *(uint*)&f;
    static float asfloat(uint u) => *(float*)&u;
    static byte tobyte(bool b) => *(byte*)&b;

    static float ToFloat(quarter q, bool promiseInRange)
    {
        uint fusedExponentMantissa = ((uint)q._value << F32_SHL_LOSE_SIGN) >> F32_SHR_PLACE_MANTISSA;
        uint sign = ((uint)q._value >> (BITS - 1)) << (F32_BITS - 1);

        if (!promiseInRange)
        {
            bool nanInf = (q._value & SIGNALING_EXPONENT) == SIGNALING_EXPONENT;
            uint ifNanInf = asuint(float.PositiveInfinity) & (uint)(-tobyte(nanInf));
            return (nanInf ? 1f : asfloat(F32_MAGIC)) * asfloat(sign | fusedExponentMantissa | ifNanInf);
        }
        else
        {
            return asfloat(F32_MAGIC) * asfloat(sign | fusedExponentMantissa);
        }
    }

    static quarter ToQuarter(float f, bool promiseInRange)
    {
        float inRange = f * (1f / asfloat(F32_MAGIC));
        uint q = asuint(inRange) >> (F32_MANTISSA_BITS - (1 + EXPONENT_BITS));
        uint f8_sign = asuint(f) >> (F32_BITS - 1);

        if (!promiseInRange)
        {
            uint f32_exponent = asuint(f) & F32_SIGNALING_EXPONENT;
            bool overflow = f32_exponent > (uint)(-F32_EXPONENT_BIAS + MAX_EXPONENT << F32_MANTISSA_BITS);
            bool notNaNInf = f32_exponent != F32_SIGNALING_EXPONENT;
            f8_sign ^= tobyte(!notNaNInf);
            if (overflow & notNaNInf)
            {
                q = PositiveInfinity._value;
            }
        }

        f8_sign <<= (BITS - 1);
        return new quarter { _value = (byte)(q ^ f8_sign) };
    }
}
It turns out that the reverse of converting the mini-float to a 32-bit float by multiplying with a magic constant is, in fact, the reverse of that multiplication (wow...): a floating-point division by that constant.
Luckily it is "by that constant" and not the other way around; we can calculate the reciprocal at compile time and multiply by it instead. As with the reverse operation, this only fails when converting to and from 'INF' and 'NaN': absolute overflow with any biased 32-bit exponent where exponent % (MAX_EXPONENT + 1) != 0 is not translated into 'INF', and positive 'INF' is translated into negative 'INF'.
Although this enables some optimizations through the bool parameter, this mostly just reduces code size and, more importantly (especially for SIMD versions, where small data types really shine), reduces the need for constants. Speaking of SIMD: this scalar version can be optimized a little by using SSE/SSE2 intrinsics.
The (disabled) optimizations (would) run completely in parallel with the floating-point multiplication followed by a shift, taking a total of 5 to 6+ clock cycles (very CPU dependent), which is astonishingly close to native hardware instructions (~4 to 5 clock cycles).
I have been working on a C# implementation of 2048 for the purpose of implementing reinforcement learning.
The "slide" operation for each move requires that tiles be moved and combined according to specific rules. Doing so involves a number of transformations on a 2d array of values.
Until recently I was using a 4x4 byte matrix:
var field = new byte[4,4];
Each value was an exponent of 2, so 0=0, 1=2, 2=4, 3=8, and so forth. The 2048 tile would be represented by 11.
Because the (practical) maximum value for a given tile is 15 (which only requires 4 bits), it is possible to shove the contents of this 4x4 byte array into a ulong value.
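Packing the board into that ulong might look like this (a sketch of my own; the extension-method name and the row-major, first-tile-in-the-top-nibble layout are assumptions chosen to match the masks below):
public static ulong Pack(this byte[,] squares)
{
    ulong state = 0;
    for (int x = 0; x < 4; x++)
        for (int y = 0; y < 4; y++)
            state = (state << 4) | (squares[x, y] & 0xFUL); // one hex digit per tile
    return state;
}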
It turns out that certain operations are vastly more efficient with this representation. For example, I commonly have to invert arrays like this:
//flip horizontally
const byte SZ = 4;
public static byte[,] Invert(this byte[,] squares)
{
    var tmp = new byte[SZ, SZ];
    for (byte x = 0; x < SZ; x++)
        for (byte y = 0; y < SZ; y++)
            tmp[x, y] = squares[x, SZ - y - 1];
    return tmp;
}
I can do this inversion to a ulong ~15x faster:
public static ulong Invert(this ulong state)
{
    ulong c1 = state & 0xF000F000F000F000L;
    ulong c2 = state & 0x0F000F000F000F00L;
    ulong c3 = state & 0x00F000F000F000F0L;
    ulong c4 = state & 0x000F000F000F000FL;
    return (c1 >> 12) | (c2 >> 4) | (c3 << 4) | (c4 << 12);
}
Note the use of hex, which is extremely useful because each character represents a tile.
The operation I'm having the most trouble with is Transpose, which flips the x and y coordinates of values in the 2d array, like this:
public static byte[,] Transpose(this byte[,] squares)
{
    var tmp = new byte[SZ, SZ];
    for (byte x = 0; x < SZ; x++)
        for (byte y = 0; y < SZ; y++)
            tmp[y, x] = squares[x, y];
    return tmp;
}
The fastest way I've found to do this is using this bit of ridiculousness:
public static ulong Transpose(this ulong state)
{
    ulong result = state & 0xF0000F0000F0000FL; //unchanged diagonals
    result |= (state & 0x0F00000000000000L) >> 12;
    result |= (state & 0x00F0000000000000L) >> 24;
    result |= (state & 0x000F000000000000L) >> 36;
    result |= (state & 0x0000F00000000000L) << 12;
    result |= (state & 0x000000F000000000L) >> 12;
    result |= (state & 0x0000000F00000000L) >> 24;
    result |= (state & 0x00000000F0000000L) << 24;
    result |= (state & 0x000000000F000000L) << 12;
    result |= (state & 0x00000000000F0000L) >> 12;
    result |= (state & 0x000000000000F000L) << 36;
    result |= (state & 0x0000000000000F00L) << 24;
    result |= (state & 0x00000000000000F0L) << 12;
    return result;
}
Shockingly, this is still nearly 3x faster than the loop version. However, I'm looking for a more performant method, either leveraging a pattern inherent in transposition or managing the bits I'm moving around more efficiently.
You can skip 6 steps by combining masks; I commented them out to show the result. This should make it twice as fast:
public static ulong Transpose(this ulong state)
{
    ulong result = state & 0xF0000F0000F0000FL; //unchanged diagonals
    result |= (state & 0x0F0000F0000F0000L) >> 12;
    result |= (state & 0x00F0000F00000000L) >> 24;
    result |= (state & 0x000F000000000000L) >> 36;
    result |= (state & 0x0000F0000F0000F0L) << 12;
    //result |= (state & 0x000000F000000000L) >> 12;
    //result |= (state & 0x0000000F00000000L) >> 24;
    result |= (state & 0x00000000F0000F00L) << 24;
    //result |= (state & 0x000000000F000000L) << 12;
    //result |= (state & 0x00000000000F0000L) >> 12;
    result |= (state & 0x000000000000F000L) << 36;
    //result |= (state & 0x0000000000000F00L) << 24;
    //result |= (state & 0x00000000000000F0L) << 12;
    return result;
}
Another trick is that sometimes it is possible to move disjoint sets of bit-groups left by different amounts using a single multiplication. This requires that the partial products don't "overlap".
For example, the moves left by 12 and 24 could be done as:
ulong t = (state & 0x0000F000FF000FF0UL) * ((1UL << 12) + (1UL << 24));
r0 |= t & 0x0FF000FF000F0000UL;
That reduces 6 operations to 4. The multiplication shouldn't be slow, on a modern processor it takes 3 cycles, and while it is working on that multiply the processor can go ahead and work on the other steps too. As a bonus, on Intel the imul would go to port 1 while the shifts go to ports 0 and 6, so saving two shifts with a multiply is a good deal, opening up more room for the other shifts. The AND and OR operations can go to any ALU port and aren't really the problem here, but it may help for latency to split up the chain of dependent ORs:
public static ulong Transpose(this ulong state)
{
    ulong r0 = state & 0xF0000F0000F0000FL; //unchanged diagonals
    ulong t = (state & 0x0000F000FF000FF0UL) * ((1UL << 12) + (1UL << 24));
    ulong r1 = (state & 0x0F0000F0000F0000L) >> 12;
    r0 |= (state & 0x00F0000F00000000L) >> 24;
    r1 |= (state & 0x000F000000000000L) >> 36;
    r0 |= (state & 0x000000000000F000L) << 36;
    r1 |= t & 0x0FF000FF000F0000UL;
    return r0 | r1;
}
I have a password hashing method in C# that I'm trying to port to PHP for my website, which will allow both my website and application to use passwords from the same database (the application requires a website account to use). The problem is, once the password gets over 7 characters in length, the result in PHP differs from what I get in C#, but any password less than 8 characters matches the C# encryption exactly.
Here's my method in C#:
public static byte[] PassEncode(byte[] pass)
{
    int a = 0;
    int num = 0x79707367; // starting num
    for (int i = 0; i < pass.Length; i++)
    {
        num = PassLame(num);
        a = num % 0xFF;
        pass[i] ^= (byte)a;
    }
    return pass;
}
private static int PassLame(int num)
{
    int c = (num >> 16) & 0xffff;
    int a = num & 0xffff;
    c *= 0x41a7;
    a *= 0x41a7;
    a += ((c & 0x7fff) << 16);
    if (a < 0)
    {
        a &= 0x7fffffff;
        a++;
    }
    a += (c >> 15);
    if (a < 0)
    {
        a &= 0x7fffffff;
        a++;
    }
    return a;
}
And my methods in PHP:
function PassEncode($pass)
{
    $a = 0;
    $num = 0x79707367;
    for ($i = 0; $i < sizeof($pass); $i++)
    {
        $num = PassLame($num);
        $a = $num % 0xFF;
        $pass[$i] ^= $a;
    }
    return $pass;
}
function PassLame($num)
{
    $c = ($num >> 16) & 0xffff;
    $a = $num & 0xffff;
    $c *= 0x41a7;
    $a *= 0x41a7;
    $a += (($c & 0x7fff) << 16);
    if ($a < 0)
    {
        $a &= 0x7fffffff;
        $a++;
    }
    $a += ($c >> 15);
    if ($a < 0)
    {
        $a &= 0x7fffffff;
        $a++;
    }
    return $a;
}
The bytes I'm using are for the word "testing".
bytes = ([0]=> 116 [1]=> 101 [2]=> 115 [3]=> 116 [4]=> 105 [5]=> 110 [6]=> 103)
When I plug these in, the 8th digit returned (and beyond, if using a longer pass) is a lot different than in C#. My results:
C#:
[0]=> int(98) [1]=> int(151) [2]=> int(135) [3]=> int(134) [4]=> int(66) [5]=> int(181) [6]=> int(113)
PHP:
[0]=> int(98) [1]=> int(151) [2]=> int(135) [3]=> int(134) [4]=> int(66) [5]=> int(181) [6]=> int(11)
Can anyone help me solve this? I'm using a 32bit webserver and compiling my application in 32bit as well.
I believe PHP handles integer overflow by converting the value to float. That is almost certainly the cause of your problem.
You could write custom functions for the arithmetic that check for overflow and mimic C#'s wraparound behavior.
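For example, a hedged PHP sketch of my own (the helper name is illustrative) that reduces a value back into signed 32-bit range; calling $a = toInt32($a); after each addition in PassLame should make the $a < 0 checks behave like their C# counterparts:
function toInt32($n)
{
    // Reduce modulo 2^32; fmod() still works after PHP promotes $n to float.
    $n = fmod($n, 4294967296.0);
    if ($n >= 2147483648.0) $n -= 4294967296.0;  // wrap the positive half down
    if ($n < -2147483648.0) $n += 4294967296.0;  // wrap the negative half up
    return (int)$n;
}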
I am trying to extract the height from a file like this:
http://visibleearth.nasa.gov/view.php?id=73934
The pixels are loaded into an Int32 array
private Int16[] heights;
private int Width, Height;

public TextureData(Texture2D t)
{
    Int32[] data = new Int32[t.Width * t.Height];
    t.GetData<Int32>(data);
    Width = t.Width;
    Height = t.Height;
    t.Dispose();
    heights = new Int16[t.Width * t.Height];
    for (int i = 0; i < data.Length; ++i)
    {
        heights[i] = ReverseBytes(data[i]);
    }
}

// reverse byte order (16-bit)
public static Int16 ReverseBytes(Int32 value)
{
    return (Int16)((value << 8) | (value >> 8));
}
I don't know why, but the heights are not correct...
I think the big-endian conversion is wrong; can you help me please?
This is the result; the heights are higher than expected:
http://i.imgur.com/FukdmLF.png
EDIT:
public static int ReverseBytes(int value)
{
    int sign = (value & 0x8000) >> 15;
    int msb = (value & 0x7F) >> 7;
    int lsb = (value & 0xFF) << 8;
    return (msb | lsb | sign);
}
is this ok? I don't know why but it is still wrong...
int refers to a 32-bit signed integer, but your byte-reverser is written for a 16-bit signed integer, so it will only work for positive values up to 32767. If you have any values higher than that, you will need to shift and then mask one byte at a time before ORing them together.
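For example, a sketch of my own (assuming each Int32 pixel carries one 16-bit big-endian sample in its low 16 bits):
public static Int16 ReverseBytes16(Int32 value)
{
    int lo = value & 0xFF;          // low byte
    int hi = (value >> 8) & 0xFF;   // high byte, masked so shifted-in bits are discarded
    return (Int16)((lo << 8) | hi);
}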
I have a List<bool> which I want to convert to a byte[]. How do I do this?
list.ToArray() creates a bool[].
Here are two approaches, depending on whether you want to pack the bits into bytes or have as many bytes as original bits:
bool[] bools = { true, false, true, false, false, true, false, true,
                 true };

// basic - same count
byte[] arr1 = Array.ConvertAll(bools, b => b ? (byte)1 : (byte)0);

// pack (in this case, using the first bool as the lsb - if you want
// the first bool as the msb, reverse things ;-p)
int bytes = bools.Length / 8;
if ((bools.Length % 8) != 0) bytes++;
byte[] arr2 = new byte[bytes];
int bitIndex = 0, byteIndex = 0;
for (int i = 0; i < bools.Length; i++)
{
    if (bools[i])
    {
        arr2[byteIndex] |= (byte)(((byte)1) << bitIndex);
    }
    bitIndex++;
    if (bitIndex == 8)
    {
        bitIndex = 0;
        byteIndex++;
    }
}
Marc's answer is good already, but...
Assuming you are the kind of person that is comfortable doing bit-twiddling, or just want to write less code and squeeze out some more performance, then this here code is for you good sir / madame:
byte[] PackBoolsInByteArray(bool[] bools)
{
    int len = bools.Length;
    int bytes = len >> 3;
    if ((len & 0x07) != 0) ++bytes;
    byte[] arr2 = new byte[bytes];
    for (int i = 0; i < bools.Length; i++)
    {
        if (bools[i])
            arr2[i >> 3] |= (byte)(1 << (i & 0x07));
    }
    return arr2;
}
It does the exact same thing as Marc's code, it's just more succinct.
Of course if we really want to go all out we could unroll it too...
...and while we are at it lets throw in a curve ball on the return type!
IEnumerable<byte> PackBoolsInByteEnumerable(bool[] bools)
{
    int len = bools.Length;
    int rem = len & 0x07; // hint: rem = len % 8.
    /*
    byte[] byteArr = rem == 0       // length is a multiple of 8? (no remainder?)
        ? new byte[len >> 3]        // -yes-
        : new byte[(len >> 3) + 1]; // -no-
    */
    const byte BZ = 0,
        B0 = 1 << 0, B1 = 1 << 1, B2 = 1 << 2, B3 = 1 << 3,
        B4 = 1 << 4, B5 = 1 << 5, B6 = 1 << 6, B7 = 1 << 7;
    byte b;
    int i = 0;
    for (int mul = len & ~0x07; i < mul; i += 8) // hint: len = mul + rem.
    {
        b = bools[i] ? B0 : BZ;
        if (bools[i + 1]) b |= B1;
        if (bools[i + 2]) b |= B2;
        if (bools[i + 3]) b |= B3;
        if (bools[i + 4]) b |= B4;
        if (bools[i + 5]) b |= B5;
        if (bools[i + 6]) b |= B6;
        if (bools[i + 7]) b |= B7;
        //byteArr[i >> 3] = b;
        yield return b;
    }
    if (rem != 0) // take care of the remainder...
    {
        b = bools[i] ? B0 : BZ; // (there is at least one more bool.)
        switch (rem) // rem is [1:7] (fall-through switch!)
        {
            case 7:
                if (bools[i + 6]) b |= B6;
                goto case 6;
            case 6:
                if (bools[i + 5]) b |= B5;
                goto case 5;
            case 5:
                if (bools[i + 4]) b |= B4;
                goto case 4;
            case 4:
                if (bools[i + 3]) b |= B3;
                goto case 3;
            case 3:
                if (bools[i + 2]) b |= B2;
                goto case 2;
            case 2:
                if (bools[i + 1]) b |= B1;
                break;
            // case 1 is the statement above the switch!
        }
        //byteArr[i >> 3] = b; // write the last byte to the array.
        yield return b; // yield the last byte.
    }
    //return byteArr;
}
Tip: As you can see I included the code for returning a byte[] as comments. Simply comment out the two yield statements instead if that is what you want/need.
Twiddling Hints:
Shifting x >> 3 is a cheaper x / 8.
Masking x & 0x07 is a cheaper x % 8.
Masking x & ~0x07 is a cheaper x - x % 8.
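These identities hold for non-negative values; a quick illustrative self-check (my own):
for (int x = 0; x < 64; x++)
{
    System.Diagnostics.Debug.Assert(x >> 3 == x / 8);
    System.Diagnostics.Debug.Assert((x & 0x07) == x % 8);
    System.Diagnostics.Debug.Assert((x & ~0x07) == x - x % 8);
}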
Edit:
Here is some example documentation:
/// <summary>
/// Bit-packs an array of booleans into bytes, one bit per boolean.
/// </summary><remarks>
/// Booleans are bit-packed into bytes, in order, from least significant
/// bit to most significant bit of each byte.<br/>
/// If the length of the input array isn't a multiple of eight, then one
/// or more of the most significant bits in the last byte returned will
/// be unused. Unused bits are zero / unset.
/// </remarks>
/// <param name="bools">An array of booleans to pack into bytes.</param>
/// <returns>
/// An IEnumerable<byte> of bytes each containing (up to) eight
/// bit-packed booleans.
/// </returns>
You can use LINQ. This won't be efficient, but will be simple. I'm assuming that you want one byte per bool.
bool[] a = new bool[] { true, false, true, true, false, true };
byte[] b = (from x in a select x ? (byte)0x1 : (byte)0x0).ToArray();
Or an IEnumerable approach, in the vein of AnorZaken's answer:
static IEnumerable<byte> PackBools(IEnumerable<bool> bools)
{
    int bitIndex = 0;
    byte currentByte = 0;
    foreach (bool val in bools) {
        if (val)
            currentByte |= (byte)(1 << bitIndex);
        if (++bitIndex == 8) {
            yield return currentByte;
            bitIndex = 0;
            currentByte = 0;
        }
    }
    if (bitIndex != 0) {
        yield return currentByte;
    }
}
And the corresponding unpacking, where paddingEnd is the number of bits to discard from the last byte:
static IEnumerable<bool> UnpackBools(IEnumerable<byte> bytes, int paddingEnd = 0)
{
    using (var enumerator = bytes.GetEnumerator()) {
        bool last = !enumerator.MoveNext();
        while (!last) {
            byte current = enumerator.Current;
            last = !enumerator.MoveNext();
            for (int i = 0; i < 8 - (last ? paddingEnd : 0); i++) {
                yield return (current & (1 << i)) != 0;
            }
        }
    }
}
If you have any control over the type of list, try to make it a List<byte>, which will then produce the byte[] on ToArray(). If you have an ArrayList, you can use:
(byte[])list.ToArray(typeof(byte));
To get the List<byte>, you could create one with your unspecified list iterator as an input to the constructor and then call ToArray()? Or copy each item, casting to a new byte from bool?
Some info on what type of list it is might help.
Have a look at the BitConverter class. Depending on the exact nature of your requirement, it may solve your problem quite neatly.
Another LINQ approach, less efficient than @hfcs101's, but it would easily work for other value types as well:
var a = new [] { true, false, true, true, false, true };
byte[] b = a.Select(BitConverter.GetBytes).SelectMany(x => x).ToArray();