Convert Two Bytes to Half Float In C# - c#

How can I convert two bytes (Hi, Lo) into this custom half float implementation using c#? Its slightly different to other implementation, as it has 9 bits of precision. I've tried modifying this answer https://stackoverflow.com/a/37761168/14515159 to suit the spec below, but its not converting correctly.
From a protocol sniffer Application
HO = 0x41, LO = 0xC5 should give value of 3.7695313.
However I have not been able to confirm this. The outputs should be in the range of 0 to 5 and all my conversions are way off.
public static float toTwoByteFloat(byte HO, byte LO)
{
var intVal = BitConverter.ToInt32(new byte[] { HO, LO, 0, 0 }, 0);
int mant = intVal & 0x07ff;
int exp = intVal & 0x7e00;
if (exp == 0x3c00) exp = 0x3fc00;
else if (exp != 0)
{
exp += 0x1c000;
if (mant == 0 && exp > 0x1c400)
return BitConverter.ToSingle(BitConverter.GetBytes((intVal & 0x8000) << 16 | exp << 14 | 0x7ff), 0);
}
else if (mant != 0)
{
exp = 0x1c400;
do
{
mant <<= 1;
exp -= 0x400;
} while ((mant & 0x400) == 0);
mant &= 0x7ff;
}
return BitConverter.ToSingle(BitConverter.GetBytes((intVal & 0x8000) << 16 | (exp | mant) << 14), 0);
}
Below is information from custom protocol that I'm working with.
Note for image below that in this protocol document they refer to Bit 0 as the MSB.
Half Float
This format can represent numbers from 231 to 2-31 with 9-bits of precision. A zero is represented with >a mantissa and exponent of zero. The exponent is biased with 31 and the mantissa assumes a left most >leading 1. This definition exactly matches the IEEE floating point formats, except the range of the >fields has been reduced.
Converting from IEEE-754 32-bit format to the 16-bit format is done by splitting the IEEE number into >its component parts: sign, exponent, and mantissa. The mantissa is right shifted 14-bits, losing that >resolution. The exponent is then unbiased (IEEE-754 uses a bias of 127) and re-biased with 31. The 16->bit format is then assembled from the original sign bit, the new exponent, and the new mantissa. During >the re-biasing of the exponent under flow and overflow is detected. Underflows should result in positive >zero. Overflows should result in the maximum possible exponent.

Working with a colleague, we came up with the solution below that converts correctly
public static float Parse16BitFloat(byte HI, byte LO)
{
// Program assumes ints are at least 16 bits
int fullFloat = ((HI << 8) | LO);
int exponent = (HI & 0b01111110) >> 1; // minor optimisation can be placed here
int mant = fullFloat & 0x01FF;
// Special values
if (exponent == 0b00111111) // If using constants, shift right by 1
{
// Check for non or inf
return mant != 0 ? float.NaN :
((HI & 0x80) == 0 ? float.PositiveInfinity : float.NegativeInfinity);
}
else // normal/denormal values: pad numbers
{
exponent = exponent - 31 + 127;
mant = mant << 14;
Int32 finalFloat = (HI & 0x80) << 24 | (exponent << 23) | mant;
return BitConverter.ToSingle(BitConverter.GetBytes(finalFloat), 0);
}
}

Related

c# FTDI 24bit two's complement implement

I am a beginner at C# and would like some advice on how to solve the following problem:
My code read 3 byte from ADC true FTDI ic and calculating value. Problem is that sometimes I get values under zero (less than 0000 0000 0000 0000 0000 0000) and my calculating value jump to a big number. I need to implement two's complement but I and I don't know how.
byte[] readData1 = new byte[3];
ftStatus = myFtdiDevice.Read(readData1, numBytesAvailable, ref numBytesRead); //read data from FTDI
int vrednost1 = (readData1[2] << 16) | (readData1[1] << 8) | (readData1[0]); //convert LSB MSB
int v = ((((4.096 / 16777216) * vrednost1) / 4) * 250); //calculate
Convert the data left-justified, so that your sign-bit lands in the spot your PC processor treats as a sign bit. Then right-shift back, which will automatically perform sign-extension.
int left_justified = (readData[2] << 24) | (readData[1] << 16) | (readData[0] << 8);
int result = left_justified >> 8;
Equivalently, one can mark the byte containing a sign bit as signed, so the processor will perform sign extension:
int result = (unchecked((sbyte)readData[2]) << 16) | (readData[1] << 8) | readData[0];
The second approach with a cast only works if the sign bit is already aligned left within any one of the bytes. The left-justification approach can work for arbitrary sizes, such as 18-bit two's complement readings. In this situation the cast to sbyte wouldn't do the job.
int left_justified18 = (readData[2] << 30) | (readData[1] << 22) | (readData[0] << 14);
int result18 = left_justified >> 14;
Well, the only obscure moment here is endianness (big or little). Assuming that byte[0] stands for the least significant byte you can put
private static int FromInt24(byte[] data) {
int result = (data[2] << 16) | (data[1] << 8) | data[0];
return data[2] < 128
? result // positive number
: -((~result & 0xFFFFFF) + 1); // negative number, its abs value is 2-complement
}
Demo:
byte[][] tests = new byte[][] {
new byte[] { 255, 255, 255},
new byte[] { 44, 1, 0},
};
string report = string.Join(Environment.NewLine, tests
.Select(test => $"[{string.Join(", ", test.Select(x => $"0x{x:X2}"))}] == {FromInt24(test)}"));
Console.Write(report);
Outcome:
[0xFF, 0xFF, 0xFF] == -1
[0x2C, 0x01, 0x00] == 300
If you have big endianness (e.g. 300 == {0, 1, 44}) you have to swap bytes:
private static int FromInt24(byte[] data) {
int result = (data[0] << 16) | (data[1] << 8) | data[2];
return data[0] < 128
? result
: -((~result & 0xFFFFFF) + 1);
}

How to convert a float/double/half to a minifloat the optimal way (improve my already working code)?

I've written an IEEE 754 "quarter" 8-bit minifloat in a 1.3.4.−3 format in C#.
It was mostly a fun little side-project, testing whether or not I understand floats.
Actually, though, I find myself using it more than I'd like to admit :) (bandwidth > clock ticks)
Here's my code for converting the minifloat to a 32-bit float:
public static implicit operator float(quarter q)
{
int sign = (q.value & 0b1000_0000) << 24;
int fusedExponentMantissa = (q.value & 0b0111_1111) << (23 - MANTISSA_BITS);
if ((q.value & 0b0111_0000) == 0b0111_0000) // NaN/Infinity
{
return asfloat(sign | (255 << 23) | fusedExponentMantissa);
}
else // normal and subnormal
{
float magic = asfloat((255 - 1 + EXPONENT_BIAS) << 23);
return magic * asfloat(sign | fusedExponentMantissa);
}
}
where quarter.value is the stored byte and "asfloat" is simply *(float*)&myUInt.The "magic" number makes use of mantissa overflow in the subnormal case, which affects the f_32 exponent (integer multiplication and mask + add is slower than FPU-switch and float multiplication). I guess one could optimize away the branch, too.
But here comes the problematic code - float_32 to float_8:
public static explicit operator quarter(float f)
{
byte f8_sign = (byte)((asuint(f) & 0x8000_0000u) >> 24);
uint f32_exponent = asuint(f) & 0x7F80_0000u;
uint f32_mantissa = asuint(f) & 0x007F_FFFFu;
if (f32_exponent < (120 << 23)) // underflow => preserve +/- 0
{
return new quarter { value = f8_sign };
}
else if (f32_exponent > (130 << 23)) // overflow => +/- infinity or preserve NaN
{
return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
}
else
{
switch (f32_exponent)
{
case 120 << 23: // 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0
{
return new quarter { value = (byte)(f8_sign | touint8(f32_mantissa != 0)) };
}
case 121 << 23: // 2^(-6) * (1 + mantissa): return +/- quarter.epsilon = 2^(-2) * (0 + 2^(-4)); if the mantissa is > 0.5 i.e. 2^(-6) * max(mantissa, 1.75), return 2^(-2) * 2^(-3)
{
return new quarter { value = (byte)(f8_sign | (Epsilon.value + touint8(f32_mantissa > 0x0040_0000))) };
}
case 122 << 23:
{
return new quarter { value = (byte)(f8_sign | 0b0000_0010u | (f32_mantissa >> 22)) };
}
case 123 << 23:
{
return new quarter { value = (byte)(f8_sign | 0b0000_0100u | (f32_mantissa >> 21)) };
}
case 124 << 23:
{
return new quarter { value = (byte)(f8_sign | 0b0000_1000u | (f32_mantissa >> 20)) };
}
default:
{
const uint exponentDelta = (127 + EXPONENT_BIAS) << 23;
return new quarter { value = (byte)(f8_sign | (((f32_exponent - exponentDelta) | f32_mantissa) >> 19)) };
}
}
}
}
... where the function
"asuint" is simply *(uint*)&myFloat and
"touint8" is simply *(byte*)&myBoolean i.e. myBoolean ? 1 : 0.
The first five cases deal with numbers that can only be represented as subnormals in a "quarter".
I want to get rid of the switch at the very least. There's obviously a pattern (same as with float8_to_float32) but I haven't been able to figure out how I could unify the entire switch for days... I tried to google how hardware converts doubles to floats but that yielded no results either.
My requirements are to hold on to the IEEE-754 standard, meaning:
NaN, infinity preservation and clamping to infinity/zero in case of over-/underflow, aswell as rounding to epsilon when the larger type's value is closer to epsilon than 0 (first switch case aswell as the underflow limit in the first if statement).
Can anyone at least push me in the right direction please?
This may not be optimal, but it uses strictly conforming C code except as noted in the first comment, so no pointer aliasing or other manipulation of the bits of a floating-point object. A thorough test program is included.
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
/* Notes on portability:
uint8_t is an optional type. Its use here is easily replaced by
unsigned char.
Round-to-nearest is required in FloatToMini.
Floating-point must be base two, and the constant in the
Dekker-Veltkamp split is hardcoded for IEEE-754 binary64 but could be
adopted to other formats. (Change the exponent in 0x1p48 to the number
of bits in the significand minus five.)
*/
/* Convert a double to a 1-3-4 floating-point format. Round-to-nearest is
required.
*/
static uint8_t FloatToMini(double x)
{
// Extract the sign bit of x, moved into its position in a mini-float.
uint8_t s = !!signbit(x) << 7;
x = fabs(x);
/* If x is a NaN, return a quiet NaN with the copied sign. Significand
bits are not preserved.
*/
if (x != x)
return s | 0x78;
/* If |x| is greater than or equal to the rounding point between the
maximum finite value and infinity, return infinity with the copied sign.
(0x1.fp0 is the largest representable significand, 0x1.f8 is that plus
half an ULP, and the largest exponent is 3, so 0x1.f8p3 is that
rounding point.)
*/
if (0x1.f8p3 <= x)
return s | 0x70;
// If x is subnormal, encode with zero exponent.
if (x < 0x1p-2 - 0x1p-7)
return s | (uint8_t) nearbyint(x * 0x1p6);
/* Round to five significand bits using the Dekker-Veltkamp Split. (The
cast eliminates the excess precision that the C standard allows.)
*/
double d = x * (0x1p48 + 1);
x = d - (double) (d-x);
/* Separate the significand and exponent. C's frexp scales the exponent
so the significand is in [.5, 1), hence the e-1 below.
*/
int e;
x = frexp(x, &e) - .5;
return s | (e-1+3) << 4 | (uint8_t) (x*0x1p5);
}
static void Show(double x)
{
printf("%g -> 0x%02" PRIx8 ".\n", x, FloatToMini(x));
}
static void Test(double x, uint8_t expected)
{
uint8_t observed = FloatToMini(x);
if (expected != observed)
{
printf("Error, %.9g (%a) produced 0x%02" PRIx8
" but expected 0x%02" PRIx8 ".\n",
x, x, observed, expected);
exit(EXIT_FAILURE);
}
}
int main(void)
{
// Set the value of an ULP in [1, 2).
static const double ULP = 0x1p-4;
// Test all even significands with normal exponents.
for (double s = 1; s < 2; s += 2*ULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = -ULP / (s == 1 ? 4 : 2); t <= +ULP/2; t += ULP/16)
// Test with all normal exponents.
for (int e = 1-3; e < 7-3; ++e)
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (e+3) << 4
| (uint8_t) ((s-1) * 0x1p4);
Test(sign * ldexp(s+t, e), expected);
}
// Test all odd significands with normal exponents.
for (double s = 1 + 1*ULP; s < 2; s += 2*ULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = -ULP/2+ULP/16; t < +ULP/2; t += ULP/16)
// Test with all normal exponents.
for (int e = 1-3; e < 7-3; ++e)
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (e+3) << 4
| (uint8_t) ((s-1) * 0x1p4);
Test(sign * ldexp(s+t, e), expected);
}
// Set the value of an ULP in the subnormal range.
static const double subULP = ULP * 0x1p-2;
// Test all even significands with the subnormal exponent.
for (double s = 0; s < 0x1p-2; s += 2*subULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = s == 0 ? 0 : -subULP/2; t <= +subULP/2; t += subULP/16)
{
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (uint8_t) (s/subULP);
Test(sign * (s+t), expected);
}
}
// Test all odd significands with the subnormal exponent.
for (double s = 0 + 1*subULP; s < 0x1p-2; s += 2*subULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = -subULP/2 + subULP/16; t < +subULP/2; t += subULP/16)
{
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (uint8_t) (s/subULP);
Test(sign * (s+t), expected);
}
}
// Test at and slightly under the point of rounding to infinity.
Test(+15.75, 0x70);
Test(-15.75, 0xf0);
Test(nexttoward(+15.75, 0), 0x6f);
Test(nexttoward(-15.75, 0), 0xef);
// Test infinities and NaNs.
Test(+INFINITY, 0x70);
Test(-INFINITY, 0xf0);
Test(+NAN, 0x78);
Test(-NAN, 0xf8);
Show(0);
Show(0x1p-6);
Show(0x1p-2);
Show(0x1.1p-2);
Show(0x1.2p-2);
Show(0x1.4p-2);
Show(0x1.8p-2);
Show(0x1p-1);
Show(15.5);
Show(15.75);
Show(16);
Show(NAN);
Show(1./6);
Show(1./3);
Show(2./3);
}
I hate to answer my own question... But this may still not be the optimal solution.
Although #Eric Postpischil's solution uses an established algorithm, it is not very well suited for minifloats, since there are so few denormals in 4 mantissa bits. Additionally, the overhead of multiple float arithmetic operations - and because of the actual code behind frexp in particular, it only has one branch less (or two when inlined and optimized) than my original solution and is also not that great in regards to instruction level parallelism.
So here's my current solution:
public static explicit operator quarter(float f)
{
byte f8_sign = (byte)((asuint(f) >> 31) << 7);
uint f32_exponent = (asuint(f) >> 23) & 0x00FFu;
uint f32_mantissa = asuint(f) & 0x007F_FFFFu;
if (f32_exponent < 120) // underflow => preserve +/- 0
{
return new quarter { value = f8_sign };
}
else if (f32_exponent > 130) // overflow => +/- infinity or preserve NaN
{
return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
}
else
{
int cmp = 125 - (int)f32_exponent;
int cmpIsZeroOrNegativeMask = (cmp - 1) >> 31;
int denormalExponent = andnot(0b0001_0000 >> cmp, cmpIsZeroOrNegativeMask); // special case 121: sets it to quarter.Epsilon
denormalExponent += touint8((f32_exponent == 121) & (f32_mantissa >= 0x0040_0000)); // case 121: 2^(-6) * (1 + mantissa): return +/- quarter.Epsilon = 2^(-2) * 2^(-4); if the mantissa is >= 0.5 return 2^(-2) * 2^(-3)
denormalExponent |= touint8((f32_exponent == 120) & (f32_mantissa != 0)); // case 120: 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0
int normalExponent = (cmpIsZeroOrNegativeMask & ((int)f32_exponent - (127 + EXPONENT_BIAS))) << 4;
int mantissaShift = 19 + andnot(cmp, cmpIsZeroOrNegativeMask);
return new quarter { value = (byte)((f8_sign | normalExponent) | (denormalExponent | (f32_mantissa >> mantissaShift))) };
}
}
But note that the particular andnot(int a, int b) function I use returns a & ~b and...not ~a & b.
Thanks for your help :) I'm keeping this open since, as mentioned, this may very well not be the best solution - but at least it's my own...
PS: This is probably a good example for why PREMATURE optimization is bad; Your code is much less readable. Make sure you have the functionality backed up by unit tests and make sure you even need the optimization in the first place.
...And after some time and in the spirit of transparent progression, I want to show the final version, since I believe to have found the optimal implementation; more later.
First off, here it is (the code should speak for itself, which is why it is this "much"):
unsafe struct quarter
{
const bool IEEE_754_STANDARD = true; //standard: true
const bool SIGN_BIT = IEEE_754_STANDARD || true; //standard: true
const int BITS = 8 * sizeof(byte); //standard: 8
const int EXPONENT_BITS = 3 + (SIGN_BIT ? 0 : 1); //standard: 3
const int MANTISSA_BITS = BITS - EXPONENT_BITS - (SIGN_BIT ? 1 : 0); //standard: 4
const int EXPONENT_BIAS = -(((1 << BITS) - 1) >> (BITS - (EXPONENT_BITS - 1))); //standard: -3
const int MAX_EXPONENT = EXPONENT_BIAS + ((1 << EXPONENT_BITS) - 1) - (IEEE_754_STANDARD ? 1 : 0); //standard: 3
const int SIGNALING_EXPONENT = (MAX_EXPONENT - EXPONENT_BIAS + (IEEE_754_STANDARD ? 1 : 0)) << MANTISSA_BITS; //standard: 0b0111_0000
const int F32_BITS = 8 * sizeof(float);
const int F32_EXPONENT_BITS = 8;
const int F32_MANTISSA_BITS = 23;
const int F32_EXPONENT_BIAS = -(int)(((1L << F32_BITS) - 1) >> (F32_BITS - (F32_EXPONENT_BITS - 1)));
const int F32_MAX_EXPONENT = F32_EXPONENT_BIAS + ((1 << F32_EXPONENT_BITS) - 1 - 1);
const int F32_SIGNALING_EXPONENT = (F32_MAX_EXPONENT - F32_EXPONENT_BIAS + 1) << F32_MANTISSA_BITS;
const int F32_SHL_LOSE_SIGN = (F32_BITS - (MANTISSA_BITS + EXPONENT_BITS));
const int F32_SHR_PLACE_MANTISSA = MANTISSA_BITS + ((1 + F32_EXPONENT_BITS) - (MANTISSA_BITS + EXPONENT_BITS));
const int F32_MAGIC = (((1 << F32_EXPONENT_BITS) - 1) - (1 + EXPONENT_BITS)) << F32_MANTISSA_BITS;
byte _value;
static quarter Epsilon => new quarter { _value = 1 };
static quarter MaxValue => new quarter { _value = (byte)(SIGNALING_EXPONENT - 1) };
static quarter NaN => new quarter { _value = (byte)(SIGNALING_EXPONENT | 1) };
static quarter PositiveInfinity => new quarter { _value = (byte)SIGNALING_EXPONENT };
static uint asuint(float f) => *(uint*)&f;
static float asfloat(uint u) => *(float*)&u;
static byte tobyte(bool b) => *(byte*)&b;
static float ToFloat(quarter q, bool promiseInRange)
{
uint fusedExponentMantissa = ((uint)q._value << F32_SHL_LOSE_SIGN) >> F32_SHR_PLACE_MANTISSA;
uint sign = ((uint)q._value >> (BITS - 1)) << (F32_BITS - 1);
if (!promiseInRange)
{
bool nanInf = (q._value & SIGNALING_EXPONENT) == SIGNALING_EXPONENT;
uint ifNanInf = asuint(float.PositiveInfinity) & (uint)(-tobyte(nanInf));
return (nanInf ? 1f : asfloat(F32_MAGIC)) * asfloat(sign | fusedExponentMantissa | ifNanInf);
}
else
{
return asfloat(F32_MAGIC) * asfloat(sign | fusedExponentMantissa);
}
}
static quarter ToQuarter(float f, bool promiseInRange)
{
float inRange = f * (1f / asfloat(F32_MAGIC));
uint q = asuint(inRange) >> (F32_MANTISSA_BITS - (1 + EXPONENT_BITS));
uint f8_sign = asuint(f) >> (F32_BITS - 1);
if (!promiseInRange)
{
uint f32_exponent = asuint(f) & F32_SIGNALING_EXPONENT;
bool overflow = f32_exponent > (uint)(-F32_EXPONENT_BIAS + MAX_EXPONENT << F32_MANTISSA_BITS);
bool notNaNInf = f32_exponent != F32_SIGNALING_EXPONENT;
f8_sign ^= tobyte(!notNaNInf);
if (overflow & notNaNInf)
{
q = PositiveInfinity._value;
}
}
f8_sign <<= (BITS - 1);
return new quarter{ _value = (byte)(q ^ f8_sign) };
}
}
Turns out that in fact, the reverse operation of converting the mini-float to a 32 bit float by multiplying with a magic constant is also the reverse operation of a multiplication (wow...): a floating point division by that constant.
Luckily "by that constant" and not the other way around; we can calculate the reciprocal at compile time and multiply by it instead. This only fails, as with the reverse operation, when converting to- and from 'INF' and 'NaN'. Absolute overflow with any biased 32 exponent with exponent % (MAX_EXPONENT + 1) != 0 is not translated into 'INF' and positive 'INF' is translated into negative 'INF'.
Although this enables some optimizations through the bool paramater, this mostly just reduces code size and more importantly (especially for SIMD versions, where small data types really shine) reduces the need for constants. Speaking of SIMD: This scalar version can be optimized a little by using SSE/SSE2 intrinsics.
The (disabled) optimizations (would) run completely in parallel to the floating point multiplication followed by a shift, taking a total of 5 to 6+ clock cycles (very CPU dependant), which is astonishingly close to native hardware instructions (~4 to 5 clock cycles).

Convert floating point byte array to decimal

I receive a 4-byte long array over BLE and I am interested in converting the last two bytes to a floating point decimal representation(float, double, etc) in C#. The original value has the following format:
1 sign bit
4 integer bits
11 fractional bits
My first attempt was using BitConverter, but I am confused by the procedure.
Example: I receive a byte array values where values[2] = 143 and values[3] = 231. Those 2 bytes combined represent a value in the format specified above. I am not sure, but I think that it would be like this:
SIGN INT FRACTION
0 0000 00000000000
Furthermore, since the value comes in two bytes, I tried to use BitConverter.ToString to get the hex representation and then concatenate the bytes. It is at this point where I am not sure how to continue.
Thank you for your help!
Your question lacks information, however the format seems to be clear, e.g. for given two bytes as
byte[] arr = new byte[] { 123, 45 };
we have
01111011 00101101
^^ ^^ .. ^
|| || |
|| |11 fractional bits = 01100101101 (813)
||..|
|4 integer bits = 1111 (15)
|
1 sign bit = 0 (positive)
let's get all the parts:
bool isNegative = (arr[0] & 0b10000000) != 0;
int intPart = (arr[0] & 0b01111000) >> 3;
int fracPart = ((arr[0] & 0b00000111) << 8) | arr[1];
Now we should combine all these parts into a number; here is ambiguity: there are many possible different ways to compute fractional part. One of them is
result = [sign] * (intPart + fracPart / 2**Number_Of_Frac_Bits)
In our case { 123, 45 } we have
sign = 0 (positive)
intPart = 1111b = 15
fracPart = 01100101101b / (2**11) = 813 / 2048 = 0.39697265625
result = 15.39697265625
Implementation:
// double result = (isNegative ? -1 : 1) * (intPart + fracPart / 2048.0);
Single result = (isNegative ? -1 : 1) * (intPart + fracPart / 2048f);
Console.Write(result);
Outcome:
15.39697
Edit: In case you actually don't have sign bit, but use 2's complement for the integer part (which is unusual - double, float use sign bit; 2's complement is for integer types like Int32, Int16 etc.) I expect just an expectation, 2's complement is unusual in the context) something like
int intPart = (arr[0]) >> 3;
int fracPart = ((arr[0] & 0b00000111) << 8) | arr[1];
if (intPart >= 16)
intPart = -((intPart ^ 0b11111) + 1);
Single result = (intPart + fracPart / 2048f);
Another possibility that you in fact use integer value (Int16) with fixed floating point; in this case the conversion is easy:
We obtain Int16 as usual
Then put fixed floating point by dividing 2048.0:
Code:
// -14.01221 for { 143, 231 }
Single result = unchecked((Int16) ((arr[0] << 8) | arr[1])) / 2048f;
Or
Single result = BitConverter.ToInt16(BitConverter.IsLittleEndian
? arr.Reverse().ToArray()
: arr, 0) / 2048f;

Combine 2 numbers in a byte

I have two numbers (going from 0-9) and I want to combine them into 1 byte.
Number 1 would take bit 0-3 and Number 2 has bit 4-7.
Example : I have number 3 and 4.
3 = 0011 and 4 is 0100.
Result should be 0011 0100.
How can I make a byte with these binary values?
This is what I currently have :
public Byte CombinePinDigit(int DigitA, int DigitB)
{
BitArray Digit1 = new BitArray(Convert.ToByte(DigitA));
BitArray Digit2 = new BitArray(Convert.ToByte(DigitB));
BitArray Combined = new BitArray(8);
Combined[0] = Digit1[0];
Combined[1] = Digit1[1];
Combined[2] = Digit1[2];
Combined[3] = Digit1[3];
Combined[4] = Digit2[0];
Combined[5] = Digit2[1];
Combined[6] = Digit2[2];
Combined[7] = Digit2[3];
}
With this code I have ArgumentOutOfBoundsExceptions
Forget all that bitarray stuff.
Just do this:
byte result = (byte)(number1 | (number2 << 4));
And to get them back:
int number1 = result & 0xF;
int number2 = (result >> 4) & 0xF;
This works by using the << and >> bit-shift operators.
When creating the byte, we are shifting number2 left by 4 bits (which fills the lowest 4 bits of the results with 0) and then we use | to or those bits with the unshifted bits of number1.
When restoring the original numbers, we reverse the process. We shift the byte right by 4 bits which puts the original number2 back into its original position and then we use & 0xF to mask off any other bits.
This bits for number1 will already be in the right position (since we never shifted them) so we just need to mask off the other bits, again with & 0xF.
You should verify that the numbers are in the range 0..9 before doing that, or (if you don't care if they're out of range) you can constrain them to 0..15 by anding with 0xF:
byte result = (byte)((number1 & 0xF) | ((number2 & 0xF) << 4));
this should basically work:
byte Pack(int a, int b)
{
return (byte)(a << 4 | b & 0xF);
}
void Unpack(byte val, out int a, out int b)
{
a = val >> 4;
b = val & 0xF;
}

C# Language: Changing the First Four Bits in a Byte

In order to utilize a byte to its fullest potential, I'm attempting to store two unique values into a byte: one in the first four bits and another in the second four bits. However, I've found that, while this practice allows for optimized memory allocation, it makes changing the individual values stored in the byte difficult.
In my code, I want to change the first set of four bits in a byte while maintaining the value of the second four bits in the same byte. While bitwise operations allow me to easily retrieve and manipulate the first four bit values, I'm finding it difficult to concatenate this new value with the second set of four bits in a byte. The question is, how can I erase the first four bits from a byte (or, more accurately, set them all the zero) and add the new set of 4 bits to replace the four bits that were just erased, thus preserving the last 4 bits in a byte while changing the first four?
Here's an example:
// Changes the first four bits in a byte to the parameter value
public void changeFirstFourBits(byte newFirstFour)
{
// If 'newFirstFour' is 0101 in binary, make 'value' 01011111 in binary, changing
// the first four bits but leaving the second four alone.
}
private byte value = 255; // binary: 11111111
Use bitwise AND (&) to clear out the old bits, shift the new bits to the correct position and bitwise OR (|) them together:
value = (value & 0xF) | (newFirstFour << 4);
Here's what happens:
value : abcdefgh
newFirstFour : 0000xyzw
0xF : 00001111
value & 0xF : 0000efgh
newFirstFour << 4 : xyzw0000
(value & 0xF) | (newFirstFour << 4) : xyzwefgh
When I have to do bit-twiddling like this, I make a readonly struct to do it for me. A four-bit integer is called nybble, of course:
struct TwoNybbles
{
private readonly byte b;
public byte High { get { return (byte)(b >> 4); } }
public byte Low { get { return (byte)(b & 0x0F); } {
public TwoNybbles(byte high, byte low)
{
this.b = (byte)((high << 4) | (low & 0x0F));
}
And then add implicit conversions between TwoNybbles and byte. Now you can just treat any byte as having a High and Low byte without putting all that ugly bit twiddling in your mainline code.
You first mask out you the high four bytes using value & 0xF. Then you shift the new bits to the high four bits using newFirstFour << 4 and finally you combine them together using binary or.
public void changeHighFourBits(byte newHighFour)
{
value=(byte)( (value & 0x0F) | (newFirstFour << 4));
}
public void changeLowFourBits(byte newLowFour)
{
value=(byte)( (value & 0xF0) | newLowFour);
}
I'm not really sure what your method there is supposed to do, but here are some methods for you:
void setHigh(ref byte b, byte val) {
b = (b & 0xf) | (val << 4);
}
byte high(byte b) {
return (b & 0xf0) >> 4;
}
void setLow(ref byte b, byte val) {
b = (b & 0xf0) | val;
}
byte low(byte b) {
return b & 0xf;
}
Should be self-explanatory.
public int SplatBit(int Reg, int Val, int ValLen, int Pos)
{
int mask = ((1 << ValLen) - 1) << Pos;
int newv = Val << Pos;
int res = (Reg & ~mask) | newv;
return res;
}
Example:
Reg = 135
Val = 9 (ValLen = 4, because 9 = 1001)
Pos = 2
135 = 10000111
9 = 1001
9 << Pos = 100100
Result = 10100111
A quick look would indicate that a bitwise and can be achieved using the & operator. So to remove the first four bytes you should be able to do:
byte value1=255; //11111111
byte value2=15; //00001111
return value1&value2;
Assuming newVal contains the value you want to store in origVal.
Do this for the 4 least significant bits:
byte origVal = ???;
byte newVal = ???
orig = (origVal & 0xF0) + newVal;
and this for the 4 most significant bits:
byte origVal = ???;
byte newVal = ???
orig = (origVal & 0xF) + (newVal << 4);
I know you asked specifically about clearing out the first four bits, which has been answered several times, but I wanted to point out that if you have two values <= decimal 15, you can combine them into 8 bits simply with this:
public int setBits(int upperFour, int lowerFour)
{
return upperFour << 4 | lowerFour;
}
The result will be xxxxyyyy where
xxxx = upperFour
yyyy = lowerFour
And that is what you seem to be trying to do.
Here's some code, but I think the earlier answers will do it for you. This is just to show some sort of test code to copy and past into a simple console project (the WriteBits method by be of help):
static void Main(string[] args)
{
int b1 = 255;
WriteBits(b1);
int b2 = b1 >> 4;
WriteBits(b2);
int b3 = b1 & ~0xF ;
WriteBits(b3);
// Store 5 in first nibble
int b4 = 5 << 4;
WriteBits(b4);
// Store 8 in second nibble
int b5 = 8;
WriteBits(b5);
// Store 5 and 8 in first and second nibbles
int b6 = 0;
b6 |= (5 << 4) + 8;
WriteBits(b6);
// Store 2 and 4
int b7 = 0;
b7 = StoreFirstNibble(2, b7);
b7 = StoreSecondNibble(4, b7);
WriteBits(b7);
// Read First Nibble
int first = ReadFirstNibble(b7);
WriteBits(first);
// Read Second Nibble
int second = ReadSecondNibble(b7);
WriteBits(second);
}
static int ReadFirstNibble(int storage)
{
return storage >> 4;
}
static int ReadSecondNibble(int storage)
{
return storage &= 0xF;
}
static int StoreFirstNibble(int val, int storage)
{
return storage |= (val << 4);
}
static int StoreSecondNibble(int val, int storage)
{
return storage |= val;
}
static void WriteBits(int b)
{
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(b),0));
}
}

Categories