How do I print the exact value stored in a float? - c#

If I assign the value 0.1 to a float:
float f = 0.1f;
The actual value stored in memory is not an exact representation of 0.1, because 0.1 is not a number that can be exactly represented in single-precision floating-point format. The actual value stored - if I did my maths correctly - is
0.100000001490116119384765625
But I can't identify a way to get C# to print out that value. Even if I ask it to print the number to a great many decimal places, it doesn't give the correct answer:
// prints 0.10000000000000000000000000000000000000000000000000
Console.WriteLine(f.ToString("F50"));
How can I print the exact value stored in a float; the value actually represented by the bit-pattern in memory?
EDIT: It has been brought to my attention elsewhere that you can get the behaviour I ask for using standard format strings... on .NET Core and .NET 5.0. So this question is .NET Framework specific, I guess.

Oops, this answer relates to C, not C#.
Leaving it up as it may provide C# insight as the languages have similarities concerning this.
How do I print the exact value stored in a float?
// Print exact value with a hexadecimal significand.
printf("%a\n", some_float);
// e.g. 0x1.99999ap-4 for 0.1f
To print the value of a float in decimal with enough significant digits to distinguish it from every other float:
int digits_after_the_decimal_point = FLT_DECIMAL_DIG - 1; // e.g. 9 -1
printf("%.*e\n", digits_after_the_decimal_point, some_float);
// e.g. 1.00000001e-01 for 0.1f
To print the value in decimal with all of its decimal places is hard, and rarely needed. Code could request a greater precision, but past a certain point (e.g. 20 significant digits) printf() may lose correctness in the lower digits. This incorrectness is allowed in C and IEEE 754:
int big_value = 19; // More may be a problem.
printf("%.*e\n", big_value, some_float);
// e.g. 1.0000000149011611938e-01 for 0.1f
// for FLT_TRUE_MIN and big_value = 50, not quite right
// e.g. 1.40129846432481707092372958328991613128026200000000e-45
To print the value in decimal with all decimal places for every float, write a helper function. Example.
// Using custom code
// -FLT_TRUE_MIN
-0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125

For .NET Framework, use the G format string. The result is not exact, but it shows enough digits to reveal the floating-point error.
> (0.3d).ToString("G70")
0.29999999999999999
> (0.1d+0.2d).ToString("G70")
0.30000000000000004
Downvoted... Fine. I found dmath, a math library that can do this.
> new Deveel.Math.BigDecimal(0.3d).ToString()
0.299999999999999988897769753748434595763683319091796875

The basic idea here is to convert the float value into a rational value, and then convert the rational into a decimal.
The following code (for .NET 6, which provides the BitConverter.SingleToUInt32Bits method) will print the exact value of a float (including whether a NaN value is quiet/signalling, the payload of the NaN and whether the sign bit is set). Note that the WriteRational method is not generally applicable to all rationals, as it makes no attempt to detect non-terminating decimal representations. This is not an issue here, since all values in a float have power-of-two denominators.
using System; // not necessary with implicit usings
using System.Globalization;
using System.Numerics;
using System.Text;
static string ExactStringSingle(float value)
{
const int valueBits = sizeof(float) * 8;
const int fractionBits = 23; // excludes implicit leading 1 in normal values
const int exponentBits = valueBits - fractionBits - 1;
const uint signMask = 1U << (valueBits - 1);
const uint fractionMask = (1U << fractionBits) - 1;
var bits = BitConverter.SingleToUInt32Bits(value);
var result = new StringBuilder();
if ((bits & signMask) != 0) { result.Append('-'); }
var biasedExponent = (int)((bits & ~signMask) >> fractionBits);
var fraction = bits & fractionMask;
// Maximum possible value of the biased exponent: infinities and NaNs
const int maxExponent = (1 << exponentBits) - 1;
if (biasedExponent == maxExponent)
{
if (fraction == 0)
{
result.Append("inf");
}
else
{
// NaN type is stored in the most significant bit of the fraction
const uint nanTypeMask = 1U << (fractionBits - 1);
// NaN payload
const uint nanPayloadMask = nanTypeMask - 1;
// NaN type, valid for x86, x86-64, 68000, ARM, SPARC
var isQuiet = (fraction & nanTypeMask) != 0;
var nanPayload = fraction & nanPayloadMask;
result.Append(isQuiet
? FormattableString.Invariant($"qNaN(0x{nanPayload:x})")
: FormattableString.Invariant($"sNaN(0x{nanPayload:x})"));
}
return result.ToString();
}
// Minimum value of biased exponent above which no fractional part will exist
const int noFractionThreshold = (1 << (exponentBits - 1)) + fractionBits - 1;
if (biasedExponent == 0)
{
// zeroes and subnormal numbers
// shift for the denominator of the rational part of a subnormal number
const int denormalDenominatorShift = noFractionThreshold - 1;
WriteRational(fraction, BigInteger.One << denormalDenominatorShift, result);
return result.ToString();
}
// implicit leading one in the fraction part
const uint implicitLeadingOne = 1U << fractionBits;
var numerator = (BigInteger)(fraction | implicitLeadingOne);
if (biasedExponent >= noFractionThreshold)
{
numerator <<= biasedExponent - noFractionThreshold;
result.Append(numerator.ToString(CultureInfo.InvariantCulture));
}
else
{
var denominator = BigInteger.One << (noFractionThreshold - (int)biasedExponent);
WriteRational(numerator, denominator, result);
}
return result.ToString();
}
static void WriteRational(BigInteger numerator, BigInteger denominator, StringBuilder result)
{
// precondition: denominator contains only factors of 2 and 5
var intPart = BigInteger.DivRem(numerator, denominator, out numerator);
result.Append(intPart.ToString(CultureInfo.InvariantCulture));
if (numerator.IsZero) { return; }
result.Append('.');
do
{
numerator *= 10;
var gcd = BigInteger.GreatestCommonDivisor(numerator, denominator);
denominator /= gcd;
intPart = BigInteger.DivRem(numerator / gcd, denominator, out numerator);
result.Append(intPart.ToString(CultureInfo.InvariantCulture));
} while (!numerator.IsZero);
}
I've written most of the constants in the code in terms of valueBits and fractionBits (defined in the first lines of the method), in order to make it as straightforward as possible to adapt this method for doubles. To do this:
Change valueBits to sizeof(double) * 8
Change fractionBits to 52
Change all uints to ulongs (including converting 1U to 1UL)
Call BitConverter.DoubleToUInt64Bits instead of BitConverter.SingleToUInt32Bits
Making this code culture-aware is left as an exercise for the reader :-)
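The same float-to-rational-to-decimal idea can also be sketched without a digit-by-digit loop, using the identity m * 2^(-k) = m * 5^k / 10^k. The following is a minimal sketch, not the method above: the names ExactFloat and ToExactString are hypothetical, it handles finite values only, and it reads the bits with BitConverter.GetBytes and BitConverter.ToInt32 so that it should also compile on .NET Framework.

```csharp
using System;
using System.Globalization;
using System.Numerics;

static class ExactFloat
{
    // Hypothetical helper: exact decimal expansion of a finite float.
    public static string ToExactString(float value)
    {
        if (float.IsNaN(value) || float.IsInfinity(value))
            throw new ArgumentException("Finite values only.", nameof(value));

        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        bool negative = bits < 0;
        int biasedExponent = (bits >> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;

        // Subnormals have no implicit leading 1 and use the minimum exponent.
        BigInteger mantissa = biasedExponent == 0 ? fraction : fraction | 0x800000;
        int exponent = (biasedExponent == 0 ? 1 : biasedExponent) - 127 - 23;

        string digits;
        if (exponent >= 0)
        {
            // The value is an integer: mantissa * 2^exponent.
            digits = (mantissa << exponent).ToString(CultureInfo.InvariantCulture);
        }
        else
        {
            // mantissa / 2^k == (mantissa * 5^k) / 10^k, so the decimal point
            // sits k digits from the right of mantissa * 5^k.
            int k = -exponent;
            string scaled = (mantissa * BigInteger.Pow(5, k))
                .ToString(CultureInfo.InvariantCulture)
                .PadLeft(k + 1, '0');
            string intPart = scaled.Substring(0, scaled.Length - k);
            string fracPart = scaled.Substring(scaled.Length - k).TrimEnd('0');
            digits = fracPart.Length == 0 ? intPart : intPart + "." + fracPart;
        }
        return negative ? "-" + digits : digits;
    }
}
```

For example, ExactFloat.ToExactString(0.1f) should give the value quoted in the question, 0.100000001490116119384765625.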

Related

Dividing two numbers always returns 0 [duplicate]

How come dividing two 32 bit int numbers as ( int / int ) returns to me 0, but if I use Decimal.Divide() I get the correct answer? I'm by no means a c# guy.
int is an integer type; dividing two ints performs an integer division, i.e. the fractional part is truncated since it can't be stored in the result type (also int!). Decimal, by contrast, has got a fractional part. By invoking Decimal.Divide, your int arguments get implicitly converted to Decimals.
You can enforce non-integer division on int arguments by explicitly casting at least one of the arguments to a floating-point type, e.g.:
int a = 42;
int b = 23;
double result = (double)a / b;
In the first case, you're doing integer division, so the result is truncated (the decimal part is chopped off) and an integer is returned.
In the second case, the ints are converted to decimals first, and the result is a decimal. Hence they are not truncated and you get the correct result.
The following line:
int a = 1, b = 2;
object result = a / b;
...will be performed using integer arithmetic. Decimal.Divide, on the other hand, takes two parameters of the type Decimal, so the division will be performed on decimal values rather than integer values. That is equivalent to this:
int a = 1, b = 2;
object result = (Decimal)a / (Decimal)b;
To examine this, you can add the following code lines after each of the above examples:
Console.WriteLine(result.ToString());
Console.WriteLine(result.GetType().ToString());
The output in the first case will be
0
System.Int32
..and in the second case:
0,5
System.Decimal
I reckon Decimal.Divide(decimal, decimal) implicitly converts its two int arguments to decimals before returning a (precise) decimal value, whereas 4/5 is treated as integer division and returns 0.
You want to cast the numbers:
double c = (double)a/(double)b;
Note: If any of the arguments in C# is a double, a double divide is used which results in a double. So, the following would work too:
double c = (double)a/b;
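The three behaviours discussed in these answers can be put side by side. This is only an illustrative sketch; the class and method names (DivisionDemo, IntQuotient, DoubleQuotient, DecimalQuotient) are made up for the example.

```csharp
using System;

static class DivisionDemo
{
    // int / int: integer division, the fractional part is truncated.
    public static int IntQuotient(int a, int b) => a / b;

    // Casting one operand to double promotes the whole division to double.
    public static double DoubleQuotient(int a, int b) => (double)a / b;

    // The int arguments are implicitly converted to decimal before dividing.
    public static decimal DecimalQuotient(int a, int b) => decimal.Divide(a, b);
}
```

With a = 1 and b = 2, the first method returns 0 while the other two return 0.5.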
Here is a small program:
static void Main(string[] args)
{
int a=0, b = 0, c = 0;
int n = Convert.ToInt16(Console.ReadLine());
string[] arr_temp = Console.ReadLine().Split(' ');
int[] arr = Array.ConvertAll(arr_temp, Int32.Parse);
foreach (int i in arr)
{
if (i > 0) a++;
else if (i < 0) b++;
else c++;
}
Console.WriteLine("{0}", (double)a / n);
Console.WriteLine("{0}", (double)b / n);
Console.WriteLine("{0}", (double)c / n);
Console.ReadKey();
}
In my case, nothing above worked.
What I want to do is divide 278 by 575 and multiply by 100 to get a percentage.
double p = (double)((PeopleCount * 1.0 / AllPeopleCount * 1.0) * 100.0);
%: 48,3478260869565 --> 278 / 575 ---> 0
%: 51,6521739130435 --> 297 / 575 ---> 0
Multiplying PeopleCount by 1.0 converts it to a double, so the division yields 48.34...
Also multiply by 100.0, not 100.
If you are looking for an answer where 0 < a < 1, int / int will not suffice. int / int does integer division. Try casting one of the ints to a double inside the operation.
The accepted answer is very nearly there, but I think it is worth adding that there is a difference between using double and decimal.
I would not do a better job explaining the concepts than Wikipedia, so I will just provide the pointers:
floating-point arithmetic
decimal data type
In financial systems, it is often a requirement that we can guarantee a certain number of (base-10) decimal places accuracy. This is generally impossible if the input/source data is in base-10 but we perform the arithmetic in base-2 (because the number of decimal places required for the decimal expansion of a number depends on the base; one third takes infinitely many decimal places to express in base-10 as 0.333333..., but it takes only one decimal in base-3: 0.1).
Floating-point numbers are faster to work with (in terms of CPU time; programming-wise they are equally simple) and preferred whenever you want to minimize rounding error (as in scientific applications).
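A small sketch of the base-2 versus base-10 point, using the classic 0.1 + 0.2 case. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;

static class BaseTenDemo
{
    // 0.1, 0.2 and 0.3 have no finite base-2 expansion, so each double is
    // only the nearest representable value and the sum misses 0.3.
    public static bool DoubleSumMatches()
    {
        double a = 0.1, b = 0.2;
        return a + b == 0.3;
    }

    // decimal stores base-10 digits, so the same sum is exact.
    public static bool DecimalSumMatches()
    {
        decimal a = 0.1m, b = 0.2m;
        return a + b == 0.3m;
    }
}
```

DoubleSumMatches returns false and DecimalSumMatches returns true, which is why decimal is the usual choice when a fixed number of base-10 places must be guaranteed.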

How can I quickly and accurately multiply a 64-bit integer by a 64-bit fraction in C#?

There are a lot of similar questions asked on SO, but I've yet to find one that works and is easily portable to C#. Most involve C++ or similar, and the (presumably) working answers rely on either embedded assembly or native C/C++ functions that don't exist in C#. Several functions work for part of the range, but fail at other parts. I found one working answer I was able to port to C#, but it was very slow (turns out it's decently-fast when I compile to x64 instead of x86, so I posted it as the answer to beat).
Problem
I need a function that allows me to multiply any 64-bit integer by a fraction 0 to 1 (or -1 to 1) that is derived from two 64-bit integers. Ideally, the answer would work for both Int64 and UInt64, but it's probably not hard to make one work from the other.
In my case, I have a random 64-bit Int64/UInt64 (using the xoshiro256p algorithm, though that's likely irrelevant). I want to scale that number to any arbitrary range in the type's allowed values. For example, I might want to scale Int64 to the range [1000, 35000]. This is, conceptually, easy enough:
UInt64 minVal = 1000;
UInt64 maxVal = 35000;
UInt64 maxInt = UInt64.MaxValue;
UInt64 randInt = NextUInt64(); // Random value between 0 and maxInt.
UInt64 diff = maxVal - minVal + 1;
UInt64 scaledInt = randInt * diff / maxInt; // This line can overflow.
return scaledInt + minVal;
As noted by many other people, and the comment above, the problem is that randInt * diff can potentially overflow.
On paper, I could simply store that intermediate result in a 128-bit integer, then store the result of the division in the 64-bit output. But 128-bit math isn't native to 64-bit systems, and I'd rather avoid arbitrary-precision libraries since I'll be making lots of calls to this function and efficiency will be notable.
I could multiply by a double to get 53 bits of precision, which is fine for what I'm currently doing, but I'd rather come up with a proper solution.
I could create a C++ library with one of the ASM solutions and call that library, but I'd like something that's pure C#.
Requirements
Needs to be pure C#.
Needs to work for any set of inputs such that randInt * diff / maxInt is in the range [0, maxInt] (and each value itself is in the same range).
Shouldn't require an external library.
Needs to be +-1 from the mathematically-correct answer.
Needs to be reasonably quick. Maybe I'm just asking for miracles, but I feel like if doubles can do 5-10 ms, we should be able to hit 20 ms with purpose-built code that gets another 11 bits of precision.
Ideally works relatively well in both release and debug modes. My code has about a 3:1 ratio, so I'd think we could get debug under 5-ish times the release time.
My Testing
I've tested the following solutions for relative performance. Each test ran 1 million iterations of my random number generator, scaling using various methods. I started by generating random numbers and putting them in lists (one for signed, one for unsigned). Then I ran through each list and scaled it into a second list.
I initially had a bunch of tests in debug mode. It mostly didn't matter (we're testing relative performance), but the Int128/UInt128 libraries fared much better in release mode.
Numbers in parentheses are the debug time. I include them here because I still want decent performance while debugging. The Int128 library, for example, is great for release mode, but terrible for debug. It might be useful to use something that has a better balance until you're ready for final release. Because I'm testing a million samples, the time in milliseconds is also the time in nanoseconds per operation (all million UInt64s get generated in 33 ms, so each one is generated in 33 ns).
Source code for my testing can be found here, on GitHub.
86 ms (267): the Int64 random generator.
33 ms (80): the UInt64 random generator.
4 ms (5): using double conversion to Int64, with reduced precision.
8 ms (10): again for UInt64.
76 ms (197): this C Code for Int64, converted to C# (exact code in my answer below).
72 ms (187): again for UInt64.
54 ms (1458): this UInt128 library, for Int64.
40 ms (1476): again for UInt64.
1446 ms (1455): double128 library for Int64. Requires a paid license for commercial use.
1374 ms (1397): again for UInt64.
I couldn't get these to give proper results.
this MulDiv64 library, linked to the main application with DllImport.
QPFloat, compiled to x64, created a MulDiv64 function in the C++ code.
this Java code.
the MFllMulDiv function from the Microsoft Media Foundation library. I tried to test it, but couldn't figure out how to get VS to link into my C++ project properly.
Similar Questions
Most accurate way to do a combined multiply-and-divide operation in 64-bit?
Answers by phuclv, Soonts, Mysticial, and 500 - Internal Server Error involve external libraries, assembly, or MSVC-specific functions.
Answers by timos, AnT, Alexey Frunze, and Michael Burr don't actually answer anything.
Answers by Serge Rogatch and Pubby aren't precise.
Answer by AProgrammer works, but is very slow (and I have no idea how it works) -- I ended up using it anyways and getting decent results in x64 compilation.
How can I descale x by n/d, when x*n overflows?
The only answer, by Abhay Aravinda, isn't real code, I wasn't sure how to implement the last section, and the comments suggest it can overflow for large values anyways.
Fast method to multiply integer by proper fraction without floats or overflow
Answers by Taron and chux - Reinstate Monica are approximations or MSVC-specific.
Answer by R.. GitHub STOP HELPING ICE just uses 64-bit math since that question is about multiplying Int32.
(a * b) / c MulDiv and dealing with overflow from intermediate multiplication
Answer by Jeff Penfold didn't work for me (I think I'm missing something in the logical operators converting from Java to C#), and it was very slow.
Answer by greybeard looks nice, but I wasn't sure how to translate it to C#.
Answers by tohoho and Dave overflow.
Answer by David Eisenstat requires BigInt libraries.
How to multiply a 64 bit integer by a fraction in C++ while minimizing error?
All the answers overflow in different circumstances.
But 128-bit math isn't native to 64-bit systems
While that is mostly true, there is a decent way to get the full 128-bit product of two 64-bit integers: Math.BigMul (for .NET 5 and later)
x64 has a corresponding division with a 128-bit input, and such a pair of full-multiply followed by a wide-division would implement this "scale integer by a proper fraction" operation (with the limitation that the fraction must not be greater than 1, otherwise an overflow could result). However, C# doesn't have access to wide division, and even if it did, it wouldn't be very efficient on most hardware.
But you can just use BigMul directly too, because the divisor should really be 2^64 to begin with (not 2^64 - 1), and BigMul automatically divides by 2^64.
So the code becomes: (not tested)
ulong ignore;
ulong scaled = Math.BigMul(randInt, diff, out ignore);
return scaled + minVal;
For older versions of .NET, getting the high 64 bits of the product could be done like this:
static ulong High64BitsOfProduct(ulong a, ulong b)
{
// decompose into 32bit blocks (in ulong to avoid casts later)
ulong al = (uint)a;
ulong ah = a >> 32;
ulong bl = (uint)b;
ulong bh = b >> 32;
// low times low and high times high
ulong l = al * bl;
ulong h = ah * bh;
// cross terms
ulong x1 = al * bh;
ulong x2 = ah * bl;
// carry from low half of product into high half
ulong carry = ((l >> 32) + (uint)x1 + (uint)x2) >> 32;
// add up all the parts
return h + (x1 >> 32) + (x2 >> 32) + carry;
}
Unfortunately that's not as good as Math.BigMul, but at least there is still no division.
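Putting the two pieces together, a range-scaling function for older .NET versions might look like the sketch below. It is untested against the question's benchmark; the names RangeScaling, MulHigh and Scale are hypothetical, and treating a full 64-bit range (where the difference wraps to 0) as "return the random value unchanged" is an assumption of this sketch.

```csharp
using System;

static class RangeScaling
{
    // High 64 bits of the 128-bit product a * b, built from 32-bit blocks.
    public static ulong MulHigh(ulong a, ulong b)
    {
        ulong al = (uint)a, ah = a >> 32;
        ulong bl = (uint)b, bh = b >> 32;
        ulong l = al * bl;
        ulong h = ah * bh;
        ulong x1 = al * bh;
        ulong x2 = ah * bl;
        // Carry out of the low 64 bits of the product.
        ulong carry = ((l >> 32) + (uint)x1 + (uint)x2) >> 32;
        return h + (x1 >> 32) + (x2 >> 32) + carry;
    }

    // Maps a uniformly random 64-bit value into [min, max] (inclusive)
    // by computing min + floor(rand * diff / 2^64).
    public static ulong Scale(ulong rand, ulong min, ulong max)
    {
        if (min > max) throw new ArgumentException("min must not exceed max.");
        ulong diff = unchecked(max - min + 1); // wraps to 0 for the full range
        if (diff == 0) return rand;
        return min + MulHigh(rand, diff);
    }
}
```

Because rand is strictly less than 2^64, the scaled offset is always strictly less than diff, so the result cannot exceed max.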
I was able to get down to about 250 ms using AProgrammer's C code by telling the compiler to NOT prefer 32-bit code using the AnyCpu setup.
In release mode, the PRNG takes up about 5 ms (I somewhat doubt this; I think it's being optimized out when I try to just run the PRNG), and the total is down to about 77ms.
I'm still not sure how it works, but the linked answer says the code has some redundant operations for base 10 support. I'm thinking I can reduce the time even further by optimizing out the base 10 support, if I knew how it worked enough to do that.
The Int64 (signed) is a little slower (78 vs 77ms release, about 20ms slower debug), but I'm basically the same speed. It does fail if min=Int64.MinValue and max=Int64.MaxValue, returning min every time, but works for every other combination I could throw at it.
The signed math is less useful for straight scaling. I just made something that worked in my use case. So I made a conversion that seems to work for the general signed case, but it could probably be optimized a bit.
Unsigned scaling algorithm, converted to C#.
/// <summary>
/// Returns an accurate, 64-bit result from value * multiplier / divisor without overflow.
/// From https://stackoverflow.com/a/8757419/5313933
/// </summary>
/// <param name="value">The starting value.</param>
/// <param name="multiplier">The number to multiply by.</param>
/// <param name="divisor">The number to divide by.</param>
/// <returns>The result of value * multiplier / divisor.</returns>
private UInt64 MulDiv64U(UInt64 value, UInt64 multiplier, UInt64 divisor)
{
UInt64 baseVal = 1UL << 32;
UInt64 maxdiv = (baseVal - 1) * baseVal + (baseVal - 1);
// First get the easy thing
UInt64 res = (value / divisor) * multiplier + (value % divisor) * (multiplier / divisor);
value %= divisor;
multiplier %= divisor;
// Are we done?
if (value == 0 || multiplier == 0)
return res;
// Is it easy to compute what remain to be added?
if (divisor < baseVal)
return res + (value * multiplier / divisor);
// Now 0 < a < c, 0 < b < c, c >= 1ULL
// Normalize
UInt64 norm = maxdiv / divisor;
divisor *= norm;
value *= norm;
// split into 2 digits
UInt64 ah = value / baseVal, al = value % baseVal;
UInt64 bh = multiplier / baseVal, bl = multiplier % baseVal;
UInt64 ch = divisor / baseVal, cl = divisor % baseVal;
// compute the product
UInt64 p0 = al * bl;
UInt64 p1 = p0 / baseVal + al * bh;
p0 %= baseVal;
UInt64 p2 = p1 / baseVal + ah * bh;
p1 = (p1 % baseVal) + ah * bl;
p2 += p1 / baseVal;
p1 %= baseVal;
// p2 holds 2 digits, p1 and p0 one
// first digit is easy, not null only in case of overflow
UInt64 q2 = p2 / divisor;
p2 = p2 % divisor;
// second digit, estimate
UInt64 q1 = p2 / ch;
// and now adjust
UInt64 rhat = p2 % ch;
// the loop can be unrolled, it will be executed at most twice for
// even baseVals -- three times for odd one -- due to the normalisation above
while (q1 >= baseVal || (rhat < baseVal && q1 * cl > rhat * baseVal + p1))
{
q1--;
rhat += ch;
}
// subtract
p1 = ((p2 % baseVal) * baseVal + p1) - q1 * cl;
p2 = (p2 / baseVal * baseVal + p1 / baseVal) - q1 * ch;
p1 = p1 % baseVal + (p2 % baseVal) * baseVal;
// now p1 hold 2 digits, p0 one and p2 is to be ignored
UInt64 q0 = p1 / ch;
rhat = p1 % ch;
while (q0 >= baseVal || (rhat < baseVal && q0 * cl > rhat * baseVal + p0))
{
q0--;
rhat += ch;
}
// we don't need to do the subtraction (needed only to get the remainder,
// in which case we have to divide it by norm)
return res + q0 + q1 * baseVal; // + q2 *baseVal*baseVal
}
MulDiv64 uses the unsigned version to get a signed conversion. It's slower in my use case (290ms vs 260ms debug, 95ms vs 81ms release), but works for the general case. Doesn't work for Int64.MinValue (raises an exception: "Negating the minimum value of a twos complement number is invalid.").
public static Int64 MulDiv64(Int64 value, Int64 multiplier, Int64 divisor)
{
// Get the signs then convert to positive values.
bool isPositive = true;
if (value < 0) isPositive = !isPositive;
UInt64 val = (UInt64)Math.Abs(value);
if (multiplier < 0) isPositive = !isPositive;
UInt64 mult = (UInt64)Math.Abs(multiplier);
if (divisor < 0) isPositive = !isPositive;
UInt64 div = (UInt64)Math.Abs(divisor);
// Scaledown.
UInt64 scaledVal = MulDiv64U(val, mult, div);
// Convert to signed Int64.
Int64 result = (Int64)scaledVal;
if (!isPositive) result *= -1;
// Finished.
return result;
}
GetRangeU function returns an unsigned UInt64 between min and max, inclusive. Scaling is straight from the earlier function.
/// <summary>
/// Returns a random unsigned integer between Min and Max, inclusive.
/// </summary>
/// <param name="min">The minimum value that may be returned.</param>
/// <param name="max">The maximum value that may be returned.</param>
/// <returns>The random value selected by the Fates for your application's immediate needs. Or their fickle whims.</returns>
public UInt64 GetRangeU(UInt64 min, UInt64 max)
{
// Swap inputs if they're in the wrong order.
if (min > max)
{
UInt64 Temp = min;
min = max;
max = Temp;
}
// Get a random integer.
UInt64 randInt = NextUInt64();
// Fraction randInt/MaxValue needs to be strictly less than 1.
if (randInt == UInt64.MaxValue) randInt = 0;
// Get the difference between min and max values.
UInt64 diff = max - min + 1;
// Scale randInt from the range 0, maxInt to the range 0, diff.
randInt = MulDiv64U(diff, randInt, UInt64.MaxValue);
// Add the minimum value and return the result.
return randInt + min;
}
GetRange function returns a signed Int64 between min and max. Not easily convertible to general scaling, but it's faster than the method above in this case.
/// <summary>
/// Returns a random signed integer between Min and Max, inclusive.
/// Returns min if min is Int64.MinValue and max is Int64.MaxValue.
/// </summary>
/// <param name="min">The minimum value that may be returned.</param>
/// <param name="max">The maximum value that may be returned.</param>
/// <returns>The random value selected.</returns>
public Int64 GetRange(Int64 min, Int64 max)
{
// Swap inputs if they're in the wrong order.
if (min > max)
{
Int64 Temp = min;
min = max;
max = Temp;
}
// Get a random integer.
UInt64 randInt = NextUInt64();
// Fraction randInt/MaxValue needs to be strictly less than 1.
if (randInt == UInt64.MaxValue) randInt = 0;
// Get the difference between min and max values.
UInt64 diff = (UInt64)(max - min) + 1;
// Scale randInt from the range 0, maxInt to the range 0, diff.
randInt = MulDiv64U(diff, randInt, UInt64.MaxValue);
// Convert to signed Int64.
UInt64 randRem = randInt % 2;
randInt /= 2;
Int64 result = min + (Int64)randInt + (Int64)randInt + (Int64)randRem;
// Finished.
return result;
}

C# precise float variable

I have written a program that uses float variables, and after some calculations the results are not precise enough.
My program is:
int x=0;
ts_sec = 1338526801;
ts_usec = 113676;
while(ir<2)
{
Console.Write("ts_sec:" + ts_sec+"\t");
Console.Write("ts_usec:" + ts_usec+"\t");
if (x == 0)
{
floatstart = ts_sec + ts_usec / 1000000;
Console.Write("startTime" + floatstart+"\t");
x=x+1; //x=1
}
timeStamp = ts_sec + ts_usec / 1000000 - floatstart; //
Console.Write("timestamp is" + timeStamp+"\n");
ts_sec = 1338526801;
ts_usec = 113676;
ir++;
}
This is my output:
ts_sec:1338526801 ts_usec:113676 startTime1.338527E+09
timestamp is0 ts_sec:1338526801 ts_usec:113678
timestamp is0
But I want my output to look like this, without the E notation:
ts_sec:1338526801 ts_usec:113676 startTime:1338526801.11368
timeStamp:0 ts_sec:1338526801 ts_usec:113678
timeStamp:1.9073486328125e-006
First, change float (32-bit) to double (64-bit), which roughly doubles the precision (a 53-bit significand instead of 24 bits); second, if you don't want the result in scientific notation (i.e. with "e"), choose an appropriate format:
Double startTime = ...
// Assuming that you want 5 digits after the decimal point
String result = startTime.ToString("F5");
If you want more precision use decimal.
The decimal keyword indicates a 128-bit data type. Compared to floating-point types, the decimal type has more precision and a smaller range, which makes it appropriate for financial and monetary calculations.
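To make the precision difference concrete for the timestamps in the question, here is a minimal sketch. The class and method names are hypothetical; it assumes the seconds and microseconds are available as integers.

```csharp
using System;

static class TimestampDemo
{
    // decimal keeps every base-10 digit of seconds + microseconds.
    public static decimal ToSeconds(long seconds, long microseconds)
        => seconds + microseconds / 1000000m;

    // float has a 24-bit significand, so near 1.3e9 neighbouring values
    // are 128 apart and the microseconds are lost entirely.
    public static float ToSecondsFloat(long seconds, long microseconds)
        => (float)(seconds + microseconds / 1000000f);
}
```

With the values from the question, ToSeconds(1338526801, 113676) is exactly 1338526801.113676, and two timestamps 2 microseconds apart differ by exactly 0.000002, whereas ToSecondsFloat returns 1338526848 for both.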

Formatting String to 1 Decimal Place, without rounding up/down

I want a function that is passed an integer, and if that integer is over a certain value (1000 in my current case) I want to perform some division on it so I can ultimately return an abbreviation of the original integer.
For example:
1000 = 1
1001 = 1
1099 = 1
1100 = 1.1
1199 = 1.1
1200 = 1.2
10000 = 10
10099 = 10
10100 = 10.1
It's the division and rounding side of things that has been causing me problems.
What would be the most suitable method to give me the results above?
How about:
int dividedBy100 = x / 100; // Deliberately an integer division
decimal dividedBy1000 = dividedBy100 / 10m; // Decimal division
string result = dividedBy1000.ToString();
I would advise using decimal here rather than float or double, as you fundamentally want decimal division by 10.
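Wrapped into a function, the approach above might look like this sketch. The name Abbreviate, the 1000 threshold as a hard-coded constant, and the use of the invariant culture are assumptions made for the example.

```csharp
using System;
using System.Globalization;

static class Abbreviation
{
    // Hypothetical helper: 1099 -> "1", 1100 -> "1.1", 10100 -> "10.1".
    public static string Abbreviate(int x)
    {
        if (x < 1000)
            return x.ToString(CultureInfo.InvariantCulture);

        int hundreds = x / 100;            // integer division drops the last two digits
        decimal thousands = hundreds / 10m; // decimal division keeps one decimal place
        return thousands.ToString(CultureInfo.InvariantCulture);
    }
}
```

The integer division does the truncation, so no rounding mode has to be chosen when formatting.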
An abbreviation is by definition rounded.
If you are looking for more precision why not use Double instead of Integer?
Here's a suggestion
double function(int number)
{
    if (number >= 1000)
    {
        return ((double)number) / 1000;
    }
    // Values below the threshold are returned unchanged.
    return number;
}
Your examples seem to imply that you only want one decimal place of precision, so how about something like this:
Divide by 100
Cast to double (or float)
Divide by 10
The first division will truncate any trailing numbers less than 100 (the equivalent of a 100-base floor function), then casting to double and dividing by 10 will give you the single decimal place of precision you want.
If t is the original number, then
int a = t / 100;
float b = a / 10f; // divide by a float, otherwise this is integer division again
b should contain your answer.
Some more code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
while (true)
{
string s;
s = Console.ReadLine();
int a = Convert.ToInt32(s);
a = a / 100;
float b = a / (float)10.0;
Console.WriteLine(b);
}
}
}
}
You should use modular (remainder) mathematics to do this. You don't need to involve the FPU (Floating Point Unit).
// requires: using System.Globalization;
static string RoundAndToString(int value, int denominator)
{
    var remainder = value % denominator;
    value = (value - remainder) / denominator;
    // First decimal digit, truncated rather than rounded.
    var firstDecimal = (remainder * 10) / denominator;
    if (firstDecimal == 0)
        return value.ToString();
    return string.Format("{0}{1}{2}", value, CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator, firstDecimal);
}
Since you just want to truncate the number, it makes sense to convert it to a string, remove the last two characters from the string, then divide by 10 to get the corresponding number.
Here is the algorithm in Ruby. (I don't have C# handy)
a = 1000 #sample number
-> 1000
b = a.to_s[0..-3] #a converted to a string, then taking all characters except the last two.
-> "10"
c = b.to_i / 10.0 # converts to float in correct power
-> 1.0
You then display "c" in whatever format you want using sprintf (or the C# equivalent, String.Format).
Try:
int MyNumber = 10100;
string MyString = ((int) MyNumber/1000).ToString() + (( MyNumber % 1000) > 99 ? "." + (((int)( MyNumber / 100 )) % 10).ToString() : "");

How to make my extended range floating point multiply more efficient?

I am doing a calculation which frequently involves values like 3.47493E+17298. This is way beyond what a double can handle, and I don't need extra precision, just extra range of exponents, so I created my own little struct in C#.
My struct uses a long for significand and sign, and an int for exponent, so I effectively have:
1 sign bit
32 exponent bits (regular 2's complement exponent)
63 significand bits
I am curious what steps could be made to make my multiplication routine more efficient. I am running an enormous number of multiplications of these extended range values, and it is pretty fast, but I was looking for hints as to making it faster.
My multiplication routine:
public static BigFloat Multiply(BigFloat left, BigFloat right)
{
long shsign1;
long shsign2;
if (left.significand == 0)
{
return bigZero;
}
if (right.significand == 0)
{
return bigZero;
}
shsign1 = left.significand;
shsign2 = right.significand;
// scaling down significand to prevent overflow multiply
// s1 and s2 indicate how much the left and right
// significands need shifting.
// The multLimit is a long constant indicating the
// max value I want either significand to be
int s1 = qshift(shsign1, multLimit);
int s2 = qshift(shsign2, multLimit);
shsign1 >>= s1;
shsign2 >>= s2;
BigFloat r;
r.significand = shsign1 * shsign2;
r.exponent = left.exponent + right.exponent + s1 + s2;
return r;
}
And the qshift:
It just finds out how much to shift the val to make it smaller in absolute value than the limit.
public static int qshift(long val, long limit)
{
long q = val;
long c = limit;
long nc = -limit;
int counter = 0;
while (q > c || q < nc)
{
q >>= 1;
counter++;
}
return counter;
}
Here is a completely different idea...
Use the hardware's floating-point machinery, but augment it with your own integer exponents. Put another way, make BigFloat.significand be a floating-point number instead of an integer.
Then you can use ldexp and frexp to keep the actual exponent on the float equal to zero. These should be single machine instructions.
So BigFloat multiply becomes:
r.significand = left.significand * right.significand
r.exponent = left.exponent + right.exponent
tmp = (actual exponent of r.significand from frexp)
r.exponent += tmp
(use ldexp to subtract tmp from actual exponent of r.significand)
Unfortunately, the last two steps require frexp and ldexp, which searches suggest are not available in C#. So you might have to write this bit in C.
...
Or, actually...
Use floating-point numbers for the significands, but just keep them normalized between 1 and 2. So again, use floats for the significands, and multiply like this:
r.significand = left.significand * right.significand;
r.exponent = left.exponent + right.exponent;
if (r.significand >= 2) {
r.significand /= 2;
r.exponent += 1;
}
assert (r.significand >= 1 && r.significand < 2); // for debugging...
This should work as long as you maintain the invariant mentioned in the assert(). (Because if x is between 1 and 2 and y is between 1 and 2, then x*y is between 1 and 4, so the normalization step just has to check for when the significand product is between 2 and 4.)
You will also need to normalize the results of additions etc., but I suspect you are already doing that.
Although you will need to special-case zero after all :-).
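In C#, the normalized-significand variant could be sketched roughly as follows. The struct name RangeFloat and its fields are hypothetical; the sketch assumes every nonzero significand has magnitude in [1, 2) and does not check the exponent for overflow.

```csharp
using System;

struct RangeFloat
{
    public double Significand; // 0, or magnitude in [1, 2)
    public int Exponent;       // power of two applied to Significand

    public RangeFloat(double significand, int exponent)
    {
        Significand = significand;
        Exponent = exponent;
    }

    public static RangeFloat Multiply(RangeFloat left, RangeFloat right)
    {
        double s = left.Significand * right.Significand;
        int e = left.Exponent + right.Exponent;
        // The product of two magnitudes in [1, 2) lies in [1, 4),
        // so at most one halving restores the invariant.
        if (Math.Abs(s) >= 2.0)
        {
            s /= 2.0;
            e += 1;
        }
        // A zero operand gives s == 0, which passes through unchanged.
        return new RangeFloat(s, e);
    }
}
```

For example, multiplying (1.5, 10) by (1.5, 20) gives a raw product of 2.25, which is renormalized to (1.125, 31). Dividing by 2 is exact in binary floating point, so the renormalization itself adds no rounding error.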
[edit, to flesh out the frexp version]
BigFloat BigFloat::normalize(BigFloat b)
{
    double temp = b.significand;
    double tempexp = b.exponent;
    double temp2;
    int tempexp2; // frexp stores the exponent through an int*
    temp2 = frexp(temp, &tempexp2);
    // Need to test temp2 for infinity and NaN here
    tempexp += tempexp2;
    if (tempexp < MIN_EXP) {
        // underflow!
    }
    if (tempexp > MAX_EXP) {
        // overflow!
    }
    BigFloat r;
    r.exponent = tempexp;
    r.significand = temp2;
    return r;
}
In other words, I would suggest factoring this out as a "normalize" routine, since presumably you want to use it following additions, subtractions, multiplications, and divisions.
And then there are all the corner cases to worry about...
You probably want to handle underflow by returning zero. Overflow depends on your tastes; should either be an error or +-infinity. Finally, if the result of frexp() is infinity or NaN, the value of tempexp2 is undefined, so you might want to check those cases, too.
I am not much of a C# programmer, but here are some general ideas.
First, are there any profiling tools for C#? If so, start with those...
The time is very likely being spent in your qshift() function; in particular, the loop. Mispredicted branches are nasty.
I would rewrite it as:
long q = Math.Abs(val);
long x = q / limit;
(find next power of 2 bigger than x)
For that last step, see this question and answer.
Then instead of shifting by qshift, just divide by this power of 2. (Does C# have "find first set" (aka. ffs)? If so, you can use it to get the shift count from the power of 2; it should be one instruction.)
Definitely inline this sequence if the compiler will not do it for you.
Also, I would ditch the special cases for zero, unless you are multiplying by zero a lot. Linear code good; conditionals bad.
If you're sure there won't be an overflow, you can use an unchecked block.
That will remove the overflow checks, and give you a bit more performance.
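A short sketch of what the unchecked block changes. The class and method names are made up for the illustration; note that arithmetic is only checked in the first place if the project is compiled with overflow checking enabled or the code sits inside a checked block.

```csharp
using System;

static class OverflowDemo
{
    // Overflow wraps around silently (two's complement).
    public static long MultiplyUnchecked(long a, long b)
    {
        unchecked
        {
            return a * b;
        }
    }

    // Overflow throws System.OverflowException.
    public static long MultiplyChecked(long a, long b)
    {
        checked
        {
            return a * b;
        }
    }
}
```

MultiplyUnchecked(long.MaxValue, 2) wraps to -2, while MultiplyChecked with the same arguments throws, so unchecked is only safe when the significands have already been scaled down far enough.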
