Can C#/.NET parse hex-encoded exponential floating point numbers from strings?

C++ and Java both include the [-]0xh.hhhhp+/-d format in the syntax of the language, and other languages like Python and C99 have library support for parsing these strings (float.fromhex, scanf).
I have not, yet, found a way to parse this exact hex encoded exponential format in C# or using the .NET libraries.
Is there a good way to handle this, or a decent alternative encoding? (decimal encoding is not exact).
Example strings:
0x1p-8
-0xfe8p-12
Thank you

Unfortunately, I don't know of any method built-in to .NET that compares to Python's float.fromhex(). So I suppose the only thing you can do is roll your own .fromhex() in C#. This task can range in difficulty from "Somewhat Easy" to "Very Difficult" depending on how complete and how optimized you'd like your solution to be.
Officially, the IEEE 754 spec allows a fractional part in the hexadecimal coefficient (i.e. 0xf.e8p-12), which adds a layer of complexity for us since (much to my frustration) .NET also does not support Double.Parse() for hexadecimal strings.
If you can constrain the problem to examples like the ones you've provided, where the coefficient is always an integer, you can use the following solution based on string operations:
public static double Parsed(string hexVal)
{
    // Requires: using System; using System.Globalization;
    int index = 0;
    int sign = 1;
    double exponent = 0d;

    // Check sign
    if (hexVal[index] == '-')
    {
        sign = -1;
        index++;
    }
    else if (hexVal[index] == '+')
        index++;

    // Consume "0x"
    if (hexVal[index] == '0')
    {
        if (hexVal[index + 1] == 'x' || hexVal[index + 1] == 'X')
            index += 2;
    }

    int coeff_start = index;
    int coeff_end = hexVal.Length - coeff_start;

    // Check for exponent
    int p_index = hexVal.IndexOfAny(new char[] { 'p', 'P' });
    if (p_index == 0)
        throw new FormatException("No Coefficient");
    else if (p_index > -1)
    {
        coeff_end = p_index - index;
        int exp_start = p_index + 1;
        int exp_end = hexVal.Length;
        exponent = Convert.ToDouble(hexVal.Substring(exp_start, exp_end - exp_start));
    }

    var coeff = (double)Int32.Parse(hexVal.Substring(coeff_start, coeff_end), NumberStyles.AllowHexSpecifier);
    var result = sign * (coeff * Math.Pow(2, exponent));
    return result;
}
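For example, with the strings from the question (a quick sanity check, assuming the Parsed method above is in scope):
Console.WriteLine(Parsed("0x1p-8"));     // 0.00390625   (1 * 2^-8)
Console.WriteLine(Parsed("-0xfe8p-12")); // -0.994140625 (-4072 * 2^-12)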
If you're seeking an identical function to Python's fromhex(), you can try your hand at converting the CPython implementation into C# if you'd like. I tried, but got in over my head as I'm not very familiar with the standard and had trouble following all the overflow checks they were looking out for. They also allow other things like unlimited leading and trailing whitespace, which my solution does not allow for.
My solution is the "Somewhat Easy" solution. I'm guessing if you really knew your stuff, you could build the sign, exponent and mantissa at the bit level instead of multiplying everything out. You could definitely do it in one pass as well, rather than cheating with the .Substring() methods.
Hopefully this at least gets you on the right track.

I have written C# code for formatting and parsing numbers in the hexadecimal floating-point format described in IEEE 754r and supported by C99, C++11 and Java. The code is part of the BSD-licensed FParsec library for F# and is contained in a single file:
https://bitbucket.org/fparsec/main/src/tip/FParsecCS/HexFloat.cs
The supported format is described a bit at http://www.quanttec.com/fparsec/reference/charparsers.html#members.floatToHexString
The test code (written in F#) can be found at https://bitbucket.org/fparsec/main/src/tip/Test/HexFloatTests.fs

Related

How/why is a negative value parsing successfully to a char?

I've been stumped on this for a few days now. My goal is to ultimately take the following C++ snippet and port it to C#. This may not be possible; I don't normally work with C++, so I'm hoping for some guidance. Here is the basic method:
char* EncryptStr(char* pszPlainRec, char* pszEncrKey, int iStartPos, int iLength)
{
    int i;
    for (i = iStartPos; i < iStartPos + iLength; i++)
    {
        pszPlainRec[i] = (~pszPlainRec[i]) ^ pszEncrKey[i - iStartPos];
    }
    return pszPlainRec;
}
Prior to the complement operation:
pszPlainRec[8] = '0' = 48
pszEncrKey[0] = 'T' = 84
I get that the complement operator ~ turns the value negative, but how is it generating the right angle bracket '>' with a negative value? See screenshot below for clarification. If this is some weird voodoo C++ magic, is there an equivalent way in C# or VB.net? Ideally I'd like to return a string, but I can't parse a negative number to a char in C#.
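For reference, a minimal sketch of how the same bitwise operations could be written in C# (the method and parameter names are just placeholders mirroring the C++ version, and it assumes single-byte/ASCII data). Since char in C# is an unsigned 16-bit type, the trick is to do the arithmetic in int and mask back down to 8 bits instead of trying to store a negative value in a char:
static string EncryptStr(string plainRec, string encrKey, int startPos, int length)
{
    char[] chars = plainRec.ToCharArray();
    for (int i = startPos; i < startPos + length; i++)
    {
        // Same expression as the C++ loop; the & 0xFF keeps only the low byte,
        // which is effectively what storing into a (signed) C++ char does.
        int value = (~chars[i]) ^ encrKey[i - startPos];
        chars[i] = (char)(value & 0xFF);
    }
    return new string(chars);
}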

Check condition for hexadecimal number

I have a hexadecimal number written in a text file, and I need to check a condition on it in an if-else. For example, the Start number written in the text file is 1100 and the End number is 10FF. The start number, 1100, is the end number plus 1; this increment is done by another system.
In my case, the system will proceed to the next process after reading the numbers from the text file.
This is my code:
var data = File
    .ReadAllLines(Path)
    .Select(x => x.Split('='))
    .Where(x => x.Length > 1)
    .ToDictionary(x => x[0].Trim(), x => x[1]);

Console.WriteLine("Start: {0}", data["Start"]);
Console.WriteLine("End: {0}", data["End"]);

if (data["Start"] == data["End"] + 1)
{
    //it will proceed to next process
}
else
{
    //prompt not meet end number
}
The problem is that the if (data["Start"] == data["End"]+1) check does not work. How can I resolve this? Do I need to convert the hexadecimal numbers to int first?
In C#, if you concatenate a string with a number, the number gets converted to a string and appended to the end of the original string, so data["End"] + 1 produces "10FF1", not the next hexadecimal value.
If you want to perform some math on your numbers, you need to convert them to the correct data type first (in your case, integer).
To do this, you can use either of these:
if (Convert.ToInt32(data["Start"], 16) == Convert.ToInt32(data["End"], 16) + 1)
or
if (int.Parse(data["Start"], NumberStyles.HexNumber) == int.Parse(data["End"], NumberStyles.HexNumber) + 1)
Both convert a string containing a hexadecimal number into an int, which then behaves as a number (addition works as expected, for example).
"10FF" is not 0x10FF.
In C#, the fact that a string happens to contain text that can be parsed as a hexadecimal number (or any number, for that matter) doesn't mean it will be implicitly converted to that number.
In fact, it's the other way around - C# will implicitly convert the number to a string when using the + operator between a string and a number - so "10FF" + 1 will result in "10FF1".
Note that I've started this line with "In C#" - because other languages might not follow the same rules. In T-SQL, for instance, implicit conversions from varchar to int happen all the time and they're a very common "gotcha" for inexperienced devs.
So you need to convert your strings to ints, as Lasse V. Karlsen wrote in the comments.
You can either do that by using Convert.ToInt32(string, 16) or by using int.Parse(str, NumberStyles.HexNumber) - if you're sure that the text will always contain the string representation of a hexadecimal number.
For text that you're not sure about, you'd better use int.TryParse to avoid the risk of a FormatException - but note that TryParse returns bool and the int value is returned via an out parameter: int.TryParse(str, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out var val)
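Putting the pieces together with TryParse (a sketch; assumes using System.Globalization and the data dictionary from the question):
if (int.TryParse(data["Start"], NumberStyles.HexNumber, CultureInfo.InvariantCulture, out var start) &&
    int.TryParse(data["End"], NumberStyles.HexNumber, CultureInfo.InvariantCulture, out var end) &&
    start == end + 1)
{
    // proceed to the next process
}
else
{
    // prompt: end number not met (or the file did not contain valid hex)
}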

C# Protobuf-net: Dictionary of decimals: Zeroes don't get roundtripped properly

I've found a weird bug around serialization/deserialization of decimal zeroes in protobuf-net, wondering if anyone has found a good workaround for this, or if this is actually a feature.
Given a dictionary of decimals like the ones below, if I run this in LINQPad:
void Main()
{
    {
        Dictionary<string, decimal> dict = new Dictionary<string, decimal>();
        dict.Add("one", 0.0000000m);
        DumpStreamed(dict);
    }
    {
        Dictionary<string, decimal> dict = new Dictionary<string, decimal>();
        dict.Add("one", 0m);
        DumpStreamed(dict);
    }
}

public static void DumpStreamed<T>(T val)
{
    using (var stream = new MemoryStream())
    {
        Console.Write("Stream1: ");
        ProtoBuf.Serializer.Serialize(stream, val);
        foreach (var by in stream.ToArray())
        {
            Console.Write(by);
        }
        Console.WriteLine();

        Console.Write("Stream2: ");
        stream.Position = 0;
        var item = ProtoBuf.Serializer.Deserialize<T>(stream);
        using (var stream2 = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(stream2, item);
            foreach (var by in stream2.ToArray())
            {
                Console.Write(by);
            }
        }
    }
    Console.WriteLine();
    Console.WriteLine("----");
}
I'll get two different streams:
First serialization: 1091031111101011822414
Second serialization: 107103111110101180
(The 0.0000000m gets converted to 0 on deserialization).
I've found this is due to this line of code in ReadDecimal:
if (low == 0 && high == 0) return decimal.Zero;
Does anyone know why zeroes are getting normalized only during deserialization, and not on serialization?
Or any workaround for either consistently normalizing or consistently not normalizing decimal zero in a dictionary on serialization/deserialization?
Yep; the problem is this well-meaning but potentially harmful line:
if (low == 0 && high == 0) return decimal.Zero;
which neglects to check signScale. It should really be:
if (low == 0 && high == 0 && signScale == 0) return decimal.Zero;
I'll tweak that for the next build.
(edit: I ended up removing that check completely - the rest of the code is just some integer shifts etc, so having the "branch" is probably more expensive than not having it)
Floating point data types are actually structures with several elements; among them are a base (coefficient) value and a scaling factor or exponent that is applied to it. The C# documentation for decimal states the following:
The binary representation of a Decimal number consists of a 1-bit sign, a 96-bit integer number, and a scaling factor used to divide the integer number and specify what portion of it is a decimal fraction. The scaling factor is implicitly the number 10, raised to an exponent ranging from 0 to 28
So, for example, you could represent 1234000 as:
A base value of 1234000 x 10^0
A base value of 123400 x 10^1
A base value of 12340 x 10^2
etc.
So this problem isn't limited to zero. All decimal values can be represented in more than one way. If you are relying on the byte streams to check for equivalence, you're in for a lot of problems; you really shouldn't be doing that, as equal values can produce different streams, and not just for zero.
As for normalization while serializing, I think that is a problem specific to ProtoBuf. You could certainly write your own serialization that takes steps to normalize the data, although it might be tricky to figure out. Another option is to convert the decimals to some custom class before storage, or store them as their string representations, as odd as that may sound.
If you are interested in monkeying around with some decimals and inspecting the raw data, see the GetBits() method. Or you could use this extension method to view the in-memory representation and see for yourself:
// Requires compiling with /unsafe and: using System.Linq;
public static unsafe string ToBinaryHex(this decimal This)
{
    byte* pb = (byte*)&This;
    var bytes = Enumerable.Range(0, 16).Select(i => (*(pb + i)).ToString("X2"));
    return string.Join("-", bytes);
}
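As a quick illustration of the multiple representations (a sketch using decimal.GetBits, which returns { lo, mid, hi, flags } with the scale stored in bits 16-23 of flags):
Console.WriteLine(string.Join(", ", decimal.GetBits(0m)));         // 0, 0, 0, 0       (scale 0)
Console.WriteLine(string.Join(", ", decimal.GetBits(0.0000000m))); // 0, 0, 0, 458752  (scale 7, i.e. 7 << 16)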

Determine the decimal precision of an input number

We have an interesting problem where we need to determine the decimal precision of a user's input (textbox). Essentially we need to know the number of decimal places entered and then return a precision number; this is best illustrated with examples:
4500 entered will yield a result of 1
4500.1 entered will yield a result of 0.1
4500.00 entered will yield a result of 0.01
4500.450 entered will yield a result of 0.001
We are thinking of working with the string, finding the decimal separator and then calculating the result. Just wondering if there is an easier solution to this.
I think you should just do what you suggested and use the position of the decimal point. The obvious drawback is that you have to think about internationalization yourself.
var decimalSeparator = NumberFormatInfo.CurrentInfo.CurrencyDecimalSeparator;
var position = input.IndexOf(decimalSeparator);
var precision = (position == -1) ? 0 : input.Length - position - 1;
// This may be quite imprecise.
var result = Math.Pow(0.1, precision);
There is another thing you could try - the Decimal type stores an internal precision value. Therefore you could use Decimal.TryParse() and inspect the returned value. Maybe the parsing algorithm maintains the precision of the input.
Finally, I would suggest not trying anything based on floating point numbers. Just parsing the input will remove any information about trailing zeros, so you would have to add an artificial non-zero digit to preserve them or do similar tricks. You might run into precision issues, and finding the precision from a floating point number is not simple either: I see some ugly math, or a loop multiplying by ten every iteration until there is no longer any fractional part. And the loop comes with new precision issues...
UPDATE
Parsing into a decimal works. See Decimal.GetBits() for details.
var input = "123.4560";
var number = Decimal.Parse(input);
// Will be 4.
var precision = (Decimal.GetBits(number)[3] >> 16) & 0x000000FF;
From here, using Math.Pow(0.1, precision) is straightforward.
UPDATE 2
Using decimal.GetBits() will allocate an int[] array. If you want to avoid the allocation you can use the following helper method which uses an explicit layout struct to get the scale directly out of the decimal value:
// Requires: using System.Runtime.InteropServices;
static int GetScale(decimal d)
{
    return new DecimalScale(d).Scale;
}

[StructLayout(LayoutKind.Explicit)]
struct DecimalScale
{
    public DecimalScale(decimal value)
    {
        this = default;
        this.d = value;
    }

    [FieldOffset(0)]
    decimal d;

    [FieldOffset(0)]
    int flags;

    public int Scale => (flags >> 16) & 0xff;
}
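A quick usage sketch (assuming the helper above is in scope; both approaches read the same scale bits):
var d = 123.4560m;
Console.WriteLine((decimal.GetBits(d)[3] >> 16) & 0xFF); // 4
Console.WriteLine(GetScale(d));                          // 4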
Just wondering if there is an easier solution to this.
No.
Use string:
string[] res = inputstring.Split('.');
int precision = res[1].Length;
Since your last examples indicate that trailing zeroes are significant, I would rule out any numerical solution and go for the string operations.
No, there is no easier solution, you have to examine the string. If you convert "4500" and "4500.00" to numbers, they both become the value 4500 so you can't tell how many non-value digits there were behind the decimal separator.
As an interesting aside, the Decimal tries to maintain the precision entered by the user. For example,
Console.WriteLine(5.0m);
Console.WriteLine(5.00m);
Console.WriteLine(Decimal.Parse("5.0"));
Console.WriteLine(Decimal.Parse("5.00"));
Has output of:
5.0
5.00
5.0
5.00
If your motivation in tracking the precision of the input is purely for input and output reasons, this may be sufficient to address your problem.
Working with the string is easy enough.
If there is no "." in the string, return 1.
Else return "0.", followed by n-1 "0", followed by one "1", where n is the length of the string after the decimal point.
Here's a possible solution using strings:
static double GetPrecision(string s)
{
    string[] splitNumber = s.Split('.');
    if (splitNumber.Length > 1)
    {
        return 1 / Math.Pow(10, splitNumber[1].Length);
    }
    else
    {
        return 1;
    }
}
There is a question here, Calculate System.Decimal Precision and Scale, which looks like it might be of interest if you wish to delve into this some more.

Quick and Simple Hash Code Combinations

Can people recommend quick and simple ways to combine the hash codes of two objects? I am not too worried about collisions since I have a hash table that will handle those efficiently; I just want something that generates a code as quickly as possible.
Reading around SO and the web there seem to be a few main candidates:
XORing
XORing with Prime Multiplication
Simple numeric operations like multiplication/division (with overflow checking or wrapping around)
Building a string and then using the string class's GetHashCode method
What would people recommend and why?
I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I have deliberately used it for set hashing - if you want to hash a sequence of items and you don't care about the ordering, it's nice.
I usually use:
unchecked
{
    int hash = 17;
    hash = hash * 31 + firstField.GetHashCode();
    hash = hash * 31 + secondField.GetHashCode();
    return hash;
}
That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.
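A minimal sketch of the same pattern inside a type (the class and field names here are placeholders, with null checks added because the fields are reference types; a real type would normally also override Equals):
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (FirstName?.GetHashCode() ?? 0);
            hash = hash * 31 + (LastName?.GetHashCode() ?? 0);
            return hash;
        }
    }
}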
If you are using .NET Core 2.1 or later or .NET Framework 4.6.1 or later, consider using the System.HashCode struct to help with producing composite hash codes. It has two modes of operation: Add and Combine.
An example using Combine, which is usually simpler and works for up to eight items:
public override int GetHashCode()
{
    return HashCode.Combine(object1, object2);
}
An example of using Add:
public override int GetHashCode()
{
    var hash = new HashCode();
    hash.Add(this.object1);
    hash.Add(this.object2);
    return hash.ToHashCode();
}
Pros:
Part of .NET itself, as of .NET Core 2.1/.NET Standard 2.1 (though, see con below)
For .NET Framework 4.6.1 and later, the Microsoft.Bcl.HashCode NuGet package can be used to backport this type.
Looks to have good performance and mixing characteristics, based on the work the author and the reviewers did before merging this into the corefx repo
Handles nulls automatically
Overloads that take IEqualityComparer instances
Cons:
Not available on .NET Framework before .NET 4.6.1. HashCode is part of .NET Standard 2.1. As of September 2019, the .NET team has no plans to support .NET Standard 2.1 on the .NET Framework, as .NET Core/.NET 5 is the future of .NET.
General purpose, so it won't handle super-specific cases as well as hand-crafted code
While the template outlined in Jon Skeet's answer works well in general as a hash function family, the choice of the constants is important and the seed of 17 and factor of 31 as noted in the answer do not work well at all for common use cases. In most use cases, the hashed values are much closer to zero than int.MaxValue, and the number of items being jointly hashed are a few dozen or less.
For hashing an integer tuple {x, y} where -1000 <= x <= 1000 and -1000 <= y <= 1000, it has an abysmal collision rate of almost 98.5%. For example, {1, 0} collides with {0, 31}, {1, 1} with {0, 32}, and so on. If we expand the coverage to also include n-tuples where 3 <= n <= 25, it does less badly, with a collision rate of about 38%. But we can do much better.
public static int CustomHash(int seed, int factor, params int[] vals)
{
    int hash = seed;
    foreach (int i in vals)
    {
        hash = (hash * factor) + i;
    }
    return hash;
}
I wrote a Monte Carlo sampling search loop that tested the method above with various values for seed and factor over various random n-tuples of random integers i. Allowed ranges were 2 <= n <= 25 (where n was random but biased toward the lower end of the range) and -1000 <= i <= 1000. At least 12 million unique collision tests were performed for each seed and factor pair.
After about 7 hours of running, the best pair found (where the seed and factor were both limited to 4 digits or less) was: seed = 1009, factor = 9176, with a collision rate of 0.1131%. In the 5- and 6-digit areas, even better options exist. But I selected the top 4-digit performer for brevity, and it performs quite well in all common int and char hashing scenarios. It also seems to work fine with integers of much greater magnitudes.
It is worth noting that "being prime" did not seem to be a general prerequisite for good performance as a seed and/or factor although it likely helps. 1009 noted above is in fact prime, but 9176 is not. I explicitly tested variations on this where I changed factor to various primes near 9176 (while leaving seed = 1009) and they all performed worse than the above solution.
Lastly, I also compared against the generic ReSharper-recommended function family of hash = (hash * factor) ^ i; the original CustomHash() noted above seriously outperforms it. The ReSharper XOR style seems to have collision rates in the 20-30% range for common use case assumptions and should not be used, in my opinion.
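For example, using the best 4-digit pair reported above (a usage sketch; the values are arbitrary):
int x = 3, y = 7, z = 11;                     // whatever values you want to hash
int hash2 = CustomHash(1009, 9176, x, y);     // pair
int hash3 = CustomHash(1009, 9176, x, y, z);  // any arity via the params array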
Use the hash-combination logic built into tuples. The example uses C# 7 value tuples.
(field1, field2).GetHashCode();
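For instance, inside a type whose hash should combine two fields (the field names are placeholders):
public override int GetHashCode() => (field1, field2).GetHashCode();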
I presume that the .NET Framework team did a decent job testing their System.String.GetHashCode() implementation, so I would use it:
// System.String.GetHashCode(): http://referencesource.microsoft.com/#mscorlib/system/string.cs,0a17bbac4851d0d4
// System.Web.Util.StringUtil.GetStringHashCode(System.String): http://referencesource.microsoft.com/#System.Web/Util/StringUtil.cs,c97063570b4e791a
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash1 = (5381 << 16) + 5381;
    int hash2 = hash1;
    int i = 0;
    foreach (var hashCode in hashCodes)
    {
        if (i % 2 == 0)
            hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ hashCode;
        else
            hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ hashCode;
        ++i;
    }
    return hash1 + (hash2 * 1566083941);
}
Another implementation is from System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32) and System.Array.CombineHashCodes(System.Int32, System.Int32) methods. This one is simpler, but probably doesn't have such a good distribution as the method above:
// System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#System.Web/Util/HashCodeCombiner.cs,21fb74ad8bb43f6b
// System.Array.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#mscorlib/system/array.cs,87d117c8cc772cca
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash = 5381;
    foreach (var hashCode in hashCodes)
        hash = ((hash << 5) + hash) ^ hashCode;
    return hash;
}
This is a repackaging of Special Sauce's brilliantly researched solution.
It makes use of Value Tuples (ITuple).
This allows defaults for the parameters seed and factor.
// Requires: using System.Runtime.CompilerServices; (for ITuple)
public static int CombineHashes(this ITuple tupled, int seed = 1009, int factor = 9176)
{
    var hash = seed;
    for (var i = 0; i < tupled.Length; i++)
    {
        unchecked
        {
            hash = hash * factor + tupled[i].GetHashCode();
        }
    }
    return hash;
}
Usage:
var hash1 = ("Foo", "Bar", 42).CombineHashes();
var hash2 = ("Jon", "Skeet", "Constants").CombineHashes(seed: 17, factor: 31);
If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.
The situation I'm suggesting this for is where you want to do
H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.
Of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.
If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:
finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;
Of course, some prototyping ought to give you an idea of performance and clustering.
Assuming you have a relevant toString() function (where your different fields shall appear), I would just return its hashcode:
this.toString().hashCode();
This is not very fast, but it should avoid collisions quite well.
I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.
