C# protobuf-net: Dictionary of decimals: zeroes don't round-trip properly

I've found a weird bug in the serialization/deserialization of decimal zeroes in protobuf-net, and I'm wondering whether anyone has found a good workaround for it, or whether this is actually a feature.
Given dictionaries like the ones below, if I run this in LINQPad:
void Main()
{
    {
        Dictionary<string, decimal> dict = new Dictionary<string, decimal>();
        dict.Add("one", 0.0000000m);
        DumpStreamed(dict);
    }
    {
        Dictionary<string, decimal> dict = new Dictionary<string, decimal>();
        dict.Add("one", 0m);
        DumpStreamed(dict);
    }
}
public static void DumpStreamed<T>(T val)
{
    using (var stream = new MemoryStream())
    {
        Console.Write("Stream1: ");
        ProtoBuf.Serializer.Serialize(stream, val);
        foreach (var by in stream.ToArray())
        {
            Console.Write(by);
        }
        Console.WriteLine();
        Console.Write("Stream2: ");
        stream.Position = 0;
        var item = ProtoBuf.Serializer.Deserialize<T>(stream);
        using (var stream2 = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(stream2, item);
            foreach (var by in stream2.ToArray())
            {
                Console.Write(by);
            }
        }
    }
    Console.WriteLine();
    Console.WriteLine("----");
}
I'll get two different streams:
First serialization: 1091031111101011822414
Second serialization: 107103111110101180
(The 0.0000000m gets converted to 0 on deserialization).
I've found this is due to this line of code in ReadDecimal:
if (low == 0 && high == 0) return decimal.Zero;
Does anyone know why zeroes are getting normalized only during deserialization, and not on serialization?
Or any workaround for either consistently normalizing or consistently not normalizing decimal zero in a dictionary on serialization/deserialization?

Yep; the problem is this well-meaning but potentially harmful line:
if (low == 0 && high == 0) return decimal.Zero;
which neglects to check signScale. It should really be:
if (low == 0 && high == 0 && signScale == 0) return decimal.Zero;
I'll tweak that for the next build.
(edit: I ended up removing that check completely - the rest of the code is just some integer shifts etc, so having the "branch" is probably more expensive than not having it)

Floating-point data types are actually structures with several elements. Among them are a base value and an exponent that scales it. The C# documentation for decimal states the following:
The binary representation of a Decimal number consists of a 1-bit sign, a 96-bit integer number, and a scaling factor used to divide the integer number and specify what portion of it is a decimal fraction. The scaling factor is implicitly the number 10, raised to an exponent ranging from 0 to 28
So, for example, you could represent 1234 as
A base value of 1234 divided by 10 ^ 0
A base value of 12340 divided by 10 ^ 1
A base value of 123400 divided by 10 ^ 2
etc.
So this problem isn't limited to zero. Most decimal values can be represented in more than one way. If you rely on the byte streams to check for equivalence, you're in for a lot of problems: equal values can serialize to different bytes, so you will get false mismatches, and not just for zero.
As for normalization while serializing, I think that is a problem specific to ProtoBuf. You could certainly write your own serialization that takes steps to normalize the data, although it might be tricky to figure out. Another option is to convert the decimals to some custom class before storage, or store them as their string representations, as odd as that may sound.
If you are interested in monkeying around with some decimals and inspecting the raw data, see the GetBits() method. Or you could use this extension method to view the in-memory representation and see for yourself:
public static unsafe string ToBinaryHex(this decimal This)
{
    byte* pb = (byte*)&This;
    var bytes = Enumerable.Range(0, 16).Select(i => (*(pb + i)).ToString("X2"));
    return string.Join("-", bytes);
}
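To make the point about multiple representations concrete, here is a minimal sketch. The class and method names are hypothetical and are not part of protobuf-net. It reads the scale out of decimal.GetBits() and shows one possible workaround for the zero case only: collapsing every representation of zero to decimal.Zero before the value reaches the serializer, so that both sides of the round trip agree.

```csharp
using System;

static class DecimalZeroSketch
{
    // The scale (digits after the decimal point) is stored in bits 16-23
    // of the fourth element returned by decimal.GetBits().
    public static int GetScale(decimal value)
    {
        int flags = decimal.GetBits(value)[3];
        return (flags >> 16) & 0xFF;
    }

    // Hypothetical helper: maps 0.0m, 0.0000000m, -0.0m, ... to decimal.Zero
    // and leaves every non-zero value (including its scale) untouched.
    public static decimal NormalizeZero(decimal value)
    {
        return value == decimal.Zero ? decimal.Zero : value;
    }
}
```

With this, 0m and 0.0000000m compare equal but report scales of 0 and 7 respectively, and NormalizeZero(0.0000000m) reports a scale of 0. Applying the helper to each dictionary value before serializing is an assumption about how you would wire it in, not something the library does for you.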

Related

Decimal stores precision from parsed string in C#? What are the implications?

During a conversation on IRC, someone pointed out the following:
decimal.Parse("1.0000").ToString() // 1.0000
decimal.Parse("1.00").ToString() // 1.00
How/why does the decimal type retain precision (or, rather, significant figures) like this? I was under the impression that the two values are equal, not distinct.
This also raises further questions:
How is the number of significant figures decided during mathematical operations?
Does the number of significant figures get retained during serialization?
Does the current culture affect the way this is handled?
How is the number of significant figures decided during mathematical operations?
This is specified in the ECMA-334 C# 4 specification 11.1.7 p.112
A decimal is represented as an integer scaled by a power of ten. For
decimals with an absolute value less than 1.0m, the value is exact to
at least the 28th decimal place. For decimals with an absolute value
greater than or equal to 1.0m, the value is exact to at least 28
digits.
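The quoted passage describes the representation rather than the arithmetic rules. As a rough illustration of how the scale behaves in practice, the sketch below (hypothetical class name; behaviour as observed with System.Decimal on current .NET implementations) formats the results of an addition and a multiplication. Addition keeps the larger of the two scales, while multiplication adds the scales together.

```csharp
using System;
using System.Globalization;

static class DecimalScaleArithmetic
{
    // Formats with the invariant culture so the output does not depend on the machine's settings.
    public static string Sum(decimal a, decimal b)
    {
        return (a + b).ToString(CultureInfo.InvariantCulture);
    }

    public static string Product(decimal a, decimal b)
    {
        return (a * b).ToString(CultureInfo.InvariantCulture);
    }
}
```

For example, Sum(1.0m, 1.00m) gives "2.00" and Product(1.0m, 1.00m) gives "1.000", even though both operands are numerically equal to 1.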
Does the number of significant figures get retained during serialization?
Yes, it does. With serialization, the value and its precision do not change.
[Serializable]
public class Foo
{
    public decimal Value;
}
class Program
{
    static void Main(string[] args)
    {
        decimal d1 = decimal.Parse("1.0000");
        decimal d2 = decimal.Parse("1.00");
        Debug.Assert(d1 == d2);
        var foo1 = new Foo() {Value = d1};
        var foo2 = new Foo() {Value = d2};
        IFormatter formatter = new BinaryFormatter();
        Stream stream = new FileStream("data.bin", FileMode.Create, FileAccess.Write, FileShare.None);
        formatter.Serialize(stream, d1);
        stream.Close();
        formatter = new BinaryFormatter();
        stream = new FileStream("data.bin", FileMode.Open, FileAccess.Read, FileShare.Read);
        decimal deserializedD1 = (decimal)formatter.Deserialize(stream);
        stream.Close();
        Debug.Assert(d1 == deserializedD1);
        Console.WriteLine(d1); //1.0000
        Console.WriteLine(d2); //1.00
        Console.WriteLine(deserializedD1); //1.0000
        Console.Read();
    }
}
Does the current culture affect the way this is handled?
The current culture affects only how a decimal can be parsed from a string, for example it can handle '.' or ',' as a culture-specific decimal point symbol or the currency symbol, should you provide it, e.g. "£123.4500". Culture does not change the way an object is stored internally and it does not affect its precision.
Internally, decimal has a mantissa, an exponent and a sign, so no space for anything else.
A decimal consists of a 96-bit integer and a scaling factor (number of digits after the decimal point), which ranges from 0 to 28. Thus:
1.000 becomes 1000 with scaling factor 3.
In addition to the posts I see here, I would personally add a side note:
whenever you persist floating-point, decimal, or double values, always consider the culture you are in and the culture you are going to save in. Code like that shown here is a first but definite step toward a complete mess and a culture-dependent architecture.
Use Decimal.Parse(String, IFormatProvider).
In my opinion, the methods (Parse from/to) that lack a culture parameter should be removed from the library, to force developers to think about this very important aspect.
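As a small sketch of that advice, the example below always passes an explicit format provider. The class name and the comma-decimal format are hypothetical; a hand-built NumberFormatInfo is used here only so that the example does not depend on which cultures are installed on the machine.

```csharp
using System;
using System.Globalization;

static class CultureAwareDecimals
{
    // Hypothetical input format: dot as group separator, comma as decimal separator.
    static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberGroupSeparator = ".",
        NumberDecimalSeparator = ","
    };

    // Parses text such as "1,5000" using the explicit format above.
    public static decimal ParseCommaDecimal(string text)
    {
        return decimal.Parse(text, CommaDecimal);
    }

    // Writes the value in a culture-independent form suitable for persistence.
    public static string ToInvariantString(decimal value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }
}
```

Parsing "1,5000" this way and writing it back with ToInvariantString gives "1.5000": the separator changes with the format provider, but the stored scale does not.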

I need to take an int and convert it into its binary notation (C#)

I need to take an int and turn it into its byte form.
i.e. I need to take '1' and turn it into '00000001'
or '160' and turn it into '10100000'
Currently, I am using this
int x = 3;
string s = Convert.ToString(x, 2);
int b = int.Parse(s);
This is an awful way to do things, so I am looking for a better way.
Any Suggestions?
EDIT
Basically, I need to get a list of every number up to 256 in base-2. I'm going to store all the numbers in a list, and keep them in a table on my db.
UPDATE
I decided to keep the base-2 number as a string instead of parsing it back. Thanks for the help and sorry for the confusion!
If you need the byte form you should take a look at the BitConverter.GetBytes() method. It does not return a string, but an array of bytes.
The int is already a binary number. What exactly are you looking to do with the new integer? What you are doing is setting a base 10 number to a base 2 value. That's kind of confusing and I think you must be trying to do something that should happen a different way.
I don't know what you need at the end ... this may help:
Turn the int into a byte array:
byte[] bytes = BitConverter.GetBytes(x);
Turn the int into a bit array:
BitArray bitArray = new BitArray(new[] {x});
You can use BitArray.
The code looks a bit clumsy, but that could be improved a bit.
int testValue = 160;
System.Collections.BitArray bitarray = new System.Collections.BitArray(new int[] { testValue });
var bitList = new List<bool>();
foreach (bool bit in bitarray)
    bitList.Add(bit);
bitList.Reverse();
var base2 = 0;
foreach (bool bit in bitList)
{
    base2 *= 10; // Shift one step left
    if (bit)
        base2++; // Add 1 last
}
Console.WriteLine(base2);
I think you are confusing the data type Integer with its textual representation.
int x = 3;
is the number three regardless of the representation (binary, decimal, hexadecimal, etc.)
When you parse the binary textual representation of an integer back to integer you are getting a different number. The framework assumes you are parsing the number represented in the decimal base and gives the corresponding integer.
You can try
int x = 1600;
string s = Convert.ToString(x, 2);
int b = int.Parse(s);
and it will throw an OverflowException, because the binary representation of 1600 (11001000000), interpreted as a decimal number, is too big to fit in an int.
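Given the asker's update (keeping the base-2 form as a string rather than parsing it back), a minimal sketch could look like the following. The class and method names are hypothetical; it assumes the values of interest are 0 through 255, so that each one fits in eight binary digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BinaryStrings
{
    // Converts a value in the range 0..255 to an 8-character base-2 string.
    public static string ToBinary8(int value)
    {
        if (value < 0 || value > 255)
            throw new ArgumentOutOfRangeException(nameof(value));
        return Convert.ToString(value, 2).PadLeft(8, '0');
    }

    // Builds the list of all 256 byte values in base 2, in ascending order.
    public static List<string> AllByteValues()
    {
        return Enumerable.Range(0, 256).Select(ToBinary8).ToList();
    }
}
```

ToBinary8(1) returns "00000001" and ToBinary8(160) returns "10100000", matching the examples in the question.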

Determine the decimal precision of an input number

We have an interesting problem where we need to determine the decimal precision of a user's input (textbox). Essentially, we need to know the number of decimal places entered and then return a precision number. This is best illustrated with examples:
4500 entered will yield a result 1
4500.1 entered will yield a result 0.1
4500.00 entered will yield a result 0.01
4500.450 entered will yield a result 0.001
We are thinking to work with the string, finding the decimal separator and then calculating the result. Just wondering if there is an easier solution to this.
I think you should just do what you suggested - use the position of the decimal point. Obvious drawback might be that you have to think about internationalization yourself.
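// Plain numeric input uses the number separator, not the currency one.
var decimalSeparator = NumberFormatInfo.CurrentInfo.NumberDecimalSeparator;
var position = input.IndexOf(decimalSeparator);
// Count the characters after the separator (which may be longer than one character).
var precision = (position == -1) ? 0 : input.Length - position - decimalSeparator.Length;
// This may be quite imprecise.
var result = Math.Pow(0.1, precision);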
There is another thing you could try - the Decimal type stores an internal precision value. Therefore you could use Decimal.TryParse() and inspect the returned value. Maybe the parsing algorithm maintains the precision of the input.
Finally, I would suggest not trying anything that uses binary floating-point numbers. Just parsing the input will remove any information about trailing zeros, so you would have to add an artificial non-zero digit to preserve them or use similar tricks. You might run into precision issues. Finding the precision from a floating-point number is not simple either: I see some ugly math, or a loop multiplying by ten every iteration until there is no longer any fractional part. And the loop comes with new precision issues...
UPDATE
Parsing into a decimal works. See Decimal.GetBits() for details.
var input = "123.4560";
var number = Decimal.Parse(input);
// Will be 4.
var precision = (Decimal.GetBits(number)[3] >> 16) & 0x000000FF;
From here, using Math.Pow(0.1, precision) is straightforward.
UPDATE 2
Using decimal.GetBits() will allocate an int[] array. If you want to avoid the allocation you can use the following helper method which uses an explicit layout struct to get the scale directly out of the decimal value:
static int GetScale(decimal d)
{
    return new DecimalScale(d).Scale;
}
[StructLayout(LayoutKind.Explicit)]
struct DecimalScale
{
    public DecimalScale(decimal value)
    {
        this = default;
        this.d = value;
    }
    [FieldOffset(0)]
    decimal d;
    [FieldOffset(0)]
    int flags;
    public int Scale => (flags >> 16) & 0xff;
}
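Putting the Decimal.GetBits() approach together end to end, a minimal sketch might look like this. The class and method names are hypothetical, and it assumes input that uses '.' as the decimal separator (the invariant culture). It returns the step as a decimal built directly from the scale, which avoids the rounding that Math.Pow(0.1, precision) can introduce.

```csharp
using System;
using System.Globalization;

static class InputPrecision
{
    // Returns false when the text is not a valid number; otherwise sets step
    // to 1, 0.1, 0.01, ... according to the number of decimal places entered.
    public static bool TryGetStep(string input, out decimal step)
    {
        step = 0m;
        if (!decimal.TryParse(input, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal number))
            return false;

        // The scale is stored in bits 16-23 of the flags element.
        int scale = (decimal.GetBits(number)[3] >> 16) & 0xFF;

        // Mantissa 1 with the same scale: scale 2 gives 0.01, scale 0 gives 1.
        step = new decimal(1, 0, 0, false, (byte)scale);
        return true;
    }
}
```

With this sketch, "4500" yields 1, "4500.1" yields 0.1, "4500.00" yields 0.01 and "4500.450" yields 0.001, matching the examples in the question.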
Just wondering if there is an easier
solution to this.
No.
Use string:
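string[] res = inputstring.Split('.');
// Without a decimal point there is no res[1], so treat that as zero decimal places.
int precision = res.Length > 1 ? res[1].Length : 0;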
Since your last examples indicate that trailing zeroes are significant, I would rule out any numerical solution and go for the string operations.
No, there is no easier solution, you have to examine the string. If you convert "4500" and "4500.00" to numbers, they both become the value 4500 so you can't tell how many non-value digits there were behind the decimal separator.
As an interesting aside, the Decimal tries to maintain the precision entered by the user. For example,
Console.WriteLine(5.0m);
Console.WriteLine(5.00m);
Console.WriteLine(Decimal.Parse("5.0"));
Console.WriteLine(Decimal.Parse("5.00"));
Has output of:
5.0
5.00
5.0
5.00
If your motivation in tracking the precision of the input is purely for input and output reasons, this may be sufficient to address your problem.
Working with the string is easy enough.
If there is no "." in the string, return 1.
Else return "0.", followed by n-1 "0", followed by one "1", where n is the length of the string after the decimal point.
Here's a possible solution using strings;
static double GetPrecision(string s)
{
    string[] splitNumber = s.Split('.');
    if (splitNumber.Length > 1)
    {
        return 1 / Math.Pow(10, splitNumber[1].Length);
    }
    else
    {
        return 1;
    }
}
There is a related question, Calculate System.Decimal Precision and Scale, which might be of interest if you wish to delve into this some more.

Best way to format single & double values as strings for SimpleDB?

Amazon's SimpleDB stores values as strings, and I need to store numeric values so that they still compare correctly, for example:
"0001" < "0002"
I think bytes, integers and decimals will be fairly straightforward, but I'm a little unsure on the best way to handle singles and doubles, since they can be very small or large and would appreciate any suggestions from those more clever than I!
(I'm using C#)
If you already have a way to represent sign-magnitude numbers (like the integers that you said wouldn't be too hard), then you're already there ;-]
From Comparing Floating Point Numbers
The IEEE float and double formats were
designed so that the numbers are
“lexicographically ordered”, which –
in the words of IEEE architect William
Kahan means “if two floating-point
numbers in the same format are ordered
( say x < y ), then they are ordered
the same way when their bits are
reinterpreted as Sign-Magnitude
integers.”
static public string DoubleToSortableString(double dbl)
{
    Int64 interpretAsLong =
        BitConverter.ToInt64(BitConverter.GetBytes(dbl), 0);
    return LongToSortableString(interpretAsLong);
}
static public double DoubleFromSortableString(string str)
{
    Int64 interpretAsLong =
        LongFromSortableString(str);
    return BitConverter.ToDouble(BitConverter.GetBytes(interpretAsLong), 0);
}
static public string LongToSortableString(long lng)
{
    if (lng < 0)
        return "-" + (~lng).ToString("X16");
    else
        return "0" + lng.ToString("X16");
}
static public long LongFromSortableString(string str)
{
    if (str.StartsWith("-"))
        return ~long.Parse(str.Substring(1, 16), NumberStyles.HexNumber);
    else
        return long.Parse(str.Substring(1, 16), NumberStyles.HexNumber);
}
-0010000000000000 => -1.79769313486232E+308
-3F0795FFFFFFFFFF => -100000
-3F3C77FFFFFFFFFF => -10000
-3F70BFFFFFFFFFFF => -1000
-3FA6FFFFFFFFFFFF => -100
-3FDBFFFFFFFFFFFF => -10
-400FFFFFFFFFFFFF => -1
00000000000000000 => 0
03FF0000000000000 => 1
04024000000000000 => 10
04059000000000000 => 100
0408F400000000000 => 1000
040C3880000000000 => 10000
040F86A0000000000 => 100000
07FEFFFFFFFFFFFFF => 1.79769313486232E+308
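The same encoding can be sketched more compactly with BitConverter.DoubleToInt64Bits and BitConverter.Int64BitsToDouble, which reinterpret the bits without going through a byte array. The class and method names below are hypothetical. The sketch assumes the strings are compared ordinally (character code by character code), which is what makes '-' sort before '0'.

```csharp
using System;
using System.Globalization;

static class SortableDoubleSketch
{
    // Negative doubles have the sign bit set, so their bits read as a negative long;
    // complementing them reverses their order so larger magnitudes sort first.
    public static string Encode(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return bits < 0
            ? "-" + (~bits).ToString("X16", CultureInfo.InvariantCulture)
            : "0" + bits.ToString("X16", CultureInfo.InvariantCulture);
    }

    // Reverses Encode: strip the one-character prefix, parse the hex digits,
    // and undo the complement for negative values.
    public static double Decode(string text)
    {
        long bits = long.Parse(text.Substring(1), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return BitConverter.Int64BitsToDouble(text[0] == '-' ? ~bits : bits);
    }
}
```

Encode(-1.0) produces "-400FFFFFFFFFFFFF" and Encode(1.0) produces "03FF0000000000000", the same strings as in the table above. When checking the ordering locally, string.CompareOrdinal is the appropriate comparison, since culture-sensitive comparisons may treat the hyphen differently.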
One option (if you don't require that they be human-readable) would be to store the exponent first (zero-filled), then the mantissa. Something like "(07:4.5)" for what would normally be written 4.5e7.
*smile* Are you going to be dealing with signed values or positive floats less than 1? If so, you'll need to do something like offsets as well, but on your brackets (e.g. [] for positive, () for negative) as well as the mantissa.
If you want to be able to sort integers in with your singles, etc. You should probably just normalize everything to the largest type (e.g. your doubles) on the way in rather than trying to get too tricky.
Thus:
7 --> [100,17.0]
0.1 --> [099,11.0]
-2 --> (100,08.0)
and so on.

How can I Convert a Big decimal number to Hex in C# (Eg : 588063595292424954445828)

The number is bigger than int and long but can be accommodated in Decimal. However, the normal ToString or Convert methods don't work on Decimal.
I believe this will produce the right results where it returns anything, but may reject valid integers. I dare say that can be worked around with a bit of effort though... (Oh, and it will also fail for negative numbers at the moment.)
static string ConvertToHex(decimal d)
{
    int[] bits = decimal.GetBits(d);
    if (bits[3] != 0) // Sign and exponent
    {
        throw new ArgumentException();
    }
    return string.Format("{0:x8}{1:x8}{2:x8}",
        (uint)bits[2], (uint)bits[1], (uint)bits[0]);
}
Do it manually!
http://www.permadi.com/tutorial/numDecToHex/
I've got to agree with James - do it manually - but don't use base-16. Use base 2^32, and print 8 hex digits at a time.
I guess one option would be to keep taking chunks off it, and converting individual chunks? A bit of mod/division etc, converting individual fragments...
So: what hex value do you expect?
Here's two approaches... one uses the binary structure of decimal; one does it manually. In reality, you might want to have a test: if bits[3] is zero, do it the quick way, otherwise do it manually.
decimal d = 588063595292424954445828M;
int[] bits = decimal.GetBits(d);
if (bits[3] != 0) throw new InvalidOperationException("Only +ve integers supported!");
string s = Convert.ToString(bits[2], 16).PadLeft(8,'0') // high
+ Convert.ToString(bits[1], 16).PadLeft(8, '0') // middle
+ Convert.ToString(bits[0], 16).PadLeft(8, '0'); // low
Console.WriteLine(s);
/* or Jon's much tidier: string.Format("{0:x8}{1:x8}{2:x8}",
(uint)bits[2], (uint)bits[1], (uint)bits[0]); */
const decimal chunk = (decimal)(1 << 16);
StringBuilder sb = new StringBuilder();
while (d > 0)
{
    int fragment = (int) (d % chunk);
    sb.Insert(0, Convert.ToString(fragment, 16).PadLeft(4, '0'));
    d -= fragment;
    d /= chunk;
}
Console.WriteLine(sb);
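A different route from the ones above, offered only as a sketch, is to go through System.Numerics.BigInteger, which can be constructed from a decimal and formatted as hex. The class and method names are hypothetical. BigInteger may prefix a leading zero to keep the value positive, so the sketch trims it, and it rejects negative or fractional input rather than guessing what hex form is wanted.

```csharp
using System;
using System.Numerics;

static class DecimalHex
{
    // Converts a non-negative whole decimal to lowercase hex without leading zeros.
    public static string ToHex(decimal value)
    {
        if (value < 0m || value != decimal.Truncate(value))
            throw new ArgumentException("Only non-negative whole numbers are supported.", nameof(value));

        // "x" formats as hexadecimal; a leading "0" may be added as a sign guard.
        string hex = new BigInteger(value).ToString("x").TrimStart('0');
        return hex.Length == 0 ? "0" : hex;
    }
}
```

For instance, ToHex(255m) returns "ff" and ToHex(4294967296m) returns "100000000". Unlike the fixed-width GetBits() approaches, the result is not padded to 24 digits.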
