Protobuf-net deserialization exception with enum - C#

I have an enum with flags that I decorate with the [ProtoMember] attribute that serializes and deserializes fine on my local box running Win7 x64.
However, my use case involves serializing on a server running Windows Server 2008 R2 Enterprise 64-bit and deserializing on my local box. When I deserialize, I get the exception: "OverflowException was unhandled; Arithmetic operation resulted in an overflow". It seems to be thrown from ProtoBuf.Serializers.CompiledSerializer.ProtoBuf.Serializers.IProtoSerializer.Read(Object value, ProtoReader source).
I tried changing the enum to an int and serializing on server/deserializing locally works. I would like to use the enum instead of an int. What am I doing wrong?
Not sure if this is pertinent information but the executable I run on the server is built on my local box.
The enum is from a referenced external dll. When I duplicate the enum code in my solution, deserializing works. The exception is only thrown when I am using an enum from an external dll (where I suspect the source code isn't known) and the enum value is larger than (it seems) 128. In my case, Status.Zeta and Status.All threw the exception; other enum values deserialized properly. The enum is defined as such:
[Flags]
public enum Status
{
    None = 0,
    Alpha = 1,
    Beta = 8,
    Gamma = 16,
    Delta = 32,
    Epsilon = 64,
    Zeta = 132,
    All = 255,
}
I cannot change the code in the dll. How can I make this work? Do I need a .proto file? I am trying to avoid this if possible.

This only impacts enums that are : byte
D'oh! Spot the brain-dead error:
case ProtoTypeCode.SByte: Emit(OpCodes.Conv_Ovf_U1); break;
case ProtoTypeCode.Byte: Emit(OpCodes.Conv_Ovf_I1); break;
This will be reversed and deployed later today. Thanks.
To explain properly: byte is unsigned (0 to 255), sbyte is signed (-128 to 127); Conv_Ovf_U1 is basically IL for "convert to byte, checking for overflow" (like how the checked keyword in C# works), and Conv_Ovf_I1 is "convert to sbyte, checking for overflow". Hence any value over 127 was triggering the overflow flag, causing an exception. This is fixed in r614, now deployed.
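For illustration only (this isn't protobuf-net code, just the same checked narrowing the generated IL was performing), a minimal sketch that reproduces the failure for any value above 127:
byte zeta = 132;                      // e.g. Status.Zeta on a byte-backed enum
sbyte boom = checked((sbyte)zeta);    // throws OverflowException: 132 > sbyte.MaxValue (127)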

That is indeed a bit strange. There could be differences in the CLR that affect ProtoBuf (for instance, the CLR ships with a number of different GCs). Comparing the Machine.config files from the two machines might expose some differences.
As for solving the problem, you could try marking the enum itself with ProtoContract and each enum member with ProtoEnum. The latter allows you to set a Value property for protobuf-net to use. You can also set the DataFormat (for example to DataFormat.FixedSize) on the member that uses the enum and see if that works better than the default.
You can find some examples here.
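As a rough sketch of that decoration (this assumes you can mirror the enum locally, since the one in the external dll can't be edited, and that it is byte-backed as the overflow suggests; the Value here just mirrors the underlying numbers, its real use being to remap a member to a different wire value):
[Flags]
[ProtoContract]
public enum Status : byte
{
    [ProtoEnum(Value = 0)]   None    = 0,
    [ProtoEnum(Value = 1)]   Alpha   = 1,
    [ProtoEnum(Value = 8)]   Beta    = 8,
    [ProtoEnum(Value = 16)]  Gamma   = 16,
    [ProtoEnum(Value = 32)]  Delta   = 32,
    [ProtoEnum(Value = 64)]  Epsilon = 64,
    [ProtoEnum(Value = 132)] Zeta    = 132,
    [ProtoEnum(Value = 255)] All     = 255,
}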

Related

Error when trying to set a DWORD value in the Windows registry using C#

I'm writing code that sets a value in the Windows registry. When I set that value manually, it works as expected; when I set it programmatically, however, I get an error. The value I want to set is of the DWORD type and is "4294967295". When I set that in code, though, it says that DWORD does not support this value. And yet, I can assign that exact value via the program I am using to manually update the registry.
Here's my code:
RegistryKey key = Registry.CurrentUser.OpenSubKey(@"Software\Lost in Days Studio\NO TIME", true);
key.SetValue("Current", 4294967295, RegistryValueKind.DWord);
key.Close();
As you likely know, a DWORD is stored as a 32 bit binary number. What may not be immediately obvious, though, is that it's unsigned, and thus a UInt32 (uint). Otherwise, you wouldn't be able to store a value of 4294967295, since the maximum value for a signed integer (int) is 2,147,483,647.
There's a catch, however! As @Jimi noted in the comments, SetValue() will attempt to do a Convert.ToInt32(), which will overflow for any value above Int32.MaxValue; that is the error you are receiving. One would expect it to use Convert.ToUInt32() but, as @Jimi also discovered, that is a known bug in the method, which Microsoft is unable to fix due to backward compatibility concerns.
Instead, the SetValue() method converts a signed Int32 (int) into an unsigned 32 bit binary number, with values of 0…2147483647 remaining as is, but values of -2147483648…-1 getting saved to the 2147483648…4294967295 range.
That binary conversion is a bit convoluted if you're thinking about this as a signed integer. But, fortunately, you can use C#’s built-in unchecked() keyword to permit the overflow and effectively treat your uint as a signed int within those ranges:
key.SetValue("Current", unchecked((int)4294967295), RegistryValueKind.DWord);
This is really handy because it allows for you to continue to work with a standard UInt32 (uint) range of 0…4294967295, exactly like you would via e.g. RegEdit, without having to think about how the binary conversion is being handled.
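Reading the value back is the mirror image; a sketch, assuming the key and value names from the question:
object raw = key.GetValue("Current");      // a REG_DWORD comes back boxed as an int
uint current = unchecked((uint)(int)raw);  // reinterpret the bits to recover the uint
Console.WriteLine(current);                // 4294967295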

Is struct field layout consistent with endianness in C#?

When I first learned endianness, I was very confused at how it worked. I finally explained it to myself by the following metaphor:
On a big-endian machine, an int[4] would be arranged like this:
| int[4] |
|int1|int2|int3|int4|
While on little-endian machines, it would be laid out like
| int[4] |
|1tni|2tni|3tni|4tni|
That way the layout of the array would be consistent in memory, while the values themselves would be arranged differently.
Now to the real question: I am writing more optimized versions of BinaryReader and BinaryWriter in my .NET library. One of the problems I have run into is the implementation of Write(decimal). A decimal contains 4 int fields: flags, hi, lo, and mid, in that order. So basically on your typical little-endian machine it would look like this in memory:
| lamiced |
|sgalf|ih|ol|dim|
My question is, how would the CLR arrange the struct on big-endian machines? Would it arrange it so that the basic layout of the decimal would be conserved, like so
| decimal |
|flags|hi|lo|mid|
or would it completely reverse the binary arrangement of the decimal, like
| decimal |
|mid|lo|hi|flags|
?
Don't have a big-endian machine nearby, otherwise I'd test it out myself.
edit: TL;DR does the following code print -1 or 0 on big-endian machines?
struct Pair
{
    public int a;
    public int b;
}

unsafe static void Main()
{
    var p = default(Pair);
    p.a = -1;
    Console.WriteLine(*(int*)&p);
}
It's not entirely clear what your actual question is.
Regarding the relationship between the layout of fields in a data structure and endianness, there is none. Endianness does not affect how fields in a data structure are laid out, only the order of bytes within a field.
I.e. in answer to this:
does the following code print -1 or 0 on big-endian machines?
… the output will be -1.
But you seem to be also or instead asking about the effect of endianness on the in-memory representation of the Decimal type. Which is a somewhat different question.
Regarding the endianness of the Decimal in-memory representation, I'm not aware of any requirement that .NET provide consistent implementations of the Decimal type. As commenter Hans Passant points out, there are multiple ways to view the current implementation; either as the CLR code you referenced, or as the more detailed declaration seen in e.g. wtypes.h or OleDb.h (another place a DECIMAL type appears, which has the same format as elsewhere). But in reality, as far as .NET is concerned, you are not promised anything about the in-memory layout of the type.
I would expect, for simplicity in implementation, the fields representing the 3 32-bit mantissa components may be affected by endianness, individually. (The sign and scale are represented as individual bytes, so endianness would not affect those). That is, while the order of the individual 32 bit fields would remain the same — high, low, mid — the bytes within each field will be represented according to the current platform's endianness.
But if Microsoft for some bizarre reason decided they wanted the .NET implementation to deviate from the native implementation (seems unlikely, but let's assume it for the sake of argument) and always use little-endian for the fields even on big-endian platforms, that would be within their rights.
For that matter, they could even rearrange the fields if they wanted to: their current order appears to me to be a concession to the de facto x86 standard of little-endianness, such that on little-endian architectures the combination of low and mid 32-bit values can be treated as a single 64-bit value without swapping words, so if they decided to deviate from the wtypes.h declaration, they might well decide to just make the mantissa a single 96-bit, little-endian or big-endian value.
Again, I'm not saying these actions are in any way likely. Just that they are theoretically possible and are just easy, obvious examples (a subset of all possible examples) of why writing managed code that assumes such private implementation details is probably not a good idea.
Even if you had access to a big-endian machine that could run .NET libraries (*) and so could test the actual behavior, today's current behavior doesn't offer you any guarantees of future behavior.
(*) (I don't even know of any…pure big-endian CPUs are fairly uncommon these days, and I can't think of a single one off the top of my head that is supported by Microsoft as an actual .NET platform.)
So…
I am skeptical that it is practical to author implementations of BinaryReader and BinaryWriter that are observably more optimized than those found in .NET already. The main reason for using these types is to handle I/O, and that necessarily means interacting with external systems that are orders of magnitude slower than the CPU that is handling the actual conversions to and from byte representations (and even the GC operations to support those conversions). Even if the existing Microsoft code were in some way hypothetically inefficient, in practice I doubt it would matter much.
But if you must implement these yourself, it seems to me that the only safe way to deal with the Decimal type is to use the Decimal.GetBits() method and the Decimal(int[]) constructor. These use clearly documented, endian-independent mechanisms to convert the Decimal type. They are based on int, the in-memory representation of which will of course vary according to endianness, but your code will never need to worry about that, because it will only have to deal with entire int values, not their byte-wise representations.
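For example, a writer/reader pair along these lines stays entirely within the documented API and never touches the in-memory layout of Decimal (the helper names are mine, just to sketch the idea):
static void WriteDecimal(BinaryWriter writer, decimal value)
{
    // Decimal.GetBits returns four ints in a documented order: lo, mid, hi, flags.
    int[] bits = decimal.GetBits(value);
    foreach (int part in bits)
        writer.Write(part);   // BinaryWriter always writes each int little-endian, regardless of platform
}

static decimal ReadDecimal(BinaryReader reader)
{
    var bits = new int[4];
    for (int i = 0; i < 4; i++)
        bits[i] = reader.ReadInt32();
    return new decimal(bits); // the Decimal(int[]) constructor validates and rebuilds the value
}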

How do I call a COM-function with a _variant_t parameter (type "long")?

I want to port a certain function call to C#. The two lines are as follows:
m_pBrowserApp->get_Document(&pVoid);
m_pLayoutAnalyzer->Analyze4(pVoid, _variant_t(5L));
m_pBrowserApp is the ActiveX browser object and pVoid is its document property. I can get that by calling WebBrowserBase.ActiveXInstance.Document. However, I have no idea how to create a _variant_t(5L) in C#. Since the call is not a VT_BYREF, it "should just work" by calling it like this:
ILayoutAnalyzer2 vips = new LayoutAnalyzer2();
vips.Initialize(0);
SHDocVw.WebBrowser_V1 axBrowser = (SHDocVw.WebBrowser_V1)this.webBrowser1.ActiveXInstance;
var doc = axBrowser.Document as mshtml.HTMLDocument;
vips.Analyze4(doc, (Object)5L); // fails with HRESULT: 0x80020005 (DISP_E_TYPEMISMATCH)
But it doesn't. It fails with a DISP_E_TYPEMISMATCH error.
I'm pretty sure the Document property is valid. So the question remains: how do I properly pass a long wrapped in a variant via interop?
Variants go back to the mid 1990s, a time when longs were considered long for having 32 bits. That was just a few years after the first 32-bit operating systems became available; an integer was still 16 bits in VB6, for example. Not so in C# and .NET in general, a 32-bit programming environment by design that never had to deal with 16-bit back-compat. So use a C# int, not a long.
Drop the L from the literal.
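In other words, the call from the question becomes (a sketch; an int literal is boxed as a VT_I4 variant, which is what the native _variant_t(5L) produced):
vips.Analyze4(doc, 5);   // int, not long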

How to put a DWORD in the registry with the highest bit set

I've run into a strange problem: when setting values of the DWORD type in the Windows Registry from my C# application, I keep getting errors when the highest bit is set. Apparently there seems to be some kind of conversion problem between signed and unsigned integers.
Example: when I do something like this
regKey.SetValue("Value", 0x70000000u, RegistryValueKind.DWord);
it works fine. But when I add the highest bit (which, since I'm specifically dealing with unsigned integers, should be just another value bit), like this
regKey.SetValue("Value", 0xf0000000u, RegistryValueKind.DWord);
I get an exception ("The type of the value object did not match the specified RegistryValueKind or the object could not be properly converted").
But shouldn't it work? DWORD is an unsigned 32-bit integer data type, and so is the 0xf0000000u literal (C# automatically assigns it the UInt32 datatype), so they should be a perfect match (and setting the value manually in the registry editor to "0xf0000000" works fine, too). Is this a bug in .NET or am I doing something wrong?
My guess is that you need to use a signed int instead. So just convert it like this:
regKey.SetValue("Value", unchecked((int) 0xf0000000u),
RegistryValueKind.DWord);
I agree it's a bit odd, when you consider that DWORD is normally unsigned (IIRC) but it's at least worth a try...
I know this is crazy late; however, if you don't want to use unchecked, or are using VB.NET where it's not available, then the following would work as well.
// Round-trip the unsigned value through its raw bytes to reinterpret it as a signed int.
byte[] byteArray = BitConverter.GetBytes(0xf0000000u);
int USignedIntTooBigForInt = BitConverter.ToInt32(byteArray, 0);
regKey.SetValue("Value", USignedIntTooBigForInt, RegistryValueKind.DWord);

Serializing a List of objects using Protobuf-net

I've been looking to do some binary serialization to file and protobuf-net seems like a well-performing alternative. I'm a bit stuck in getting started though. Since I want to decouple the definition of the classes from the actual serialization I'm not using attributes but opting to go with .proto files, I've got the structure for the object down (I think)
message Post {
    required uint64 id = 1;
    required int32 userid = 2;
    required string status = 3;
    required datetime created = 4;
    optional string source = 5;
}
(is datetime valid or should I use ticks as int64?)
but I'm stuck on how to use protogen and then serialize an IEnumerable of Post to a file and read it back. Any help would be appreciated.
Another related question: are there any best practices for detecting corrupted binary files, for example if the computer is shut down while serializing?
Re DateTime... this isn't a standard proto type; I have added a BCL.DateTime (or similar) to my own library, which is intended to match the internal serialization that protobuf-net uses for DateTime, but I'm fairly certain I haven't (yet) updated the code-generator to detect this as a special-case. It would be fairly easy to add if you want me to try... If you want maximum portability, a "ticks" style approach might be pragmatic. Let me know...
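If you do take the ticks route, the field is just an int64 in the .proto and the C# conversion is trivial; a sketch:
long ticks = someDateTime.Ticks;          // store this int64 in the "created" field
DateTime restored = new DateTime(ticks);  // rebuild it after deserializing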
Re serializing to a file - it should be about the same as the Getting Started example, but note that protobuf-net wants to work with data it can reconstruct; just IEnumerable<T> might cause problems - IList<T> should be fine, though (it'll default to List<T> as a concrete type when reconstructing).
Re corruption - perhaps use SerializeWithLengthPrefix - it can then detect issues even at a message boundary (where they are otherwise undetectable as an EOF). This (as the name suggests) writes the length first, so it knows whether it has enough data (via DeserializeWithLengthPrefix). Alternatively, reserve the first [n] bytes in your file for a hash / checksum. Write this blank spacer, then the data, calculate the hash / checksum and overwrite the start. Verify during deserialization. Much more work.
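A rough sketch of the length-prefixed file round-trip (the Post type and its member names here are hand-written assumptions standing in for whatever protogen emits from the message above):
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class Post
{
    [ProtoMember(1)] public ulong Id { get; set; }
    [ProtoMember(2)] public int UserId { get; set; }
    [ProtoMember(3)] public string Status { get; set; }
    [ProtoMember(4)] public long CreatedTicks { get; set; }   // "ticks" style DateTime
    [ProtoMember(5)] public string Source { get; set; }
}

static void Save(string path, IEnumerable<Post> posts)
{
    using (var file = File.Create(path))
    {
        // Each item gets its own length prefix, so a truncated file fails at a message boundary.
        foreach (var post in posts)
            Serializer.SerializeWithLengthPrefix(file, post, PrefixStyle.Base128, 1);
    }
}

static List<Post> Load(string path)
{
    using (var file = File.OpenRead(path))
        return new List<Post>(Serializer.DeserializeItems<Post>(file, PrefixStyle.Base128, 1));
}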
