Protocol Buffers c# (protobuf-net) Message::ByteSize

Protocol Buffers c# (protobuf-net) Message::ByteSize - c#

I am looking for the protobuf-net equivalent to the C++ API Message::ByteSize to find out the serialized message length in bytes.

I haven't played with the C++ API, so you'll have to give me a bit more context / information. What does this method do? Perhaps a sample usage?
If you are consuming data from a stream, there are "WithLengthPrefix" versions to automate limiting to discreet messages, or I believe the method to just read the next length from the stream is on the public API.
If you want to get a length in place of serializing, then currently I suspect the easiest option might be to serialize to a dummy stream and track the length. Oddly enough, an early version of protobuf-net did have "get the length without doing the work" methods, but after discussion on the protobuf-net I removed these. The data serialized is still tracked, obviously. However, because the API is different than the binary data length for objects is not available "for free".
If you clarify what the use-case is, I'm sure we can make it easily available (if it isn't already).
Re the comment; that is what I suspected. Because protobuf-net defers the binary translation to the last moment (because it is dealing with regular .NET types, not some self-generated code) there is no automatic way of getting this value without doing the work. I could add a mechanism to let you get this value by writing to Stream.Null? but if you need the data anyway you might benefit from just writing to MemoryStream and checking the .Length in advance of copying the data.

Related

Silverlight Binary Serialization over the wire

While nearly completing a new release, we've ignored the large size of the XML data that our WCF service returns to our silverlight client. Now we're investigating how to shrink the data, so that the results aren't in the 10-100mb range.
Its seems clear that binary serialization is the solution, and it seems easy enough to serialize the data into binary with, for instance, SharpSerializer, but through all of the SO posts about binary serialization and other tutorials I've come across, no one addresses how to send the serialized data across the wire to the Client. I expect I'm missing some obvious but critical piece to the WCF service puzzle.
Hopefully someone can lend me some help. Let me know if I should include more information.

First, try the built-in binary encoding (<binaryMessageEncoding> in config, see http://www.mostlydevelopers.com/blog/post/2009/10/14/Silverlight-3-WCF-Binary-Message-Encoding.aspx and http://www.silverlight.net/learn/data-networking/network-services-(soap,-rest-and-more)/how-do-i-use-binary-encoding-for-wcf-with-silverlight-3 ).
Your data will probably shrink, but please note that the built-in binary encoding was designed to be as fast as possible, not as small as possible.
If that's not enough and you want to use a 3rd=party component to do the serialization to binary data, you can indeed return this data as a byte[] (but you will also need to use <binaryMessageEncoding> above to prevent WCF from base64-encoding the data to make it valid XML). You can also use Stream instead of byte[], this won't give you true streaming behavior on the Silverlight client side but can give you true streaming on the server side.

How do I get my object's metadata information in Binary format in C#?

I am recieving binary stream from an application I am running in Python.
From the binary stream, I want to create a C# object that is inside the stream in byte array.
How do I deserialise the object and retrieve the object from the binary stream?
We can ignore that it's a python application. I am more interested in how binary streaming works.

You seem to think that all languages automatically use the same serialization scheme.
This is not so.
It is not even theoretically possible, because different programming languages have different notions of what it means to be an object.
If you are specifically interested in how to read a Python serialized stream in C#, then ask that. Otherwise, this question is unanswerable because it is based on a false premise.
FOLLOW UP - Out of curiosity, I did some searching for a Python pickle reader in C#. Nothing in the first 3 pages of search results ... though there was a reference to a pickle reader in C++.

Just to add you a little general info:
In C#/.Net there's a general approach to serialize objects to NOT a binary form, because a binary form needs a lot of protocol-like headers to - note - include the metadata, and this causes the receiver to have to know the .Net/CLR inner structure very well.
Instead, today, the objects are usually serialized to XML (when type information is crucial) or JSON formats (when only data matters), so that any receiver may read them quite easily, and more often - any 3rd party may easily generate new object-like data that our application may "just deserialize", regardless of who generated it and on what platform.
However, binary serialization is still used. XML/JSON data, even if compressed, is still usually larger than the binary image. However, the binary serialization is strictly used when we want the data to not be published to the outside world, or if we somehow magically know that it will be only processed on .Net with use of our assemblies.

C# object
C# does not have objects; it's a .Net object.
Secondly we absolutely CANNOT ignore that it's a Python application, because that implies that it's likely it's not running on .Net and therefore the .Net binary format is not native to your Python runtime. That's not to say that it's not possible for the .Net serialization to be available to you in this case, because if you're running IronPython - the .Net python implementation - then you can simply use the Binary serialization APIs from within that and get the .Net object that was serialized.
If, however, it's Python running on a different platform, then you can decode the information in the binary stream, for that you need to know the format, and for that go straight to the horse's mouth and read through the Binary Format Data Structure spec from MSDN.
This will, of course, require (quite a lot) more work!
If the project you're working on allows you to change the way that the original object is serialized, then I strongly suggest changing over to XML serialization or something similar - that is designed to be portable.

Comparison of serializing methods [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Fastest serializer and deserializer with lowest memory footprint in C#?
I'm using BinaryFormatter class to serialize an structure or a class. (after serialization, I'm going to encrypt the serialized file before saving. (And of course decrypt it before deserialization))
But I heard that some other serialization classes are present in .Net Framework. Like XmlSerializer, JavaScriptSerializer, DataContractSerializer and protobuf-net.
I want to know, which one is best for me?
Less RAM space needed for serialize/deserialize is the most important thing for me. Also speed is important.

If your aim is to reduce memory demands, then don't serialize then encrypt: instead - serialize directly to an encrypting Stream. The Stream API is designed to be chained (decorator pattern) to perform multiple transformations without excessive buffering. Likewise: deserialize from a decrypting stream; don't decrypt then deserialize. Done this way, data is encrypted/decrypted on-the-fly as needed; in addition to reducing memory, it is also good for security - since this also means the entire data never exists in decrypted form as a single buffer. See CryptoStream on MSDN for a full example.
Some additional notes; if you do happen to use protobuf-net, there are ways of reducing any in-memory buffering by using "grouped" encoding; you see: the default for sub-messages (including lists) is "length prefixed" - and the way it usually does this is by buffering the data in memory to calculate the length. However, protobuf also supports a format that uses a start/end marker which never requires knowing the length, so never requires buffering - and so the entire sequence can be written in a single pass direct to output (well, it does still use a buffer internally to improve IO, but it pools the buffer here, for maximum re-use). This is as simple as, for sub-objects:
[ProtoMember(11, DatFormat = DataFormat.Grouped)]
public Customer Customer {get;set;} // a sub-object
(where there is no significance in the 11)

See http://code.google.com/p/protobuf-net/wiki/Performance for a comparison of performance.

Is serialization a must in order to transfer data across the wire?

Below is something I read and was wondering if the statement is true.
Serialization is the process of
converting a data structure or object
into a sequence of bits so that it can
be stored in a file or memory buffer,
or transmitted across a network
connection link to be "resurrected"
later in the same or another computer
environment.[1] When the resulting
series of bits is reread according to
the serialization format, it can be
used to create a semantically
identical clone of the original
object. For many complex objects, such
as those that make extensive use of
references, this process is not
straightforward.

Serialization is just a fancy way of describing what you do when you want a certain data structure, class, etc to be transmitted.
For example, say I have a structure:
struct Color
{
int R, G, B;
};
When you transmit this over a network you don't say send Color. You create a line of bits and send it. I could create an unsigned char* and concatenate R, G, and B and then send these. I just did serialization

Serialization of some kind is required, but this can take many forms. It can be something like dotNET serialization, that is handled by the language, or it can be a custom built format. Maybe a series of bytes where each byte represents some "magic value" that only you and your application understand.
For example, in dotNET I can can create a class with a single string property, mark it as serializable and the dotNET framework takes care of most everything else.
I can also build my own custom format where the first 4 bytes represent the length of the data being sent and all subsequent bytes are characters in a string. But then of course you need to worry about byte ordering, unicode vs ansi encoding, etc etc.
Typically it is easier to make use of whatever framework your language/OS/dev framework uses, but it is not required.

Yes, serialization is the only way to transmit data over the wire. Consider what the purpose of serialization is. You define the way that the class is stored. In memory tho, you have no way to know exactly where each portion of the class is. Especially if you have, for instance, a list, if it's been allocated early but then reallocated, it's likely to be fragmented all over the place, so it's not one contiguous block of memory. How do you send that fragmented class over the line?
For that matter, if you send a List<ComplexType> over the wire, how does it know where each ComplexType begins and ends.

The real problem here is not getting over the wire, the problem is ending up with the same semantic object on the other side of the wire. For properly transporting data between dissimilar systems -- whether via TCP/IP, floppy, or punch card -- the data must be encoded (serialized) into a platform independent representation.
Because of alignment and type-size issues, if you attempted to do a straight binary transfer of your object it would cause Undefined Behavior (to borrow the definition from the C/C++ standards).
For example the size and alignment of the long datatype can differ between architectures, platforms, languages, and even different builds of the same compiler.

Is serialization a must in order to transfer data across the wire?
Literally no.
It is conceivable that you can move data from one address space to another without serializing it. For example, a hypothetical system using distributed virtual memory could move data / objects from one machine to another by sending pages ... without any specific serialization step.
And within a machine, the objects could be transferred by switch pages from one virtual address space to another.
But in practice, the answer is yes. I'm not aware of any mainstream technology that works that way.

For anything more complex than a primitive or a homogeneous run of primitives, yes.

Binary serialization is not the only option. You can also serialize an object as an XML file, for example. Or as a JSON.

I think you're asking the wrong question. Serialization is a concept in computer programming and there are certain requirements which must be satisfied for something to be considered a serialization mechanism.
Any means of preparing data such that it can be transmitted or stored in such a way that another program (including but not limited to another instance of the same program on another system or at another time) can read the data and re-instantiate whatever objects the data represents.
Note I slipped the term "objects" in there. If I write a program that stores a bunch of text in a file; and I later use some other program, or some instance of that first program to read that data ... I haven't really used a "serialization" mechanism. If I write it in such a way that the text is also stored with some state about how it was being manipulated ... that might entail serialization.
The term is used mostly to convey the concept that active combinations of behavior and state are being rendered into a form which can be read by another program/instance and instantiated. Most serialization mechanism are bound to a particular programming language, or virtual machine system (in the sense of a Java VM, a C# VM etc; not in the sense of "VMware" virtual machines). JSON (and YAML) are a notable exception to this. They represents data for which there are reasonably close object classes with reasonably similar semantics such that they can be instantiated in multiple different programming languages in a meaningful way.
It's not that all data transmission or storage entails "serialization" ... is that certain ways of storing and transmitting data can be used for serialization. At very list it must be possible to disambiguated among the types of data that the programming language supports. If it reads: 1 is has to know whether that's text or an integer or a real (equivalent to 1.0) or a bit.

Strictly speaking it isn't the only option; you could put an argument that "remoting" meets the meaning inthe text; here a fake object is created at the receiver that contains no state. All calls (methods, properties etc) are intercepted and only the call and result are transferred. This avoids the need to transfer the object itself, but can get very expensive if overly "chatty" usage is involved (I.e. Lots of calls)as each has the latency of the speed of light (which adds up).
However, "remoting" is now rather out of fashion. Most often, yes: the object will need to be serialised and deserialized in some way (there are lots of options here). The paragraph is then pretty-much correct.

Having a messages as objects and serializing into bytes is a better way of understanding and managing what is transmitted over wire. In the old days protocols and data was much simpler, often, programmers just put bytes into output stream. Common understanding was shared by having well-known and simple specifications.

I would say serialization is needed to store the objects in file for persistence, but dynamically allocated pointers in objects need to be build again when we de-serialize, But the serialization for transfer depends on the physical protocol and the mechanism used, for example if i use UART to transfer data then its serialized bit by bit but if i use parallel port then 8 bits together gets transferred , which is not serialized

Reading custom binary data formats in C# .NET

I'm trying to write a simple reader for AutoCAD's DWG files in .NET. I don't actually need to access all data in the file so the complexity that would otherwise be involved in writing a reader/writer for the whole file format is not an issue.
I've managed to read in the basics, such as the version, all the header data, the section locator records, but am having problems with reading the actual sections.
The problem seems to stem from the fact that the format uses a custom method of storing some data types. I'm going by the specs here:
http://www.opendesign.com/files/guestdownloads/OpenDesign_Specification_for_.dwg_files.pdf
Specifically, the types that depend on reading in of individual bits are the types I'm struggling to read. A large part of the problem seems to be that C#'s BinaryReader only lets you read in whole bytes at a time, when in fact I believe I need the ability to read in individual bits and not simply 8 bits or a multiple of at a time.
It could be that I'm misunderstanding the spec and how to interpret it, but if anyone could clarify how I might go about reading in individual bits from a stream, or even how to read in some of the variables types in the above spec that require more complex manipulation of bits than simply reading in full bytes then that'd be excellent.
I do realise there are commercial libraries out there for this, but the price is simply too high on all of them to be justifiable for the task at hand.
Any help much appreciated.

You can always use BitArray class to do bit wise manipulation. So you read bytes from file and load them into BitArray and then access individual bits.

For the price of any of those libraries you definitely cannot develop something stable yourself. How much time did you spend so far?

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.