Proto2 vs. Proto3 in C#

I have to send messages to another team using the proto2 version of Google Protocol Buffers. They are using Java and C++ on Linux. I'm using C# on Windows.
Jon Skeet's protobuf-csharp-port (https://github.com/jskeet/protobuf-csharp-port) supports proto2. If I understand correctly, Google has taken this code and folded an updated version of it into the main protobuf project (https://github.com/google/protobuf/tree/master/csharp). But it no longer supports proto2 for C#, only proto3.
I'm not sure which project I should use. It seems like the new one will be better supported (performance, support for proto3 if the other team ever upgrades). But I would have to convert the .proto file that I was given from proto2 to proto3 and risk any issues that come with that.
I've read that for the most part, the messages for proto2 and proto3 are compatible. I have no experience with Protocol Buffers, but the .proto file I'm working with looks pretty vanilla, no default values or oneof or nested anything. So it seems like I could just delete their "required" and "optional" keywords and use the new library, treating this as a proto3 file.
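For illustration (this is a made-up message, not the actual .proto I was given), the kind of conversion I have in mind would look roughly like this:

// proto2 original
syntax = "proto2";
message Sample {
  required int32 id = 1;
  optional string name = 2;
}

// proto3 equivalent: drop "required"/"optional"; every field is optional
// and defaults to zero/empty on the wire
syntax = "proto3";
message Sample {
  int32 id = 1;
  string name = 2;
}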
In your opinion, is it worth the hassle to use the newer library? Is there a list of proto features that would make the proto2 and proto3 messages incompatible?

If the other team has any required fields and you send messages to them without specifying those fields (or even explicitly specifying the default value, for primitives) then the other end will fail to parse the messages - they won't validate.
There are various differences between proto2 and proto3 - some are listed on the releases page:
The following are the main new features in language version 3:
Removal of field presence logic for primitive value fields, removal of required fields, and removal of default values. This makes proto3 significantly easier to implement with open struct representations, as in languages like Android Java, Objective C, or Go.
Removal of unknown fields.
Removal of extensions, which are instead replaced by a new standard type called Any.
Fix semantics for unknown enum values.
Addition of maps.
Addition of a small set of standard types for representation of time, dynamic data, etc.
A well-defined encoding in JSON as an alternative to binary proto encoding.
The removal of unknown fields could be a significant issue for you - if the other team expects to be able to send you a message with some fields your code is unaware of, and for you to be able to return a message to them that preserves those fields, proto3 could pose problems for you.
If you can use proto3, I'd suggest using the proto3 version, partly because it will have proper ongoing support whereas the proto2 port is essentially in maintenance mode. There are significant differences between the two, primarily in terms of mutability: the generated message classes in the proto3 codebase are mutable, which is great for immediate usability but can pose challenges in other areas.
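To illustrate the API difference, here is a sketch with a hypothetical generated message type called Person (not from your .proto):

// protobuf-csharp-port (proto2): generated messages are immutable,
// so you build them via a builder and "modify" by rebuilding.
Person oldStyle = Person.CreateBuilder()
    .SetName("Jane")
    .SetId(42)
    .Build();

// Google.Protobuf (proto3): generated messages are mutable,
// so object initializers and property setters work directly.
Person newStyle = new Person { Name = "Jane", Id = 42 };
newStyle.Name = "Janet";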

Related

How to persuade GetProto to spit out proto3 format

Using the excellent ProtobufNet by Marc Gravell, we are able to maintain our types in C# and then export them to .proto files for conversion into all the languages needed by our clients.
However, we would like to use the proto3 format, which is much simpler and less error-prone than proto2, which currently seems to be the standard.
After looking around the net we found this encouraging post from the author that seems to indicate that there is proto3 support: https://github.com/mgravell/protobuf-net/issues/187
However we have not found any documentation for ProtobufNet, and so it is a bit difficult to know how to pull this off. So the question is, how can we have GetProto generate proto3 compatible output for our decorated C# types?
In the current versions there is an optional parameter (technically an overload) that defines the schema version. I think it might even default to proto3.
So... just update? Or worst case: update and specify the optional parameter to GetProto.
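A minimal sketch of what that looks like, assuming a recent protobuf-net version with the ProtoSyntax overload (the Order type here is just a placeholder contract):

using System;
using ProtoBuf;
using ProtoBuf.Meta;

// Placeholder contract type, purely for illustration.
[ProtoContract]
public class Order
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Customer { get; set; }
}

public static class SchemaExport
{
    public static void Main()
    {
        // Ask GetProto explicitly for proto3 output.
        string schema = Serializer.GetProto<Order>(ProtoSyntax.Proto3);
        Console.WriteLine(schema);
    }
}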

How does .NET's Primary Interop Assembly Embedding work?

I am researching the .NET Common Language Infrastructure, and before I get into the nitty-gritty of the compiler I'm going to write, I want to be sure that certain features are available. In order to do that, I must understand how they work.
One feature I'm unsure of is .NET Primary Interop Assembly embedding. I'm not quite sure how .NET goes about embedding only the types you use, versus the types that are exposed by the types you use. From the bit of research I've done into this, I've noticed that it emits a bare-bones interface that utilizes vtable gap methods, where the method name format is VtblGap{0}_{1}, where {0} is the index of the gap and {1} is the member size of the gap. These methods are marked rtspecialname and specialname. Whether this is accurate or not is the question.
Assuming the above is true, how would I go about obtaining the necessary information to embed similar metadata into the resulting application?
From what I can tell, you can order the MemberInfo objects obtained via their metadata tokens to recover the original ordering, and the dispid information is obtained via the attributes from the interop assembly. The area I'm most confused about is the interfaces that are imported yet seem to have no direct correlation with the other embedded types: sequentially indexed interfaces that appear to be there for versioning reasons. Is their inclusion based on their indexing, or is there some other logic used? An example is Microsoft.Office.Interop.Word: when you add a document to an Application and do something with it, it imports the document, its events, and so on.
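To make that concrete, here is roughly how I've been inspecting an assembly built with "Embed Interop Types" (the assembly name is just a placeholder, and filtering on IsImport is my own assumption about how to find the embedded types):

using System;
using System.Linq;
using System.Reflection;

class EmbeddedTypeDump
{
    static void Main()
    {
        var asm = Assembly.LoadFrom("MyAddIn.dll"); // placeholder path
        // Embedded interop types carry ComImport, so Type.IsImport is true for them.
        foreach (var type in asm.GetTypes().Where(t => t.IsImport))
        {
            Console.WriteLine(type.FullName);
            // Ordering members by metadata token preserves the original vtable order,
            // including any VtblGap{index}_{count} placeholder methods.
            var members = type.GetMembers(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
                              .OrderBy(m => m.MetadataToken);
            foreach (var member in members)
            {
                Console.WriteLine($"  0x{member.MetadataToken:X8}  {member.Name}");
            }
        }
    }
}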
Here's hoping someone in-the-know can clue me in on what else might be involved in embedding these types.

Which language idioms/paradigms/features make it hard to add support for "type providers"?

F# 3.0 has added type providers.
I wonder if it is possible to add this language feature to other languages running on the CLR, like C#, or whether this feature only works well in a more functional/less OO programming style.
As Tomas says, it is theoretically straightforward to add this kind of feature to any statically-typed language (though still a lot of grunt-work).
I am not a meta-programming expert, but SK-logic asks why not a general compile-time meta-programming system instead, and I shall try to answer. I don't think you can easily achieve what you can do with F# type providers using meta-programming, because F# type providers can be lazy and dynamically interactive at design time. Let me give an example that Don has demoed in one of his earlier videos: a Freebase type provider. Freebase is kind of like a schematized, programmable Wikipedia; it has data on everything. So you can end up writing code along the lines of
for e in Freebase.Science.``Chemical Elements`` do
    printfn "%d: %s - %s" e.``Atomic number`` e.Name e.Discoverer.Name
or whatnot (I don't have the exact code offhand), but just as easily write code that gets information about baseball statistics, or when famous actors have been in drug rehab facilities, or a zillion other types of information available through Freebase.
From an implementation point of view, it is infeasible to generate a schema for all of Freebase and bring it into .NET a priori; you can't just do one compile-time step at the beginning to set all this up. You can do this for small data sources, and in fact many other type providers use this strategy, e.g. a SQL type provider gets pointed at a database and generates .NET types for all the types in that database. But this strategy does not work for large cloud data stores like Freebase, because there are so many interrelated types (ChemicalElement, with AtomicNumber, Discoverer, Name and many other fields, is just one of literally millions of such types) that you would need more memory than is available to a 32-bit .NET process just to represent the entire type schema.
So the F# type-provider strategy is an API architecture that allows type providers to supply information on demand, running at design time within the IDE. Until you type e.g. Freebase.Science., the type provider does not need to know about the entities under the science categories, but once you do press . after Science, the type provider can go and query the APIs to learn one more level of the overall schema, to know what categories exist under Science, one of which is ChemicalElements. And then as you try to "dot into" one of those, it will discover that elements have atomic numbers and what-not. So the type provider lazily fetches just enough of the overall schema to deal with the exact code the user happens to be typing into the editor at that moment in time. As a result, the user still has the freedom to explore any part of the universe of information, but any one source code file or interactive session will only explore a tiny fraction of what is available. When it comes time to compile/codegen, the compiler need only generate enough code to accommodate exactly the bits that the user has actually used in his code, rather than the potentially huge runtime bits to afford the possibility of talking to the whole data store.
(Maybe you can do that with some of today's meta-programming facilities now, I don't know, but the ones I learned about in school a long while back could not have easily handled this.)
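(A rough conceptual sketch of that on-demand shape, in C#; this is not the real F# ITypeProvider API, just an illustration of the "resolve one level at a time" idea:)

using System.Collections.Generic;

// Conceptual sketch only: the editor asks for one level of the remote schema
// at a time, so nothing is fetched until the user actually "dots into" a path.
public interface ISchemaSource
{
    // Called when the user presses '.' after e.g. "Freebase.Science":
    // return only the immediate children of that path, never the whole schema.
    IReadOnlyList<string> GetMembers(string path);
}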
As Brian and Tomas point out, there's nothing particularly "functional" about this feature. It's just a particularly slick way to provide metadata to the compiler.
The C# design team has been kicking around ideas like this for a long time. There was a proposal a few years before I joined the C# team for a feature that was going to be called "type blueprints" (or something like that) whereby a combination of XML documents, XML schema and custom code that proffered up type metadata could be used by the C# compiler. I don't recall the details and it never came to fruition, obviously. (Though it did influence the design and the implementation of the Visual Studio Tools for Office document format, which I was working on at the time.)
In any event, we have no plans on the immediate horizon for adding such a feature to C#, but we are watching with great interest to see if it does a good job of solving customer problems in F#.
(As always, Eric's musings about possible future features of unannounced and entirely hypothetical products are for entertainment purposes only.)
I don't see any technical reason why something like type providers couldn't be added to C# or similar languages. The only family of languages that makes it difficult to add type providers (in the way F# does) is dynamically typed languages.
F# type providers rely on the fact that the type information generated by the provider propagates nicely through the program, and the editor can use it to show useful IntelliSense. In dynamically typed languages, this would require more elaborate IDE support (and "type providers" for dynamic languages reduce to just IDE or IntelliSense tooling).
Why are they implemented directly as a feature of F#? I think a meta-programming system would have to be really complex to support this (note that the types are not actually generated). The other things that could be done with such a system wouldn't contribute much to the F# language (they would only make it more complex, which is a bad thing). However, you could get something similar if you had some sort of compiler extensibility.
In fact, I think this is how the C# team will add something like type providers in the future (they talked about compiler extensibility for some time now).

Why is there a convention of declaring default namespaces/libraries in programming languages?

Why doesn't any programming language load the default libraries, like stdio.h, iostream.h, or using System, automatically so that their declaration is avoided?
As these namespaces/libraries are required in any program, why do compilers expect them to be declared by the user?
Do any programs exist that don't use these namespaces/headers? Even if so, what's wrong with loading a harmless set of default libraries?
I don't mean that I'm too lazy to write a line of code, but it makes little sense (to me) for a compiler to demand declarations of these so-called defaults and produce a compilation error when they're missing.
It's because there are programs which are written without the standard libraries. For example, there are plenty of C programs running on embedded systems that don't provide stdio.h, since it doesn't make any sense on those platforms (in C, such environments are referred to as "freestanding", as opposed to the more usual "hosted").
The “default” libraries are not “required in any program”, and indeed there are many cases where they are not even available (operating system kernel/drivers, microcontrollers, etc). And more in the mainstream, many high-level graphical programs use system-specific GUI/graphics libraries instead of standard I/O.
For stdio.h/iostream(.h): the quick answer is that in the biggest part of your software they are not needed (definitely not both). Headless devices/servers should have a logging module instead, and GUIs don't always have a console to interface with.
Many languages (especially scripting languages, and languages that carry a standard runtime as part of the language spec) do exactly this.
The trade-off is convenience versus software-engineering goodness. The problem with opening namespaces by default is you end up with a lot of names being available immediately at the top level, which can cause name clashes and confusion, pollute Intellisense/autocompletion lists, etc.
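For instance (a small C# illustration, assuming both namespaces were opened by default): two standard namespaces both define a type named Timer, so the unqualified name is immediately ambiguous:

using System.Threading;
using System.Timers;

class Demo
{
    static void Main()
    {
        // error CS0104: 'Timer' is an ambiguous reference between
        // 'System.Threading.Timer' and 'System.Timers.Timer'
        // Timer t = new Timer(1000);

        // You have to qualify the name explicitly to resolve the clash:
        var t = new System.Timers.Timer(1000);
        t.Dispose();
    }
}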
To follow up on caf's answer.
You need to tell the compiler about these headers/libraries so that you do not have to include anything you do not want, because they are not needed in every program. Any programmer is able to write a library in C or C++ that does not depend on any runtime libraries. This ability makes it possible to write software that is as lean as possible, and to save memory/disk space/compile time/link time (pick what you need most). In low-level languages you should only pay for what you need, nothing more.
There is also the name-collision problem. As language standards develop, they provide more and more features, and therefore more and more names, and the probability of a clash between a system name and a name defined in the user's program rises. To avoid this, new features are defined in modules that are not included unless the program uses them. Since system libraries use many common words as their symbols (like open or restricted), the problem is serious.
But explicit module inclusion is not the only method for avoiding collisions. Others include: putting system names in a "standard" namespace (e.g. C++'s namespace std), reserving names and name patterns (e.g. C's double underscores), and allowing redefinition (e.g. Forth).
Because libraries are components external to the language. If one day a library (or part of it) changes its headers or namespaces, the language itself doesn't change with it. The compiler checks only the syntax and rules of the programming language.

protobuf-net communicating with C++

I'm looking at protobuf-net for implementing various messaging formats, and I particularly like the contract-based approach as I don't have to mess with the proto compiler. One thing I couldn't quite find information on is: does this make it difficult to work cross-platform? There are a few C++ apps that would need to be able to parse PB data, and while I understand that protobuf-net serializes to the standard protobuf wire format, if I use the contract approach and not a .proto file, how does the C++ side parse the data?
Can (should?) I write a separate .proto file for the (very few) cases where C++ needs to understand the data? And if so, how exactly do I know that the C++ class generated from the .proto file is going to match the data from the no-proto-file C# side?
Yes, in theory at least they should match at the binary level, but you might want to limit yourself to types that map simply to ".proto" - so avoid things like DateTime, inheritance ([ProtoInclude]), etc. This also has the advantage that you should be able to use:
string proto = Serializer.GetProto<YourType>();
to get the .proto; it (GetProto) isn't 100%, but it works for basic types. But ultimately, the answer is "testing and tweaking"; perhaps design for interop from the outset - i.e. test this early.
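A minimal sketch of that workflow (SensorReading is just a made-up contract type; the field numbers are what must line up with the C++ side's .proto):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class SensorReading
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

static class Program
{
    static void Main()
    {
        // protobuf-net writes standard protobuf wire format, so a C++ class
        // generated from an equivalent .proto can parse these bytes.
        using var ms = new MemoryStream();
        Serializer.Serialize(ms, new SensorReading { Id = 1, Value = 3.14 });
        File.WriteAllBytes("reading.bin", ms.ToArray());

        // Emit a .proto for the C++ team to run through protoc.
        File.WriteAllText("SensorReading.proto", Serializer.GetProto<SensorReading>());
    }
}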
