Generic serializer with ProtoBuf.net - c#

I'm attempting to write a generic serializer using protobuf.net v2. However I'm running into some issues which make me wonder if perhaps what I'm doing is impossible. The objects to be serialized are of an indeterminate type to which I don't have access so I'm attempting to walk the object and add its properties to the type model.
var model = TypeModel.Create();
List<string> propertiesToSerialize = new List<string>();
foreach (var property in typeToSerialize.GetProperties())
{
propertiesToSerialize.Add(property.Name);
}
model.AutoAddMissingTypes = true;
model.Add(typeToSerialize, true).Add(propertiesToSerialize.ToArray());
For simple objects which contain only primitives this seems to work just fine. However when working with an object which contains, say, a Dictionary<string,object> I encounter an error telling me that no serializer is registered for Object.
I did look at serializing a Dictionary<string,object> in ProtoBuf-net fails but it seems the suggested solution requires some knowledge and access to the object being serialized.
Any suggestions on how I might proceed?

protobuf-net does not set out to be able to serialize every scenario (especially those dominated by object), in exactly the same way that XmlSerializer and DataContractSerializer have scenarios which they can't model. In particular, the total lack of metadata in the protobuf format (part of why it is very efficient) means that it is only intended to be consumed by code that knows the structure of the data in advance - which is not possible if too much is object.
That said, there is some support via DynamicType=true, but that would not currently be enabled for the dictionary scenario you mention.
In most cases, though, it isn't really the case that the data can be anything; more typically there are a finite number of expected data types. When that is the case, the object problem can be addressed in a cleaner way using a slightly different model (specifically, a non-generic base-type, a generic sub-type, and a few "include" options). As with most serialization, there are scenarios were it may be desirable to have a separate "DTO" model, that looks closer to the serialization output than to your domain model.
A final note: the GetProperties()/Add() approach is not robust, as GetProperties() does not guarantee any particular order to the members; with protobuf-net in the way you show, order is important, as this helps determine the keys to use. Even if the order was fixed (sorting alphabetically, for example), note that adding a member could be a breaking change.

Related

Object Serialization to Dictionary per propertyname

I asking this mostly to see if there is something that ressembles what I am trying to achive.
My ultimate goal is to serialize an object graph, keeping references, but making all the properties accessible via a Path of sorts.
My first thoughts were something like dictionary and recursively loop through all the objects properties/fields, and if the value is different than the Default one, try and serialize if its a primitive, otherwise keep going till the type can be serialized.
I need to support Types not marked as [Serializable], and to be able to add custom serialization, similiar to how surrugates work in the BinaryFormatter.
Is there a serializer that works remotely like I said?
Or what other caveats can you think of, by this aproach?
Note: The serialization Size is not a big issue nor is the speed that it serializes/deserializes, at least for now.
An example of what I want to to is something like this :
SerializationContext.GetValue["root/Prop/X"] returns me the X value.
This needs to work with .Net 3.5, and it should serialize based on field/property name.

Slightly non-trivial data structure: is XmlSerializer right for me?

I'm currently using XmlSerializer to, surprisingly enough :), handle de/serialization of my data structures - I find it wonderfully simple to use, but at the cost of flexibility. At the moment, I'm using it for a tree-based structure; since XmlSerializer doesn't handle cyclic structures, I've added [XmlIgnore] to my Parent property, and do a post-deserialization iteration over the tree to fix up node parents.
Is there a better way to handle this using XmlSerializer, or would it be better to rewrite the code using XmlReader/XmlWriter? I suppose implementing IXmlSerializable would work, but it seems like a fair amount of work, while still retaining the cons of XmlSerializer.
The current post-deserialization step is OK, but I'm adding a data structure that has to be serialized to a separate XML file: basically a flat list of items that need a Parent property referencing a node from the previous tree structure. This would require yet a post-deserialization step, as well as keeping both a Parent attribute as well as as ParentId (or some trickery) in the new data structure.
So, any smart (and non-fragile) ideas? Or XmlReader/XmlWriter it is?
Solution
DataContractSerializer turned out to be a pretty decent solution, with pretty much the same simplicity as XmlSerializer. I opted not to use the automatic cycle handling but instead defining and OnDeserialized decorated method to handle setting the parent node; that way, the generated XML is standard-conforming.
One thing that confused me for a while was that I got crashes on some properties after deserializing, with the backing members set to null - couldn't figure out how this was possible since the backing members were definitely initialized in all possible constructors. Debugging showed constructors were never called, and after some googling I found this SO post with an explanation.
You could try binary serialization (BinarySerializer or DataContractSerializer), which I think handles cyclic graphs somewhat better, at the cost of not having a human-readable representation of the data. Alternatively, you can try the SoapFormatter as detailed here.
Use DataContractSerializer. Mark your classes with [DataContract(IsReference = true)].

How can I implement partial serialization?

I've done a lot of serialization development lately, mostly for sending objects over sockets, but I've run into an interesting question: Is it possible to send just a few of the properties from an object through a serializer?
My envisioned scenario is this: You have some sort of "state" object for each client, consisting of many properties (strings, ints, bools, etc). When your client first connects, the entire state object is serialized via an Xml or Binary serializer, and sent over the socket, to be recreated on the other side. Now both client and server have identical state objects. Your server then needs to change the state, and does so by simply setting one of the state object's property. The socket (either hooked to the state's events, or part of the state object itself) could synchronize the two states by reserializing the entire object, but it seems like a single "property change" object would do.
Obviously, this could be implemented manually. But it seems like a serializer should be able to serialize just a single property, and apply it like a patch on the other side. Does anyone know if this is possible, or would I have to write the entire thing from scratch?
With XmlSerializer (and protobuf-net, for a binary equivalent, since protobuf-net adopts most of XmlSerializer's patterns) you could do this by having a method:
public bool SouldSerializeFoo() {
return fooIsDirty;
}
public string Foo {get;set;}
for each property Foo - but you'd need to maintain the "what is dirty" manually in your own code (perhaps in the set). Lots of work; I've done a diffing serializer in the past - it was a real PITA, to be honest. I should also note that the [XmlIgnore] public bool FooSpecified {get{...} set{...}} pattern does the same thing, but for what you want, ShouldSerialize* is more appropriate.
As an addition to Marc's answer, here's the MSDN docs on the ShouldSerialize* methods

How to design to prompt users for new values for properties of deserialized objects?

Right now, I'm currently serializing a class like this:
class Session
{
String setting1;
String setting2;
...etc... (other member variables)
List<SessionAction> actionsPerformed;
}
Where SessionAction is an interface that just has one method. All implementations of the SessionAction interface have various properties describing what that specific SessionAction does.
Currently, I serialize this to a file which can be loaded again using the default .Net binary serializer. Now, I want to serialize this to a template. This template will just be the List of SessionActions serialized to a file, but upon loading it back into memory at another time, I want some properties of these SessionActions to require input from the user (which I plan to dynamically generate GUI controls on the fly depending on the property type). Right now, I'm stuck on determining the best way to do this.
Is there some way I could flag some properties so that upon using reflection, I could determine which properties need input from user? Or what are my other options? Feel free to leave comments if anything isn't clear.
For info, I don't recommend using BinaryFormatter for anything that you are storing long-term; it is very brittle between versions. It is fine for short-lived messages where you know the same version will be used for serialization and deserialization.
I would recommend any of: XmlSerializer, DataContractSerializer (3.0), or for fast binary, protobuf-net; all of these are contract-based, so much more version tolerant.
Re the question; you could use things like Nullable<T> for value-types, and null for strings etc - and ask for input for those that are null? There are other routes involving things like the ShouldSerialize* pattern, but this might upset the serialization APIs.
If you know from start what properties will have that SessionAction, you must implement IDeserializationCallback and put to those props the attribute [NonSerialized]. When you implement the OnDeserialization method you get the new values from the user.

Using JSON serialization as a persistence mechanism instead of RDB

I'm thinking about throwing away my DB in my next project to simplify development/evolution.
One way to do it is not to leave the Objects realm at all and persist my objects by some kind of serialization. It would be nice to be able to edit the initial object state when the app is down, so a format like JSON would be great.
The problem is that JSON tools (like Java Jackson), or rather JSON itself, are not able to keep the references, so after deserializing object graph I can get more instances than I got before serialization - each reference to the same object gets new instance.
I've noticed JSPON but it doesn't seem to be alive.
What do you think about such approach - isn't it too simple to be possible? Or maybe I should use some OODB (although it would create additional configurational overhead and I want to keep it simple).
Most of the simple portable serializers (xml, json, protocol buffers) are tree serializers (not graph serializers), so you'll see this problem a bit...
You could perhaps try using a DTO tree that doesn't need the references? i.e. instead of:
Parent -(children)-> Child
<--(parent)--
you have (at the DTO level):
Parent {Key="abc"} -(child keys)-> {string}
Child {Key="def"} -(parent key)-> {string}
This should be usable with any tree serializer; but it does require extra (manual) processing.
There are graph-based serializers like .NET's DataContractSerializer (with graph-mode enabled; it is disabled by default); but this is non-portable.
The references issue should be simple enough to solve assuming you control the serialization - you'd simply save the objects giving each an id and then save the references in terms of those ids.
However, while I think you'd get a simple version working and I reckon you'd run into problems down the line. Things that spring to mind are:
What would happen as the code evolves and the classes change?
How would you support query operations, particularly indexing to make the queries fast?
How would you manage concurrent access?
How would you manage transactions?
How would it scale?
I don't think these problems are insurmountable but IMHO, relational databases are the way they are based on years of development and use in the wild and the OODBs that I've seen are not a realistic proposition at this time.
Also, there's a whole class of problem that the set based logic provided by relational databases is ideal for, let alone the power of SQL in refining the data-sets you load, which just isn't as easy in the object world. With the modern ORMs making life so easy these days I certainly would want to confine myself to either realm.
The latest version of Json.NET supports serializing references.
string json = JsonConvert.SerializeObject(people, Formatting.Indented,
new JsonSerializerSettings { PreserveReferencesHandling = PreserveReferencesHandling.Objects });
//[
// {
// "$id": "1",
// "Name": "James",
// "BirthDate": "\/Date(346377600000)\/",
// "LastModified": "\/Date(1235134761000)\/"
// },
// {
// "$ref": "1"
// }
//]
List<Person> deserializedPeople = JsonConvert.DeserializeObject<List<Person>>(json,
new JsonSerializerSettings { PreserveReferencesHandling = PreserveReferencesHandling.Objects });
Console.WriteLine(deserializedPeople.Count);
// 2
Person p1 = deserializedPeople[0];
Person p2 = deserializedPeople[1];
Console.WriteLine(p1.Name);
// James
Console.WriteLine(p2.Name);
// James
bool equal = Object.ReferenceEquals(p1, p2);
// true
I've found this SO question helpful. XStream seems to cope with references by using relative paths in tree structure to the first reference when finding next one, even for json ( see here ).
Simple apparently can deal with more complex object graphs, but XStream seems more popular, does JSON and will probably suit my needs (I won't be ever having cyclic references).
the itemscript project proposes a schema language based on JSON. itemscript schema describes data types. itemscript JAM is an application markup developed in itemscript.
the reference implementation includes a GWT client (Item Lens) and a column store (Item Store) that persists JSON data.
As long as you're not leaving .NET realm, why not use the serialization mechanisms that .NET offers? It can easily serialize your object graphs (private fields included) to a binary blob and back again. There is also a built-in mechanism for serializing to and from XML, although that has some limitations when it comes to private fields and stuff (you can work around that though). There are also attributes that specify that some fields are newer and may not be in the serialized stream. You'll have to deal with them being NULL yourself, but then - you'd have to do that anyway.
Added: Ah, yes, forgot to mention - I'm talking about System.Runtime.Serialization.Formatters.Binary.BinaryFormatter class.

Categories