Deserialization validation - C#

I'm working with a list of fonts that I serialize and deserialize using DataContractSerializer. Between the two steps, it's conceivable that the user has removed a font from their machine. I'd like to check each font name as it's being deserialized to ensure that it still exists on the system; if it doesn't, that element should not be included in the collection returned by DataContractSerializer.ReadObject().
Specifically, I'm storing a FontFamily and serializing a property that gets FontFamily.Name. In this property's set accessor, I convert the string back into a FontFamily.
The only reasonable alternative to validation that I can think of would be to have the property's set accessor ignore invalid values and filter out the invalid deserialized objects later. I don't like this option, however - is there a more proper way?

Why not take advantage of the OnDeserializedAttribute? Have your callback do the validation and remove any items that are not valid for the client environment.
http://msdn.microsoft.com/en-us/library/ms733734.aspx
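For example, a minimal sketch (the FontSettings/FontNames names are placeholders; this assumes the font names are stored as strings, and uses System.Drawing's InstalledFontCollection to see what is on the machine):

    using System.Collections.Generic;
    using System.Drawing.Text;
    using System.Linq;
    using System.Runtime.Serialization;

    [DataContract]
    public class FontSettings
    {
        [DataMember]
        public List<string> FontNames { get; set; }

        // Invoked by DataContractSerializer after ReadObject populates the instance.
        [OnDeserialized]
        private void RemoveMissingFonts(StreamingContext context)
        {
            using (var installedFonts = new InstalledFontCollection())
            {
                var installed = new HashSet<string>(
                    installedFonts.Families.Select(f => f.Name));
                FontNames.RemoveAll(name => !installed.Contains(name));
            }
        }
    }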
I do have some concerns about how you would go about round-tripping the data if you remove or modify it under the covers.
(For example: I remember being particularly frustrated by older versions of MS Publisher as I was working on a document on two different machines hooked up to two different printers. Whenever I modified the file on one machine, Publisher would reformat the document to target the printer attached to that machine. When I went back to the other machine where I was going to do the actual printing, Publisher would reformat again, but the margins would not be quite right, and so I needed to tweak some more.)

You could also implement IXmlSerializable for your class, which would include your own implementation of ReadXml, allowing you to do whatever validation you want as the object is being deserialized.
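A rough sketch of what that could look like (the FontNameList type and the <Font> element layout are assumptions, not your actual schema):

    using System.Collections.Generic;
    using System.Drawing.Text;
    using System.Linq;
    using System.Xml;
    using System.Xml.Schema;
    using System.Xml.Serialization;

    public class FontNameList : IXmlSerializable
    {
        public List<string> Names { get; } = new List<string>();

        public XmlSchema GetSchema() { return null; }

        public void ReadXml(XmlReader reader)
        {
            bool isEmpty = reader.IsEmptyElement;
            reader.ReadStartElement();
            if (isEmpty) return;

            using (var installedFonts = new InstalledFontCollection())
            {
                var installed = new HashSet<string>(
                    installedFonts.Families.Select(f => f.Name));

                while (reader.NodeType != XmlNodeType.EndElement)
                {
                    string name = reader.ReadElementContentAsString("Font", "");
                    if (installed.Contains(name))   // keep only fonts still on this machine
                        Names.Add(name);
                }
            }
            reader.ReadEndElement();
        }

        public void WriteXml(XmlWriter writer)
        {
            foreach (string name in Names)
                writer.WriteElementString("Font", name);
        }
    }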

Related

When using protobuf-net, how do I know what fields will be updated (or have been updated) when using merge on an existing object

Using Protobuf-net, I want to know what properties of an object have been updated at the end of a merge operation so that I can notify interested code to update other components that may relate to those updated properties.
I noticed that there are a few different types of properties/methods I can add which will help me serialize selectively (Specified and ShouldSerialize). I noticed in MemberSpecifiedDecorator that the 'read' method will set the Specified property to true when it reads. However, even if I add Specified properties for each field, I'd have to check each one (and update code whenever new properties were added).
My current plan is to create a custom SerializationContext.Context object, and then detect that during the deserialization process and update a list of members. However, there are quite a few places in the code I would need to touch to do that, and I'd rather use an existing mechanism if possible.
It would be much more desirable to get a list of updated member information. I realize that walking down an object graph may produce many members, but in my use case I'm not merging complex objects, just simple POCOs with value-type properties.
Getting a delta log isn't an inbuilt feature, partly because of the complexity when it comes to complex models, as you note. The Specified trick would work, although this isn't the purpose it was designed for; to avoid adding complexity to your own code, that would be something best handled via reflection, perhaps using the Expression API for performance. Another approach might be to use a ProtoReader to know in advance which fields will be touched, but that demands an understanding of the field-number/member map (which can be queried via RuntimeTypeModel).
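As a rough, untested sketch of the Specified-plus-reflection idea (the Poco type and the MergeWatcher helper are hypothetical):

    using System.Linq;
    using ProtoBuf;

    [ProtoContract]
    public class Poco
    {
        [ProtoMember(1)] public string Name { get; set; }
        // protobuf-net's reader sets this to true when field 1 is deserialized.
        public bool NameSpecified { get; set; }

        [ProtoMember(2)] public int Age { get; set; }
        public bool AgeSpecified { get; set; }
    }

    public static class MergeWatcher
    {
        // After Serializer.Merge, list the members whose *Specified flag was
        // set, discovered via reflection rather than member-by-member code.
        public static string[] UpdatedMembers(object obj)
        {
            return obj.GetType()
                      .GetProperties()
                      .Where(p => p.PropertyType == typeof(bool)
                               && p.Name.EndsWith("Specified")
                               && (bool)p.GetValue(obj, null))
                      .Select(p => p.Name.Substring(0, p.Name.Length - "Specified".Length))
                      .ToArray();
        }
    }

Bear in mind that the same *Specified flags also gate writing (that is what they were designed for), so serialization will skip members whose flag is false.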
Are you using hand-crafted models? Or are you using protogen? Yet another option would be to have code in the setters that logs changes somewhere. I don't think protogen currently emits partial method hooks, but it possibly could.
But let me turn this around: it isn't a feature that is built in right now, and it is somewhat limited due to complexity anyway, but: what would a "good" API for this look like to you?
As a side note: this isn't really a common features in serializers - you'd have very similar challenges in any mainstream serializer that I can think of.

Best way to update a database's serialized objects after changing the object form

I often write my objects out to a database in xml form.
However, if I change the form of my objects, say by renaming them or by changing their fields, I can no longer read them from the database, which makes it difficult to read them, convert them to their new form, and write them back out to the database.
I'd rather not have to rename my classes every time I change something about them.
*Note: I am relying on C#'s XML serialization/deserialization of objects for generating the XML. Perhaps this is not desirable if I change the format of the objects.
If you implement the ISerializable interface on your objects, then you can implement custom serialization/deserialization routines that provide backwards compatibility with older versions of the objects.
See here for an example of how it can be done: ISerializable and backward compatibility
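A minimal sketch of the pattern (the Dog class here is hypothetical; the key point is enumerating SerializationInfo instead of assuming every field is present):

    using System;
    using System.Runtime.Serialization;

    [Serializable]
    public class Dog : ISerializable
    {
        public string Name { get; set; }
        public string Breed { get; set; }   // added in version 2 of the class

        public Dog() { }

        // Deserialization constructor: tolerates payloads written before Breed existed.
        protected Dog(SerializationInfo info, StreamingContext context)
        {
            foreach (SerializationEntry entry in info)
            {
                switch (entry.Name)
                {
                    case "Name": Name = (string)entry.Value; break;
                    case "Breed": Breed = (string)entry.Value; break;  // absent in v1 data
                }
            }
        }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("Name", Name);
            info.AddValue("Breed", Breed);
        }
    }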
It ultimately depends on how you serialize your objects.
One convenient option is to store them as hashes (key-value pairs). This way, if I add a breed property to a Dog class that already has a name property, existing objects won't be invalidated. They'll just have breed = nil.
Exact storage format (xml, json or separate 'properties' table) is not important in this case, the important thing is how you convert objects to it and back.
But I don't think anyone could give specific suggestions without knowing the specific platform.

a question about serialization

What is the better approach to serializing a custom class: using XmlSerializer, or BinaryFormatter with the [Serializable] attribute on the class?
It's not possible to answer this, without knowing how you will use the resulting file, and the lifetime of it.
The decision is based on the fact that it is harder to "upgrade" the binary format. If your object model changes, it won't deserialise correctly. But if you've implemented a custom XML serialisation/deserialisation, then you can handle the "new" cases appropriately, and life will be good.
So decide more about how you will use it, who you are sharing information with, and what the possible changes to the model are.
FWIW, I sometimes use both types of serialisation in a given project.
That really depends on how you use the serialized class. If you want to pass it to other programs or want to easily debug it, use XML (but mind that XmlSerializer might produce non-compliant XML output, like multiple root elements).
In all other cases, you can use the binary formatter. But note that XML is more suitable if you change the class later - you can use XmlIgnore and the like to keep the XML format intact.
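For example, a small sketch of the XmlIgnore approach (the Settings type is made up) - a member added in a later version is kept out of the XML so older files still load:

    using System;
    using System.IO;
    using System.Xml.Serialization;

    public class Settings
    {
        public string Theme { get; set; }

        // Added later: excluded from the XML so existing files round-trip unchanged.
        [XmlIgnore]
        public DateTime LastOpened { get; set; }
    }

    public static class Demo
    {
        public static void Main()
        {
            var serializer = new XmlSerializer(typeof(Settings));
            using (var writer = new StringWriter())
            {
                serializer.Serialize(writer, new Settings { Theme = "Dark" });
                Console.WriteLine(writer);   // only <Theme> appears in the output
            }
        }
    }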
The decision will sometimes also be made for you based on what the serialized output will be used for - while you could expose a WebService to take a binary array that is a binary serialized item, you couldn't utilize the web service easily from anything but .Net (and the end client would probably need a reference to the type).
Using XML means that the service could be exposed to any end client, regardless of the platform/environment on the end client.

Object Dumper/Unit Testing

I'm currently writing an object dumper (allowing different dumping strategies).
Of course, I would like to write unit tests to verify that what I will develop will match all the features that I expect.
However, I cannot work out how I would test such a solution.
I have thought about creating a set of objects which record the number of times each of their properties has been accessed. That seems almost OK. But how can I verify that their public fields have been accessed?
Why would you explicitly care how many times the properties have been accessed etc? I'd just test that the output matched expectations. If there's some reason to have one particular strategy (e.g. fields instead of properties) then there's likely to be an easy way of testing that (e.g. make the property return a capitalized version).
I would focus on validating the output rather than verifying properties were accessed. I might read a property but not dump it correctly, right?
This is an example of testing the outcome rather than testing the implementation.
You just have to test that the dumped value is the value that was assigned to the properties/public fields. Just make sure to assign a different value to each property/field.
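For instance, a minimal NUnit-style sketch - Dumper.Dump here is a stand-in for whatever your dumper's actual entry point and output format are:

    using NUnit.Framework;

    public class Sample
    {
        public string Name { get; set; }   // property
        public int Count;                  // public field
    }

    [TestFixture]
    public class DumperTests
    {
        [Test]
        public void Dump_covers_properties_and_public_fields()
        {
            // Give each member a distinct value so a mixed-up dump cannot pass by accident.
            var sample = new Sample { Name = "widget", Count = 3 };

            string output = Dumper.Dump(sample);   // hypothetical dumper under test

            StringAssert.Contains("Name: widget", output);
            StringAssert.Contains("Count: 3", output);
        }
    }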

How would you reference lookup/meta data?

I've run into this issue quite a few times and never liked the solution chosen. Let's say you have a list of States (just as a simple example) in the database. In your code-behind, you want to be able to reference a State by ID and have the list of them available via Intellisense.
For example:
States.Arizona.Id //returns a GUID
But the problem is that I don't want to hard-code the GUIDs. Now in the past I've done all of the following:
Create class constants (hard-coding of the worst kind... ugh!)
Create Lookup classes that have an ID property (among others) (still hard-coded and would require a rebuild of the project if ever updated)
Put all the GUIDs into the .config file, create an enumeration, and within a static constructor load the GUIDs from the .config into a Hashtable with the enumeration item as the key. So then I can do: StateHash[StateEnum.Arizona]. Nice, because if a GUID changes, no rebuild is required. However, it doesn't help if a new record is added or an old one removed, because the enumeration will need to be updated.
So what I'm asking is whether someone has a better solution. Ideally, I'd want to be able to look things up via Intellisense and not have to rebuild code when there's an update. I'm not even sure that's possible.
EDIT: Using states was just an example (probably a bad one). It could be a list of widgets, car types, etc. if that helps.
Personally, I would store lookup data in a database, and simply try to avoid the kind of hard-coding that binds rules to things like individual states. Perhaps key off some property of those states (like .ApplyDoubleTax or something). And non-logic code doesn't need to use Intellisense - it typically just needs to list items or find them by name, which can be done easily enough however you have stored them.
Equally, I'd load the data once and cache it.
Arguably, coding the logic against states is hard coding - especially if you want to go international anytime soon - I hate it when a site asks me what state I live in...
Re the data changing... is the USA looking to annex anytime soon?
I believe that if it shows up in Intellisense, then, by definition, it is hard-coded into your program.
That said, if your goal is to make the hard-coding as painless as possible, one thing you might try is auto-generating your enumeration based on what's in the database. That is, you can write a program that reads the database and creates a FOO.cs file containing your enumeration. Then just run that program every time the data changes.
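A rough sketch of such a generator (the table/column names and connection string are assumptions); run as a pre-build step, it gives you compile-time, Intellisense-friendly members without hand-maintained GUIDs:

    using System;
    using System.Data.SqlClient;
    using System.IO;
    using System.Text;

    public static class GenerateStates
    {
        public static void Main()
        {
            var code = new StringBuilder()
                .AppendLine("// <auto-generated/> - regenerated whenever the data changes")
                .AppendLine("using System;")
                .AppendLine()
                .AppendLine("public static class States")
                .AppendLine("{");

            using (var conn = new SqlConnection("Server=.;Database=App;Integrated Security=true"))
            using (var cmd = new SqlCommand("SELECT Name, Id FROM States", conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // "New Mexico" -> NewMexico, usable as a C# identifier
                        string name = ((string)reader["Name"]).Replace(" ", "");
                        var id = (Guid)reader["Id"];
                        code.AppendLine("    public static readonly Guid " + name +
                                        " = new Guid(\"" + id + "\");");
                    }
                }
            }

            code.AppendLine("}");
            File.WriteAllText("States.generated.cs", code.ToString());
        }
    }

Note this emits States.Arizona as a Guid directly rather than States.Arizona.Id, but the generated shape is easy to adjust.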
This cries out for a custom MSBuild task. You really want an auto-generated enum or class in this case, if the IDs are sourced from a database, can/will change, and are not easily predicted. You could then put the task in your project, and it would run before each build, updating as necessary.
Or start looking at ORMs :)
