I have a class that stores and manipulates some entities. Depending on the number of inputs, I may not be able to store all entities in memory, so I'm trying to serialize my objects to be written to hard disk using protocol buffers. I'm using C# and protobuf-csharp-port. I'm aware of protobuf-net as an alternative port; so far I have been working with the first option, but I'm open to changing if it's required based on my needs.
The class to be serialized, in its simplified form, is as follows:
class Entity<T> where T: IComparable<T>
{
int id;
T metaData;
}
So at compile time I have no clue what metaData will be. While googling I noticed that extensions are the right path to follow (as suggested on Google's page and this question); hence I'm defining the Entity.proto file for class Entity as follows:
message Entity
{
required int32 id = 1 [default = 0];
extensions 2 to max;
}
and I would like the user to provide his own .proto file for T without the need to access or re-compile Entity.proto. In this regard, my questions are:
Do I need to change Entity.proto?
What should the T.proto be?
How can I access T in my C# code?
With that scheme, any extensions are going to be child values (not subclasses) of the non-generic Entity. That doesn't sound like generics, but ultimately storage (serialization) is often quite different to implementation (Entity<T> etc). If you can manually map between them: fine. But it isn't something the library will provide, AFAIK.
For completeness, in protobuf-net terms: it is perfectly fine with Entity<T> - it essentially considers each (Entity<Foo>, Entity<Bar>, etc) to be completely separate messages. Protobuf-net isn't hugely motivated by .proto schemas (although a code-gen tool is provided, for completeness) - it mainly uses runtime metadata.
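To illustrate that protobuf-net route, here is a minimal, hedged sketch (the MyMetaData type and its member are made up to stand in for T; Serializer.Serialize is the library call used to write the object):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Entity<T>
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public T MetaData { get; set; }
}

[ProtoContract]
public class MyMetaData   // hypothetical user-supplied T
{
    [ProtoMember(1)] public string Label { get; set; }
}

public static class EntityDemo
{
    public static void Save(string path)
    {
        // protobuf-net treats each closed type (Entity<MyMetaData>, Entity<int>, ...) as its own message
        var entity = new Entity<MyMetaData> { Id = 1, MetaData = new MyMetaData { Label = "example" } };
        using (var file = File.Create(path))
        {
            Serializer.Serialize(file, entity);
        }
    }
}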
Back story:
So I've been stuck on an architecture problem for the past couple of nights on a refactor I've been toying with. Nothing important, but it's been bothering me. It's actually an exercise in DRY, and an attempt to take it to such an extreme that the DAL architecture is completely DRY. It's a completely philosophical/theoretical exercise.
The code is based in part on one of #JohnMacIntyre's refactorings which I recently convinced him to blog about at http://whileicompile.wordpress.com/2010/08/24/my-clean-code-experience-no-1/. I've modified the code slightly, as I tend to, in order to take the code one level further - usually, just to see what extra mileage I can get out of a concept... anyway, my reasons are largely irrelevant.
Part of my data access layer is based on the following architecture:
abstract public class AppCommandBase : IDisposable { }
This contains basic stuff, like creation of a command object and cleanup after the AppCommand is disposed of. All of my command base objects derive from this.
abstract public class ReadCommandBase<T, ResultT> : AppCommandBase
This contains basic stuff that affects all read-commands - specifically in this case, reading data from tables and views. No editing, no updating, no saving.
abstract public class ReadItemCommandBase<T, FilterT> : ReadCommandBase<T, T> { }
This contains some more basic generic stuff - like definition of methods that will be required to read a single item from a table in the database, where the table name, key field name and field list names are defined as required abstract properties (to be defined by the derived class).
public class MyTableReadItemCommand : ReadItemCommandBase<MyTableClass, int?> { }
This contains specific properties that define my table name, the list of fields from the table or view, the name of the key field, a method to parse the data out of the IDataReader row into my business object and a method that initiates the whole process.
Now, I also have this structure for my ReadList...
abstract public class ReadListCommandBase<T> : ReadCommandBase<T, IEnumerable<T>> { }
public class MyTableReadListCommand : ReadListCommandBase<MyTableClass> { }
The difference is that the List classes contain properties that pertain to list generation (i.e. PageStart, PageSize, Sort) and return an IEnumerable, vs. returning a single data object (which just requires a filter that identifies a unique record).
Problem:
I'm hating that I've got a bunch of properties in my MyTableReadListCommand class that are identical in my MyTableReadItemCommand class. I've thought about moving them to a helper class, but while that may centralize the member contents in one place, I'll still have identical members in each of the classes, that instead point to the helper class, which I still dislike.
My first thought was dual inheritance would solve this nicely, even though I agree that dual inheritance is usually a code smell - but it would solve this issue very elegantly. So, given that .NET doesn't support dual inheritance, where do I go from here?
Perhaps a different refactor would be more suitable... but I'm having trouble wrapping my head around how to sidestep this problem.
If anyone needs a full code base to see what I'm harping on about, I've got a prototype solution on my DropBox at http://dl.dropbox.com/u/3029830/Prototypes/Prototype%20-%20DAL%20Refactor.zip. The code in question is in the DataAccessLayer project.
P.S. This isn't part of an ongoing active project, it's more a refactor puzzle for my own amusement.
Thanks in advance folks, I appreciate it.
Separate the result processing from the data retrieval. Your inheritance hierarchy is already more than deep enough at ReadCommandBase.
Define an interface IDatabaseResultParser. Implement ItemDatabaseResultParser and ListDatabaseResultParser, both with a constructor parameter of type ReadCommandBase ( and maybe convert that to an interface too ).
When you call IDatabaseResultParser.Value() it executes the command, parses the results and returns a result of type T.
Your commands focus on retrieving the data from the database and returning them as tuples of some description (actual Tuples or an array of arrays, etc.); your parser focuses on converting the tuples into objects of whatever type you need. See NHibernate's IResultTransformer for an idea of how this can work (it's probably a better name than IDatabaseResultParser too).
Favor composition over inheritance.
Having looked at the sample I'll go even further...
Throw away AppCommandBase - it adds no value to your inheritance hierarchy, as all it does is check that the connection is not null and open, and create a command.
Separate query building from query execution and result parsing - now you can greatly simplify the query execution implementation as it is either a read operation that returns an enumeration of tuples or a write operation that returns the number of rows affected.
Your query builder could all be wrapped up in one class to include paging / sorting / filtering; however, it may be easier to build some form of limited structure around these so you can separate paging, sorting and filtering. If I were doing this I wouldn't bother building the queries; I would simply write the SQL inside an object that allowed me to pass in some parameters (effectively stored procedures in C#).
So now you have IDatabaseQuery / IDatabaseCommand / IResultTransformer and almost no inheritance =)
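To make that shape concrete, here is a rough, hedged sketch of the composition described above (the interface and class names are illustrative, not taken from the linked prototype):

using System;
using System.Collections.Generic;
using System.Linq;

// Builds and runs the SQL; paging / sorting / filtering live here.
public interface IDatabaseQuery
{
    IEnumerable<object[]> Execute();   // raw rows as "tuples of some description"
}

// Turns raw rows into whatever the caller actually wants.
public interface IResultTransformer<TResult>
{
    TResult Transform(IEnumerable<object[]> rows);
}

// A single item is just the special case of a one-row list.
public class ItemTransformer<T> : IResultTransformer<T>
{
    private readonly Func<object[], T> map;
    public ItemTransformer(Func<object[], T> map) { this.map = map; }
    public T Transform(IEnumerable<object[]> rows) { return map(rows.Single()); }
}

public class ListTransformer<T> : IResultTransformer<IEnumerable<T>>
{
    private readonly Func<object[], T> map;
    public ListTransformer(Func<object[], T> map) { this.map = map; }
    public IEnumerable<T> Transform(IEnumerable<object[]> rows) { return rows.Select(map).ToList(); }
}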
I think the short answer is that, in a system where multiple inheritance has been outlawed "for your protection", strategy/delegation is the direct substitute. Yes, you still end up with some parallel structure, such as the property for the delegate object. But it is minimized as much as possible within the confines of the language.
But let's step back from the simple answer and take a wider view...
Another big alternative is to refactor the larger design structure such that you inherently avoid this situation where a given class consists of the composite of behaviors of multiple "sibling" or "cousin" classes above it in the inheritance tree. To put it more concisely, refactor to an inheritance chain rather than an inheritance tree. This is easier said than done. It usually requires abstracting very different pieces of functionality.
The challenge you'll have in taking the tack I'm recommending is that you've already made a concession in your design: you're optimizing for different SQL in the "item" and "list" cases. Preserving this as is will get in your way no matter what, because you've given them equal billing, so they must by necessity be siblings. So I would say that your first step in trying to get out of this "local maximum" of design elegance would be to roll back that optimization and treat the single item as what it truly is: a special case of a list, with just one element. You can always try to re-introduce an optimization for single items again later. But wait till you've addressed the elegance issue that is vexing you at the moment.
But you have to acknowledge that any optimization for anything other than the elegance of your C# code is going to put a roadblock in the way of design elegance for the C# code. This trade-off, just like the memory-vs-speed trade-off in algorithm design, is fundamental to the very nature of programming.
As is mentioned by Kirk, this is the delegation pattern. When I do this, I usually construct an interface that is implemented by the delegator and the delegated class. This reduces the perceived code smell, at least for me.
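A hedged sketch of what that can look like for the list-related properties mentioned in the question (the interface and class names here are made up for illustration; in the real code the delegator would be something like MyTableReadListCommand):

// Shared contract implemented by both the delegator and the delegated class.
public interface IListOptions
{
    int PageStart { get; set; }
    int PageSize { get; set; }
    string Sort { get; set; }
}

// The delegated class: the members live in exactly one place.
public class ListOptions : IListOptions
{
    public int PageStart { get; set; }
    public int PageSize { get; set; }
    public string Sort { get; set; }
}

// The delegator implements the same interface by forwarding to the delegated instance.
public class SomeReadListCommand : IListOptions
{
    private readonly IListOptions options = new ListOptions();

    public int PageStart { get { return options.PageStart; } set { options.PageStart = value; } }
    public int PageSize { get { return options.PageSize; } set { options.PageSize = value; } }
    public string Sort { get { return options.Sort; } set { options.Sort = value; } }
}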
I think the simple answer is... since .NET doesn't support multiple inheritance, there is always going to be some repetition when creating objects of a similar type. .NET simply does not give you the tools to re-use some classes in a way that would facilitate perfect DRY.
The not-so-simple answer is that you could use code generation tools, instrumentation, code dom, and other techniques to inject the objects you want into the classes you want. It still creates duplication in memory, but it would simplify the source code (at the cost of added complexity in your code injection framework).
This may seem unsatisfying, like the other solutions; however, if you think about it, that's really what languages that support MI are doing behind the scenes: hooking up delegation systems that you can't see in source code.
The question comes down to how much effort you are willing to put into making your source code simple. Think about that; it's rather profound.
I haven't looked deeply at your scenario, but I have some thoughts on the dual-hierarchy problem in C#. To share code in a dual hierarchy, we need a different construct in the language: either a mixin, a trait (pdf) (C# research - pdf) or a role (as in Perl 6). C# makes it very easy to share code with inheritance (which is not the right operator for code reuse), and very laborious to share code via composition (you know, you have to write all that delegation code by hand).
There are ways to get a kind of mixin in C#, but it's not ideal.
The Oxygene (download) language (an Object Pascal for .NET) also has an interesting feature for interface delegation that can be used to create all that delegating code for you.
SOME CONTEXT
One of my projects requires carrying around some "metadata" (yes, I hate using that word).
What the metadata specifically consists of is not important, only that it's more complex than a simple "table" or "list" - you could think of it as a mini-database of information
Currently I have this metadata stored in an XML file and have an XSD that defines the schema.
I want to package this metadata with my project, currently that means keeping the XML file as a resource
However, I have been looking for a more strongly-typed alternative. I am considering moving it from an XML file to C# code - so instead of using XML APIs to traverse my metadata, I would rely on .NET reflection over types.
Besides the strong(er) typing, some useful characteristics I see from using an assembly for this are: (1) I can refactor the "schema" to some extent with tools like ReSharper, (2) the metadata can come with code, (3) I don't have to rely on any external DB component.
THE QUESTIONS
If you have tried something like this, I am curious about what you learned.
Was your experience positive?
What did you learn?
What problems in this approach did you uncover?
What are some considerations I should take into account?
Would you do this again?
NOTES
Am not asking for how to use Reflection - no help is needed there
Am fundamentally asking about your experiences and design considerations
UPDATE: INFORMATION ABOUT THE METADATA
Because people are asking I'll try describing the metadata a bit more. I'm trying to abstract a bit - so this will seem a bit artificial.
There are three entities in the model:
A set of "groups" - each group has a unique name and several properites (usually int values that represent ID numbers of some kind)
Each "group" contains 1 or more "widgets" (never more than 50) - each item has properties like name (therea are multiple names), IDs, and various boolean properties.
Each widget contains one or more "scenarios". Each "scenario" is documentation - a URL to a description of how to use the widget.
Typically I need to run these kinds of "queries"
Get the names of all the widgets
Get the names of all groups that contain at least one widget where BoolProp1=true
Given the ID of a widget, get the group that contains that widget
How I was thinking about modelling the entities in the assembly
There are 3 classes: Group, Widget, Documentation
There are 25 groups, so I will have 25 Group classes - e.g. "FooGroup" will derive from Group; the same pattern follows for widgets and documentation
Each class will have attributes to account for names, ids, etc.
I have used and extended Metadata for a large part of my projects, many of them related to describing components, relationships among them, mappings, etc.
(Major categories of using attributes extensively include O/R Mappers, Dependency Injection framework, and Serialization description - specially XML Serialization)
Well, I'm going to ask you to describe a little bit more about the nature of the data you want to embed as a resource. Attributes are naturally good for the type of data that describes your types and type elements, but each usage of an attribute is simple and short. Attributes (I think) should be very cohesive and somehow independent from each other.
One of the solutions that I want to point you at, is the "XML Serialization" approach. You can keep your current XMLs, and put them into your assemblies as Embedded Resource (which is what you've probably done already) and read the whole XML at once into a strongly-typed hierarchy of objects.
XML Serialization is very very simple to use, much simpler than the typical XML API or even LINQ2XML, in my opinion. It uses Attributes to map class properties to XML elements and XML attributes. Once you've loaded the XML into the objects, you have everything you want in the memory as "typed" data.
Based on what I understand from your description, I think you have a lot of data to be placed on a single class. This means a large and (in my opinion) ugly block of attribute code above the class. (Unless you can distribute your data among members, making each of them small and independent, which is nice.)
I have many positive experiences using XML Serialization for large amounts of data. You can arrange the data as you want, you get type safety, you get IntelliSense (if you give your XSD to Visual Studio), and you also get half of the refactoring. ReSharper (or any other refactoring tool that I know of) doesn't recognize XML Serialization, so when you refactor your typed classes, it changes all the usages of the data in code but doesn't change the XML itself.
If you give me more details on what your data is, I might be able to add something to my answer.
For XML Serialization samples, just Google "XML Serialization" or look it up in MSDN.
UPDATE
I strongly recommend NOT using classes to represent instances of your data - using a class merely to encapsulate a single instance's data goes against what a class logically is.
I guess your best bet would be XML Serialization, provided that you already have your data in XML. You get all the benefits you want, with less code. And you can perform any query on the XML Serializable objects using LINQ2Objects.
A part of your code can look like the following:
[XmlRoot]
public class MyMetadata
{
[XmlElement]
public Group[] Groups { get; set; }
}
public class Group
{
[XmlAttribute]
public string Name { get; set; }
[XmlAttribute]
public int SomeNumber { get; set; }
[XmlElement]
public Widget[] Widgets { get; set; }
}
public class Widget
{
...
}
You should call new XmlSerializer(typeof(MyMetadata)) to create a serializer, and call its Deserialize method giving it the stream of your XML, and you get a filled instance of MyMetadata class.
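As a rough sketch of what loading and querying could look like (the embedded-resource name is hypothetical, and Widget's Name/BoolProp1 members are assumed, since the Widget body is elided above):

using System.Linq;
using System.Xml.Serialization;

public static class MetadataLoader
{
    public static MyMetadata Load()
    {
        var serializer = new XmlSerializer(typeof(MyMetadata));
        // "MyAssembly.Metadata.xml" is a placeholder embedded-resource name
        using (var stream = typeof(MyMetadata).Assembly
            .GetManifestResourceStream("MyAssembly.Metadata.xml"))
        {
            return (MyMetadata)serializer.Deserialize(stream);
        }
    }

    public static void ExampleQueries(MyMetadata metadata)
    {
        // "Get the names of all the widgets"
        var widgetNames = metadata.Groups.SelectMany(g => g.Widgets).Select(w => w.Name);

        // "Get the names of all groups that contain at least one widget where BoolProp1 = true"
        var groupNames = metadata.Groups
            .Where(g => g.Widgets.Any(w => w.BoolProp1))
            .Select(g => g.Name);
    }
}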
It's not clear from your description but it sounds like you have assembly-level metadata that you want to be able to access (as opposed to type-level). You could have a single class in each assembly that implements a common interface, then use reflection to hunt down that class and instantiate it. Then you can hard-code the metadata within.
The problems of course are the benefits that you lose from the XML -- namely that you can't modify the metadata without a new build. But if you're going this direction you probably have already taken that into account.
Right now, I'm currently serializing a class like this:
class Session
{
String setting1;
String setting2;
// ...etc... (other member variables)
List<SessionAction> actionsPerformed;
}
Where SessionAction is an interface that just has one method. All implementations of the SessionAction interface have various properties describing what that specific SessionAction does.
Currently, I serialize this to a file which can be loaded again using the default .Net binary serializer. Now, I want to serialize this to a template. This template will just be the List of SessionActions serialized to a file, but upon loading it back into memory at another time, I want some properties of these SessionActions to require input from the user (which I plan to dynamically generate GUI controls on the fly depending on the property type). Right now, I'm stuck on determining the best way to do this.
Is there some way I could flag some properties so that upon using reflection, I could determine which properties need input from user? Or what are my other options? Feel free to leave comments if anything isn't clear.
For info, I don't recommend using BinaryFormatter for anything that you are storing long-term; it is very brittle between versions. It is fine for short-lived messages where you know the same version will be used for serialization and deserialization.
I would recommend any of: XmlSerializer, DataContractSerializer (3.0), or for fast binary, protobuf-net; all of these are contract-based, so much more version tolerant.
Re the question; you could use things like Nullable<T> for value-types, and null for strings etc - and ask for input for those that are null? There are other routes involving things like the ShouldSerialize* pattern, but this might upset the serialization APIs.
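A rough sketch of that idea, assuming the template deliberately leaves user-supplied values as null (for strings) or an empty Nullable<T>:

using System;
using System.Collections.Generic;
using System.Reflection;

public static class TemplateInput
{
    // Returns the properties whose values still need to be collected from the user.
    public static IEnumerable<PropertyInfo> PropertiesNeedingInput(object sessionAction)
    {
        foreach (PropertyInfo prop in sessionAction.GetType().GetProperties())
        {
            object value = prop.GetValue(sessionAction, null);
            bool isNullable = Nullable.GetUnderlyingType(prop.PropertyType) != null;
            if (value == null && (isNullable || prop.PropertyType == typeof(string)))
                yield return prop;   // drive the dynamically generated GUI controls from these
        }
    }
}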
If you know from the start which properties of the SessionAction will need user input, you can implement IDeserializationCallback and mark those members with the [NonSerialized] attribute (which applies to fields). In your OnDeserialization implementation, you then get the new values from the user.
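For illustration, a minimal sketch of that approach; DelayAction and PromptForDelay are made-up names, and a real implementation would also implement the SessionAction interface:

using System;
using System.Runtime.Serialization;

[Serializable]
public class DelayAction : IDeserializationCallback
{
    [NonSerialized]
    private int delaySeconds;   // excluded from the template, re-collected on load

    public void OnDeserialization(object sender)
    {
        // Called after the object graph has been rebuilt; ask the user for the missing value here.
        delaySeconds = PromptForDelay();
    }

    private static int PromptForDelay()
    {
        // Placeholder for whatever user-input mechanism the application uses.
        return 0;
    }
}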
I am attempting to optimise around a possible bottleneck.
I have a server application that serves objects from a database to remote client applications, which can work with 1 to n objects of 1 to n different types (where n can be a relatively high number); the types all implement a common interface but may each have many unique properties.
The client applications store the server objects in a local cache, until they are ready to persist them back, through the server, to the database.
This is being done currently in WCF with each class defining a DataContract.
Due to the possibly large number of objects that may need to be passed back to the server (it changes depending on implementation), I would prefer not to do these all as individual calls any more, but rather to wrap all the objects in a single serialized (or, better still, compressed) stream and send them to the server over one connection.
I can quite simply roll my own, but would prefer to use a recommended approach, and hoping someone may suggest one. If you can convince me, I am also willing to accept that my approach is probably not the best idea.
How high is "relatively high"?
For example, one option that occurs is to use a wrapper object:
[DataContract]
public class Wrapper {
[DataMember(Order = 1)]
public List<Foo> Foos {get {...}}
[DataMember(Order = 2)]
public List<Bar> Bars {get {...}}
[DataMember(Order = 3)]
public List<Blop> Blops {get {...}}
}
Then you should be able to send a single message with any number of Foo, Bar and/or Blop records. My inclusion of the Order attribute was deliberate - if you want to reduce the size of the stream, you might consider protobuf-net - with the above layout, protobuf-net can hook into WCF simply by including [ProtoBehavior] on the method (in the operation-contract interface) that you want to apply it to (at both client and server). This switches the transfer to use Google's "protocol buffers" binary format, encoded as base-64. If you are using the basic-http binding, this can also use MTOM if enabled, so even the base-64 isn't an issue. Using this, you can get significant data transfer savings (~1/5th the space based on the numbers shown).
(edit 1 - protobuf-net assumes that Foo, Bar and Blop also use the Order attribute)
(edit 2 - note that you could always break up the request into a number of mid-size Wrapper messages, then call a method to apply all the changes once you have them at the server (presumably in a staging table in the database))
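As a hedged sketch of the wiring described above (the service name and Foo's members are illustrative; [ProtoBehavior] is the protobuf-net attribute mentioned earlier, and its namespace may vary between protobuf-net versions):

using System.Runtime.Serialization;
using System.ServiceModel;
using ProtoBuf.ServiceModel;   // location of [ProtoBehavior]; may differ by protobuf-net version

[ServiceContract]
public interface IBatchService
{
    [OperationContract]
    [ProtoBehavior]   // switches this operation to protobuf-net's binary encoding on both ends
    void SubmitChanges(Wrapper changes);
}

// The element types referenced by Wrapper need ordered members too (see edit 1 above);
// Foo's members here are made up for illustration.
[DataContract]
public class Foo
{
    [DataMember(Order = 1)] public int Id { get; set; }
    [DataMember(Order = 2)] public string Name { get; set; }
}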
I am currently saving a .NET (C#) UserControl to disk as an XML file by saving each property as an element in the XML document. The file is used for later recreation of the controls at runtime. I am wondering if it is possible, or better, to save the control as a binary file. There would be many controls, so I guess it would have to have a header section describing the location and length of each saved control. Thoughts?
Brad
BTW this is a windows app
EDIT:
What I currently have in place is a public member function that uses the PropertyDescriptor class to iterate through all the properties and create an XML document from that.
// Iterate over the control's properties via TypeDescriptor
PropertyDescriptorCollection pdc = TypeDescriptor.GetProperties(this);
for (int i = 0; i <= pdc.Count - 1; i++)
{
    // name, type and category are written out as XML for each property
    string name = pdc[i].Name;
    Type propertyType = pdc[i].PropertyType;
    string category = pdc[i].Category;
}
I will look into making the class Serializable - thanks.
We had to do this for a data-driven application where a user could create persistable views. We did an XML version to start but moved to using BinaryFormatter and the ISerializable interface as this allows us to control exactly what gets persisted and which constructors to use. For the controls we actually persisted the CodeCompileUnit that the designer has created, but that means you have to actually use a designer to lay them out.
Winforms controls don't serialize especially well, and you might have a lot of difficulty getting the base classes (i.e. not your code) to play ball. Things like Color, for example, regularly prove surprisingly troublesome to serialize.
XML would be an obvious (if somewhat predictable) choice, but you generally need to nominate sub-classes ahead of time. And of course, the base classes won't be marked serializable. BinaryFormatter would avoid some of that, but as a field-based serializer, you'd have problems with the "handles" etc. in the base classes, which are meaningless when serialized.
I'm not saying it can't be done - but it won't be trivial either. As a starter, you'd want to look at TypeConverter.GetProperties, and use the Converter of each to get the value as an invariant string.
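A minimal sketch of that starting point, assuming you pass in the control instance (the class and method names here are illustrative):

using System;
using System.ComponentModel;

public static class ControlPersister
{
    public static void DumpProperties(object control)
    {
        foreach (PropertyDescriptor prop in TypeDescriptor.GetProperties(control))
        {
            object value = prop.GetValue(control);
            // ConvertToInvariantString gives a culture-independent textual form
            // where the property's converter supports string conversion.
            string text = prop.Converter.ConvertToInvariantString(value);
            Console.WriteLine("{0} = {1}", prop.Name, text);
        }
    }
}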
Just had a thought: maybe you don't need to serialize the base/sub-classes. Maybe you could write another serializer that only serializes the top tier of the class inheritance hierarchy - only serializing the classes you wrote, and maybe storing metadata for the base classes you derive from (so that you can re-map this on deserialization)? This could just be pie-in-the-sky too.