XML Serialization/Deserialization per item

I'm designing a car catalogue and need to use XML files for storage. In previous projects, I was manually editing XML files with LINQ. However, I came across XML serialization and am thinking this might be a better approach. Each item in the catalogue would be of type CarItem and contain various attributes. The catalogue can contain a few hundred cars and I'm worried about performance. If I deserialize the XML file, will all the CarItems be instantiated straight away? Is there a way for me to choose which objects get deserialized based on parameters? For example, I'd like to say "if the car color attribute is red, then only deserialize red CarItems into objects".
Thanks for any suggestions

There are quite a few posts with good examples of how you can control what you pull out and instantiate into objects/scalars using XDocument.
Shawn Oster's post in this thread is, I believe, quite close to what you want using LINQ. You can easily add where clauses to suit your requirements.
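
A minimal sketch of that idea, assuming a hypothetical catalogue layout in which each car is a CarItem element with Color, Make and Model attributes (the question does not show its XML, so these names are illustrative only). LINQ to XML filters the elements first, and XmlSerializer then deserializes only the matches:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical catalogue item; the attribute names are assumptions.
public class CarItem
{
    [XmlAttribute] public string Color { get; set; }
    [XmlAttribute] public string Make { get; set; }
    [XmlAttribute] public string Model { get; set; }
}

public static class CarCatalogue
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(CarItem));

    // Parses the catalogue XML and deserializes only the cars of the given color.
    public static List<CarItem> LoadByColor(string catalogueXml, string color)
    {
        XDocument doc = XDocument.Parse(catalogueXml);
        return doc.Descendants("CarItem")
                  .Where(e => (string)e.Attribute("Color") == color)
                  .Select(e =>
                  {
                      // Each matching element is handed to the serializer on its own.
                      using (var reader = e.CreateReader())
                      {
                          return (CarItem)Serializer.Deserialize(reader);
                      }
                  })
                  .ToList();
    }
}
```

Note that XDocument.Parse still reads the whole file into XElement nodes; what the filter avoids is building CarItem objects for cars you don't need.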

Yes, they will all be instantiated. However, a few hundred objects is not a big deal for a class with some simple fields. Give it a try and check the performance.

Related

Data structure for hierarchical members in C#

I'm trying to read data from a WSDL file and I'm stuck, because there could be a big hierarchical tree and I don't know what kind of data structure to use to hold the inputs and outputs. An input can be an object, and that object can point to a couple of simple inputs and a second object, and this could go on and on. So I don't know what to use: maybe a tree, maybe indexes. What is the best practice, and can you give a small example of how the data could be handled?
P.S. I'm developing an automated test generation tool, which will use WSDL files for generation.

Your best bet is to use good old classes. The first thing to do is to use a utility like svcutil.exe (a code generator tool) to create the client code from the WSDL. From this you will get an idea of how deep the tree is going to be.
Once you have an object view of the structure, start creating classes and apply OOP design patterns. This will help with at least two things:
Avoiding code duplication, and
When you start constructing your objects in code, it will give you an idea of which node comes under which parent, etc.
Hope this helps.
Another thing to consider is using some sort of object serialization mechanism. Serialization helps a great deal when converting complex tree-like data from XML to objects and vice versa.

WSDL is based on XML, which is already a tree structure. I'm not sure why you want to read it into objects first; just use LINQ to XML to read the WSDL directly.
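
As a rough sketch of reading the WSDL directly, assuming a WSDL 1.1 document (namespace http://schemas.xmlsoap.org/wsdl/) and that the operations of interest sit under portType elements. The WsdlReader class and its method name are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WsdlReader
{
    // WSDL 1.1 namespace.
    private static readonly XNamespace Wsdl = "http://schemas.xmlsoap.org/wsdl/";

    // Returns one line per operation: "name: input message -> output message".
    public static List<string> ListOperations(string wsdlXml)
    {
        XDocument doc = XDocument.Parse(wsdlXml);
        return doc.Descendants(Wsdl + "portType")
                  .Elements(Wsdl + "operation")
                  .Select(op => string.Format("{0}: {1} -> {2}",
                      (string)op.Attribute("name"),
                      // A one-way operation has no output element, hence the ?. operator.
                      (string)op.Element(Wsdl + "input")?.Attribute("message"),
                      (string)op.Element(Wsdl + "output")?.Attribute("message")))
                  .ToList();
    }
}
```

The message names returned here can then be followed to the message and types sections with the same kind of query, so the XML tree itself serves as the hierarchical structure.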

Retrieving data LINQ vs Reflection

I was hoping someone could tell me which is the more efficient and/or correct way to retrieve some data.
I have some XML files coming from a 3rd party and their attached DTDs. So I've converted the DTD into a C# Class so I can deserialize the XML into the classes. I now need to map that data to match the way my data structures are set up.
The question ultimately is: should I use reflection or LINQ? The format of the XML is somewhat generic by design, where things are held in Items [Array] or Item [Object].
I've done the following:
// "class" is a reserved word in C#, so the variable needs another name.
TheirClass their = (TheirClass)theirMessage.Items.First(n => n.GetType() == typeof(TheirClass));
MyObject.Param1 = ConversionHelperClass.Convert(their.Obj1);
MyObject.Param2 = ConversionHelperClass.Convert(their.Obj2);
I can also do some stuff with Reflection where I pass in the names of the Classes and Attributes I'm trying to snag.
Trying to do things the right way here.

As a general rule I'd suggest avoiding reflection unless it is absolutely necessary! It introduces a performance overhead AND means that you miss out on all of the lovely compile time checks that the compiler team have worked so hard to give us.
LINQ to Objects queries an in-memory data set, so it can be very fast.
If your ultimate goal is to parse information from an XML document, I'd suggest checking out the XDocument class. It provides a very nice abstraction for querying XML documents.

What is the best approach to generalize and aggregate XML dumps in C#?

Here is the business part of the issue:
Several different companies send an XML dump of the information to be processed.
The information sent by the companies is similar ... not exactly the same.
Several more companies will soon be enlisted and will start sending information.
Now, the technical part of the problem is I want to write a generic solution in C# to accommodate this information for processing. I would be transforming the XML in my C# class(es) to fit in to my database model.
Is there any pattern or solution for this issue to be handled generically without needing to change my solution in case of addition of many companies later?
What would be the best approach to write my parser/transformer?

This is how I have done something similar in the past.
As long as each company has its own fixed format which it uses for its XML dump:
1. Have a specific XSLT for each company.
2. Have a way of indicating which dump is sourced from where (maybe different DUMP folders for each company).
3. In your program, based on 2, select 1 and apply it to the DUMP.
4. All the XSLTs will transform the XML to your one standard database schema.
5. Save this to your DB.
Each new company addition is at most a new XSLT.
In cases where the schema is very similar, the XSLTs can simply be reused and then specific changes made to them.
Drawback to this approach: debugging XSLTs can be a bit more painful if you do not have the right tools. However, a LOT of XML editors (e.g. XML Spy) have excellent XSLT debugging capabilities.
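
The select-and-apply step could look roughly like this, using XslCompiledTransform from System.Xml.Xsl. The CompanyDumpNormalizer class and the idea of keying stylesheets by a company name string are assumptions for the sketch, not part of the original answer:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public class CompanyDumpNormalizer
{
    // Maps a company key (e.g. the dump folder name) to its compiled stylesheet.
    private readonly Dictionary<string, XslCompiledTransform> _stylesheets =
        new Dictionary<string, XslCompiledTransform>();

    // Registers the XSLT for one company; adding a company means one more call.
    public void Register(string companyKey, string xsltText)
    {
        var transform = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(xsltText)))
        {
            transform.Load(reader);
        }
        _stylesheets[companyKey] = transform;
    }

    // Applies the company's stylesheet and returns XML in the standard schema.
    public string Normalize(string companyKey, string dumpXml)
    {
        XslCompiledTransform transform = _stylesheets[companyKey];
        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(dumpXml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }
        // The writer is disposed (and flushed) before the text is read back.
        return output.ToString();
    }
}
```

In a real setup the stylesheets would more likely be loaded from files, so that supporting a new company means dropping in one more .xslt file without touching the program.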

Sounds to me like you are just asking for a design pattern (or set of patterns) that you could use to do this in a generic, future-proof manner, right?
Ideally, these are some of the attributes that you probably want:
Each "transformer" is decoupled from one another.
You can easily add new "transformers" without having to rewrite your main "driver" routine.
You don't need to recompile / redeploy your entire solution every time you modify a transformer, or at least add a new one.
Each "transformer" should ideally implement a common interface that your driver routine knows about - call it IXmlTransformer. The responsibility of this interface is to take in an XML file and to return whatever object model / dataset that you use to save to the database. Each of your transformers would implement this interface. For common logic that is shared by all transformers you could either create a base class that they all inherit from, or (my preferred choice) have a set of helper methods which you can call from any of them.
I would start by using a Factory to create each "transformer" from your main driver routine. The factory could use reflection to interrogate all the assemblies it can see, or something like MEF, which could do a lot of the work for you. Your driver logic should use the factory to create all the transformers and store them.
Then you need some logic and mechanism to "lookup" each XML file received to a given Transformer - perhaps each XML file has a header that you could use to identify or something similar. Again, you want to keep these decoupled from your main logic so that you can easily add new transformers without modification of the driver routine. You could e.g. supply the XML file to each transformer and ask it "can you transform this file", and it is up to each transformer to "take responsibility" for a given file.
Every time your driver routine gets a new XML file, it looks up the appropriate transformer, and runs it through; the result gets sent to the DB processing area. If no transformer can be found, you dump the file in a directory for interrogation later.
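
To make that a bit more concrete, here is a small sketch. Only the name IXmlTransformer comes from the text above; CarRecord, CompanyATransformer, Driver, the method names, and the use of the root element name to recognise a file are all invented for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical unified record that the database layer would consume.
public class CarRecord
{
    public string Make { get; set; }
    public string Model { get; set; }
}

public interface IXmlTransformer
{
    bool CanTransform(XDocument dump);                 // "Can you transform this file?"
    IEnumerable<CarRecord> Transform(XDocument dump);  // Company format -> unified model.
}

// One transformer per company; the root element name is an assumed way to identify the source.
public class CompanyATransformer : IXmlTransformer
{
    public bool CanTransform(XDocument dump) => dump.Root?.Name.LocalName == "CompanyA";

    public IEnumerable<CarRecord> Transform(XDocument dump) =>
        dump.Root.Elements("car").Select(e => new CarRecord
        {
            Make = (string)e.Attribute("make"),
            Model = (string)e.Attribute("model")
        });
}

public class Driver
{
    private readonly IReadOnlyList<IXmlTransformer> _transformers;

    // The list would normally come from a factory (reflection, MEF, ...).
    public Driver(IEnumerable<IXmlTransformer> transformers)
    {
        _transformers = transformers.ToList();
    }

    // Returns null when no transformer claims the file, so the caller can set it aside.
    public List<CarRecord> Process(XDocument dump)
    {
        IXmlTransformer match = _transformers.FirstOrDefault(t => t.CanTransform(dump));
        return match?.Transform(dump).ToList();
    }
}
```

The driver never names a concrete transformer, so adding a company only means adding another IXmlTransformer implementation to whatever the factory discovers.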
I would recommend reading a book like Agile Principles, Patterns and Practices by Robert Martin (http://www.amazon.co.uk/Agile-Principles-Patterns-Practices-C/dp/0131857258), which gives good examples of appropriate design patterns for situations like yours e.g. Factory and DIP etc.
Hope that helps!

The solution proposed by InSane is likely the most straightforward and definitely the most XML-friendly approach.
If you want to write your own code to convert the different data formats, then implement multiple reader entities, each of which reads data from one distinct format and transforms it into a unified format. Your main code then works with these entities in a uniform way, e.g. by saving to the database.
Search for ETL (Extract-Transform-Load) to get more information: What model/pattern should I use for handling multiple data sources? , http://en.wikipedia.org/wiki/Extract,_transform,_load

Using XSLT, as proposed in the currently most upvoted answer, just moves the problem from C# to XSLT.
You are still changing the pieces that process the XML, and you are still exposed to how well or poorly the code is structured, whether it is C# code or rules in the XSLT.
Regardless of whether you keep it in C# or go with XSLT for those bits, the key is to separate out the transformation of the XML you receive from the various companies into a single format, whether that's an intermediate XML or a set of classes into which you load the data you are processing.
Whatever you do, avoid getting clever and trying to define your own generic transformation layer. If that's what you want, do use XSLT, since that's what it's for. If you go with C#, keep it simple, with a transformation class for each company that implements the simplest interface.
On the C# route, keep any reuse between the transformations to composition; don't even think of using inheritance for it ... this is one of the areas where it gets very ugly quickly if you go that way.

Have you considered BizTalk server?

Just playing the fence here and offering another solution for other readers.
The easiest way to get the data into your models within C# is to use XSLT to convert each company's data into a serialized form of your models. These are the basic steps I would take:
Create a complete model of all your data and use XmlSerializer to write out the model.
Create an XSLT that takes Company A's data and converts it into a valid serialized xml model of your data. Use the previously created XML file as a reference.
Use Deserialize on the new XML you just created. You will now have a reference to your model object containing all the data from the company.
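
The first and last of those steps might look something like this. The Catalogue and Car classes are placeholders for whatever your real model is, and ModelXml is just a hypothetical helper name:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical model; the real one would mirror your database schema.
public class Car
{
    public string Make { get; set; }
    public int Year { get; set; }
}

public class Catalogue
{
    public List<Car> Cars { get; set; } = new List<Car>();
}

public static class ModelXml
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Catalogue));

    // First step: write a populated model out to get reference XML for the XSLT author.
    public static string Write(Catalogue model)
    {
        var writer = new StringWriter();
        Serializer.Serialize(writer, model);
        return writer.ToString();
    }

    // Last step: turn XML in the model's serialized form back into objects.
    public static Catalogue Read(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (Catalogue)Serializer.Deserialize(reader);
        }
    }
}
```

The output of Write on a fully populated model is the reference file that each company's XSLT has to reproduce; Read is then the same call for every company.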

XML Serialization - Efficient?

Hey everybody. I'm creating a catalog app where users add/download information on cars. This could result in hundreds, possibly thousands, of cars and their data (make, model, year, image etc...). Seeing as WP7 has no database, I'm using XML. My question is, would it be efficient to store every object in a list, then serialize that entire list? When the user loads the app, the entire list is deserialized and every object is instantiated. Is there a better way of doing this? Thanks.
ps - I've come across DataContractSerializer, but not sure if I should use that since it seems to be WCF related (and I'm not using WCF in my app).

Just do it and see. Unless every aspect of this is totally new to you, it should take less time to prototype and test something like this than it would take to have a discussion about it on SO - especially since the end result of the SO discussion will probably be someone telling you to prototype and test it.
If it's too slow, then you can look at alternatives - using a different kind of serialization method, partially deserializing the objects at startup to get the UI up and running and then continuing the deserialization in the background, or whatever.
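
A prototype along those lines can be very small. This sketch assumes a made-up CarEntry class with three simple fields; it generates a list of cars, serializes it to memory, and times only the deserialization step:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the real car class.
public class CarEntry
{
    public string Make { get; set; }
    public string Model { get; set; }
    public int Year { get; set; }
}

public static class SerializationBenchmark
{
    // Serializes `count` generated cars, then times deserializing them all.
    public static TimeSpan TimeDeserialize(int count, out int loaded)
    {
        var cars = Enumerable.Range(0, count)
            .Select(i => new CarEntry { Make = "Make" + i, Model = "Model" + i, Year = 1990 + i % 30 })
            .ToList();

        var serializer = new XmlSerializer(typeof(List<CarEntry>));
        using (var stream = new MemoryStream())
        {
            serializer.Serialize(stream, cars);
            stream.Position = 0; // rewind so the same buffer can be read back

            var watch = Stopwatch.StartNew();
            var result = (List<CarEntry>)serializer.Deserialize(stream);
            watch.Stop();

            loaded = result.Count;
            return watch.Elapsed;
        }
    }
}
```

Call TimeDeserialize with a few thousand cars on the actual device, since the timing on a desktop machine says little about a phone, and remember that real entries with images will be heavier than these three fields.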

C# Deep Copying Tree Structures with BinaryFormatter

EDIT 1
Kent has alleviated my fears. However, I've found one exception to the rule. I wrote a method for the Node class that would traverse up through the hierarchy until it got to the root Node and return its hash code. The hash code's the same all the way down the line except for one single object. In a word, RAGE.
I'm at the end of writing my first (relatively) large C# application. However, I think I've found a major bug that I screwed up on.
My app parses an XML file and creates a hierarchy of objects, each inheriting from a Node class, with Lists of Children and a Node Parent reference.
I needed to be able to copy this structure over. The concept is the initial structure holds the default data, and you can get your own copy and modify it while you use it. So I used a generic DeepClone< T > extension method to do it with BinaryFormatter.
My question, although I feel I already know (and dread) what the answer is: does this still leave me with the problem of reassigning the references of all those Parent and Child nodes?
Disclaimer: As I finish this, I'm realizing all the design mistakes I made and how they could have been avoided, including this one. In my defense, this semester at university will be the first time I take a data structures class. ;) I fully expect there to be some vital part of a tree that I failed to implement that would help solve this. >_<

No. The serialization process records the references between objects and the deserialization process restores those relationships.
EDIT: unless I misunderstood your question - it's not exactly clear what you meant. Once the client code has made changes to the cloned structure, it is up to you to incorporate those changes into the main data structure, if that's what you want to do.
You might want to take a look at IEditableObject for a more formalized way to handling this kind of thing. You may well continue to use serialization as your means of cloning the object prior to edits, but your interface will be more standardized.
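
The point about references being restored can be checked with a small experiment. BinaryFormatter is marked obsolete in current .NET releases, so this sketch swaps in DataContractSerializer with IsReference = true, which likewise tracks object identity during the round trip. The Node class here is a simplified guess at the one in the question:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// IsReference = true makes the serializer track object identity, so cycles
// (child -> Parent -> Children -> child) survive the round trip.
[DataContract(IsReference = true)]
public class Node
{
    [DataMember] public string Name { get; set; }
    [DataMember] public Node Parent { get; set; }
    [DataMember] public List<Node> Children { get; set; } = new List<Node>();

    public Node AddChild(string name)
    {
        var child = new Node { Name = name, Parent = this };
        Children.Add(child);
        return child;
    }
}

public static class NodeCloner
{
    // Deep-copies a tree by serializing it to memory and reading it back.
    public static Node DeepClone(Node root)
    {
        var serializer = new DataContractSerializer(typeof(Node));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, root);
            stream.Position = 0;
            return (Node)serializer.ReadObject(stream);
        }
    }
}
```

After cloning, each child's Parent in the copy should point at the copied parent and not at the original, which is easy to confirm with object.ReferenceEquals on a small tree.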
