XML comparer C# - c#

I want to compare 2 XML files.
That looks easy when both files have an identical structure, but that is not my case :(
My files look like:
<root>
<t>
<child1>
<cc1>val</cc1>
<cc2>val</cc2>
......
</child1>
<child2>
<cc1>val</cc1>
<cc2>val</cc2>
......
</child2>
<child2>
<cc1>val</cc1>
<cc2>val</cc2>
......
</child2>
.......
<child3>
<cc1>val</cc1>
<cc2>val</cc2>
......
</child3>
....
</t>
<t>
...
</t>
.....
</root>
And they could have any number of children, and children of children...
The task is:
To compare only one defined block. I need to find it by the value of the 1st child's child (child1.cc1.value in this example).
During the comparison some nodes could be skipped (the names of the skipped nodes are stored somewhere, for example in a string array).
It is possible to have multiple identical nodes like <child2>. If child2 isn't ignored, I need to make sure that both files contain the same number of them and that each one matches a corresponding node in the second file. So the following situation is possible:
1st file contains:
<child2><cc1>1</cc1>...</child2>
<child2><cc1>3</cc1>...</child2>
<child2><cc1>2</cc1>...</child2>
2nd file contains:
<child2><cc1>2</cc1>...</child2>
<child2><cc1>1</cc1>...</child2>
<child2><cc1>3</cc1>...</child2>
That means they correspond to each other, so the nodes can appear in any order.
I can't decide how to implement this algorithm. I thought about using DataSet objects, but this XML structure looks too complex for simply using DataTables, DataRows, etc.
Now I'm trying XmlNodes, but I haven't worked out the part where several identical nodes with different data appear in random order.
Any ideas?

How large are your XML files? And how complex is the structure in reality?
If they are not too large or complex, then I would recommend parsing the whole file into a class structure and then performing your validation on the properties of the classes. For example (pseudocode)...
XmlClass file1 = new XmlClass(file1info);
XmlClass file2 = new XmlClass(file2info);
// The custom classes have now parsed the XML files in whichever way you like
if (file1.numberOfChildren != file2.numberOfChildren)
{
    // comparison fail
}
else if (!file1.orderOfChildrenSame(file2))
{
    // comparison fail
}
else
{
    // comparison success
}
Obviously the exact implementation of the methods and properties of your XmlClass will depend on your exact requirements.
XmlClass may have the rough layout...
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class XmlClass
{
    private XmlDocument _xmlDoc;
    private readonly List<ChildClass> _children = new List<ChildClass>();

    public XmlClass(FileInfo fil)
    {
        _xmlDoc = new XmlDocument();
        _xmlDoc.Load(fil.FullName);
        ParseChildren();
        _xmlDoc = null; // release the DOM once parsing is done
    }

    private void ParseChildren()
    {
        XmlNodeList ndl = _xmlDoc.SelectNodes("/root/t"); // select all <t>s
        foreach (XmlNode nodT in ndl)
        {
            foreach (XmlNode nodChild in nodT.ChildNodes)
            {
                _children.Add(new ChildClass(nodChild));
            }
        }
        // Now _children contains all child nodes of <t>s and can be worked with logically
    }

    public int numberOfChildren
    {
        get { return _children.Count; }
    }
}
You will obviously need to implement ChildClass - which may in turn contain a collection of ChildClass itself (allowing the hierarchy you describe). You will also need to implement the other validation methods as you require. Also you may need to implement other classes to represent other node types within the document which you are interested in.
Don't parse more than you need to in order to validate! - It depends what your end goal is.
PS
I would also suggest that this XML format is not very "nice" in terms of the <child1>, <child2> set-up. It would be much more XMLesque to have <child id="1">, <child id="2"> etc. As presumably <child1> and <child2> are essentially the same type of node...
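For the order-independent part of the question, one possible approach is to reduce each block to a canonical signature and compare the signatures. The following is only a sketch using LINQ to XML, not the class-based design above: the names XmlBlockComparer, Canonical, FindBlock and AreEqual are made up for illustration, the element names t, child1 and cc1 come from the question's sample, and attributes are deliberately not compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlBlockComparer
{
    // Builds an order-independent signature of an element.
    // Children whose names are in `ignored` are left out; the rest are sorted.
    // Assumption: attributes are not part of the comparison.
    public static string Canonical(XElement e, ICollection<string> ignored)
    {
        List<string> children = e.Elements()
            .Where(c => !ignored.Contains(c.Name.LocalName))
            .Select(c => Canonical(c, ignored))
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToList();

        // Elements without (kept) children contribute their trimmed text.
        string body = children.Count == 0 ? e.Value.Trim() : string.Concat(children);
        return "<" + e.Name.LocalName + ">" + body + "</" + e.Name.LocalName + ">";
    }

    // Finds the first <t> block whose child1/cc1 value equals `key`, or null.
    public static XElement FindBlock(XDocument doc, string key)
    {
        return doc.Root.Elements("t")
            .FirstOrDefault(t => t.Elements("child1").Elements("cc1").Any(c => c.Value == key));
    }

    // Two blocks match when their signatures are identical.
    public static bool AreEqual(XElement a, XElement b, ICollection<string> ignored)
    {
        return Canonical(a, ignored) == Canonical(b, ignored);
    }
}
```

Because repeated nodes are sorted by their own signatures, a different number of <child2> nodes or a differing value in any of them changes the result, while a mere reordering does not. Text is not escaped in the signature, so this is unsuitable if values can themselves contain markup-like text.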

Related

How does one increment/update integer value of an XML Attribute in C# (XAttribute)?

TL;DR: If I have an XAttribute of NumFruits in an XElement, how can I increment/update the value from 0 to 1,2,3... ?
The issue:
When I attempt to increment the XAttribute like so:
basket.Attribute("numFruits").Value += 1
the result for numFruits will be numFruits = 01 (since 0 was the initial value), when the intended result was supposed to be numFruits = 1
Global variables that are added at the end of the parsing is not desired as there can be many baskets.
Explanation:
In C# Linq to XML, one can add XAttributes to an XElement like so.
XElement basket = new XElement("Marys_Basket", new XAttribute("NumFruits", 0));
where in the example we use the NumFruits XAttribute as a counter to keep track of the number of fruits in the XDocument.
As I iterate through a list of (for example) Fruit objects that each also have a basket_owner property, I serialize all those objects to XML manually by creating or adding to XElements, which in this example would be the owners.
As the list of fruits is not fixed, I have to add Fruit elements to the XElement and update the XAttribute by first checking if the owner element exists (I've done this with LINQ queries and checking if they are null) and then adding the Fruit XElement as a child, yielding something like so:
<Root>
  <Marys_basket numFruits="2">
    <Fruit name="Mango"/>
    <Fruit name="Papaya"/>
  </Marys_basket>
  <Jons_basket numFruits="0" />
  <Bobs_basket numFruits="1">
    <Fruit name="Apple"/>
  </Bobs_basket>
</Root>
Here's a related question on how to increment an XML Element (in this case XElement), but not an XAttribute. And this as well but not specifically to increasing a value.
I've found one method (posted as an answer) and would like to explore a more robust way to do so, as my program does this multiple times.
It would be even shorter if you cast the XAttribute to int directly:
basket.FirstAttribute.SetValue((int)basket.FirstAttribute + 1);
Just like XElement, XAttribute also has some explicit conversion operators predefined.
Working example * :
using System;
using System.Xml.Linq;
using System.Xml;

public class Program
{
    public static void Main()
    {
        var xml = @"<Root>
<Marys_basket numFruits=""2"">
<Fruit name=""Mango""/>
<Fruit name=""Papaya""/>
</Marys_basket>
<Jons_basket numFruits=""0"" />
<Bobs_basket numFruits=""1"">
<Fruit name=""Apple""/>
</Bobs_basket>
</Root>";
        var doc = XDocument.Parse(xml);
        XElement basket = doc.Root.Element("Marys_basket");
        // The explicit (int) conversion parses the attribute's string value
        basket.FirstAttribute.SetValue((int)basket.FirstAttribute + 1);
        Console.WriteLine(doc.ToString());
    }
}
*: Mainly for future visitors, as I believe the OP already knows the rest.
The shortest way of doing it I've found so far:
basket.FirstAttribute.SetValue( Int32.Parse( basket.FirstAttribute.Value ) + 1);
Note that basket.Attribute("numFruits") can also be used.
What that does is grab the attribute we want and set its value by first parsing the existing value as an integer and then incrementing it by 1. This is necessary because XAttribute values are stored and retrieved as strings.
The reason that basket.Attribute("numFruits").Value += 1 yields 01 instead of 1, or 11 instead of 2, is that attribute values are stored as strings, so the += operation becomes a string concatenation.
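Since the question mentions doing this many times, the parse-and-set step could be wrapped in a small helper that looks the attribute up by name rather than relying on FirstAttribute. This is a sketch only: AttributeCounter and Increment are hypothetical names, and treating a missing attribute as 0 is an assumption rather than something the question specifies.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeCounter
{
    // Hypothetical helper: adds `delta` to an integer attribute and returns the new value.
    // Assumption: a missing attribute counts as 0 and is created on first use.
    public static int Increment(XElement element, XName name, int delta = 1)
    {
        XAttribute attr = element.Attribute(name);

        // The explicit (int?) conversion yields null when the attribute is absent
        // and throws FormatException when the value is not a valid integer.
        int current = (int?)attr ?? 0;

        int updated = current + delta;
        element.SetAttributeValue(name, updated); // creates or overwrites the attribute
        return updated;
    }
}
```

With the basket from the question, AttributeCounter.Increment(basket, "numFruits") would replace the SetValue call and keeps working if another attribute is later added in front of numFruits.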

deserialize xml node with unknown child nodes

I have some xml that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<video key="8bJ8OyXI">
<custom>
<legacyID>50898311001</legacyID>
</custom>
<date>1258497567</date>
<description>some description</description>
<duration>486.20</duration>
<md5>bc89fde37ef103db26b8a9d98065d006</md5>
<mediatype>video</mediatype>
<size>99416259</size>
<sourcetype>file</sourcetype>
<status>ready</status>
<views>0</views>
</video>
</response>
I am using XmlSerializer to deserialize the XML into class objects, and would prefer to stick with it if possible since everything else works just fine. The custom node is just custom metadata added to the video, and pretty much anything could potentially end up in there (only strings, just a name and value). I used xsd.exe to generate classes from my XML, which generated a unique class for the <custom> tag with just one ulong property for the legacyID value. The thing is, any number of arbitrary values could be there, and I can't and don't need to account for them all (but I may need to read particular values later).
Is it possible to set up the Video.Custom property in my class so that the serializer can deserialize those values into say something like a Dictionary<string, string>? I don't need type information for those particular values, saving the node names + values are more than enough for my purposes.
You can handle the UnknownElement event and deserialize the custom element into your dictionary there:
// Video.Custom must be marked [XmlIgnore]; XmlSerializer cannot map a Dictionary itself
serializer.UnknownElement += (s, e) =>
{
    if (e.Element.LocalName == "custom" && e.ObjectBeingDeserialized is Video)
    {
        Video video = (Video)e.ObjectBeingDeserialized;
        if (video.Custom == null)
        {
            video.Custom = new Dictionary<string, string>();
        }
        foreach (XmlElement element in e.Element.OfType<XmlElement>())
        {
            // InnerText also copes with empty elements, where FirstChild would be null
            video.Custom[element.LocalName] = element.InnerText;
        }
    }
};
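To show where that handler sits, here is a minimal end-to-end sketch. The Response and Video classes below are hand-written stand-ins for whatever xsd.exe generated (only two of the video fields are mapped), and VideoReader is a made-up name; the key point is that Custom is excluded from normal mapping so that <custom> reaches the UnknownElement handler.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("response")]
public class Response
{
    [XmlElement("video")]
    public Video Video { get; set; }
}

public class Video
{
    [XmlAttribute("key")]
    public string Key { get; set; }

    [XmlElement("description")]
    public string Description { get; set; }

    // Not mapped by the serializer; filled in by the UnknownElement handler.
    [XmlIgnore]
    public Dictionary<string, string> Custom { get; set; }
}

public static class VideoReader
{
    public static Response Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Response));
        serializer.UnknownElement += (s, e) =>
        {
            var video = e.ObjectBeingDeserialized as Video;
            // Other unmapped elements (date, size, ...) also arrive here and are skipped.
            if (video == null || e.Element.LocalName != "custom") return;

            if (video.Custom == null) video.Custom = new Dictionary<string, string>();
            foreach (XmlElement child in e.Element.ChildNodes.OfType<XmlElement>())
            {
                video.Custom[child.LocalName] = child.InnerText;
            }
        };

        using (var reader = new StringReader(xml))
        {
            return (Response)serializer.Deserialize(reader);
        }
    }
}
```

Reading a particular value later is then a plain dictionary lookup, for example response.Video.Custom["legacyID"].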

How to modify XML file in c#?

<Customers>
<Customer1>
<Name>Bobby</Name>
<Age>21</Age>
<Address>Panjim</Address>
</Customer1>
<Customer2>
<Name>Peter</Name>
<Age>32</Age>
<Address>Panjim</Address>
</Customer2>
<Customer4>
<Name>Joel</Name>
<Age>32</Age>
<Address>Mapusa</Address>
</Customer4>
</Customers>
The thing is, I want to delete a particular element, and when I delete the first element (i.e. Customer1), I want to update the other elements. I mean I want Customer3 to become Customer2 and Customer2 to become Customer1.
Can anyone please help me achieve this?
What about:
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load("D:\\file.xml"); // example file
        doc.Root.SwitchAndRemove("Customer1");
        doc.Save("D:\\file.xml");
    }
}

public static class Utilities
{
    public static void SwitchAndRemove(this XElement customers, XName name)
    {
        // Index all children first, then pick the match;
        // indexing after a Where() filter would always yield 0.
        var x = customers.Elements()
            .Select((element, index) => new { element, index })
            .Single(p => p.element.Name == name);
        int count = 0;
        XElement temp = x.element;
        foreach (XElement el in customers.Elements())
        {
            if (count == x.index + 1)
            {
                // Shift the next customer's content one slot up
                temp.RemoveAll();
                temp.Add(el.Elements().ToArray());
                temp = el;
            }
            else
                count++;
        }
        temp.Remove(); // the last slot is now redundant
    }
}
Given your XML as input, the output is the following:
<?xml version="1.0" encoding="utf-8"?>
<Customers>
<Customer1>
<Name>Peter</Name>
<Age>32</Age>
<Address>Panjim</Address>
</Customer1>
<Customer2>
<Name>Joel</Name>
<Age>32</Age>
<Address>Mapusa</Address>
</Customer2>
</Customers>
I'd argue that your problem is not how to rename your nodes with minimum effort but the structure of your XML file.
You said the order of customers is not important, and apparently the number in the customer tag is not important either, since you want to rename the tags upon deletion.
So maybe this structure just creates unnecessary complexity and extra work for you.
The only reason I can see for needing the number in the tag is to identify the node you are about to remove. Am I right, or is there something more to it? If not, you could add a random unique identifier (like a Guid) to your customer data to remove the right one.
It could save you a lot of trouble.
<customers>
<customer>
<guid>07fb-877c-...</guid>
<name>Notch</name>
<age>34</age>
<address>street</address>
</customer>
<customer>
<guid>1435-435a-...</guid>
<name>Sam</name>
<age>23</age>
<address>other</address>
</customer>
</customers>
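With that layout, deletion no longer needs any renaming. A minimal sketch, assuming the lowercase customer and guid element names from the sample above (CustomerStore and RemoveByGuid are illustrative names, not an existing API):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CustomerStore
{
    // Removes the <customer> whose <guid> child equals `id`.
    // Returns false when no customer carries that identifier.
    public static bool RemoveByGuid(XElement customers, string id)
    {
        XElement match = customers.Elements("customer")
            .FirstOrDefault(c => (string)c.Element("guid") == id);

        if (match == null) return false;

        match.Remove(); // the remaining customers are left untouched
        return true;
    }
}
```

Typical use would be to load the file with XDocument.Load, call CustomerStore.RemoveByGuid(doc.Root, id), and save the document again.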
Say the element you have to delete is Customer1. First, read the complete XML file using one of the XML parsing classes available in C#, such as XDocument or XmlReader, and write it to another XML file, say "Temp.xml", skipping the Customer1 element completely. That takes care of the deletion.
Next, for the update, forget that the file is XML and read the entire file into a string, say "xmlstring". Now use the string Replace function to replace "Customer2" with "Customer1", then "Customer3" with "Customer2", and so on.
Finally, delete your original XML file and write the string "xmlstring" to a file named "YourFileName.xml" using a StreamWriter.
That's it. I hope this solution works for you. Try it, and if you are unable to get it working, share the code you tried and we can suggest how to fix it.
Going by your comment that the order does not have to be preserved, you can do this:
public static void RemoveCustomer(XElement customers, XElement removeThis)
{
    var last = customers.Elements().Last();
    if (last != removeThis)
    {
        // Overwrite the removed customer's fields with those of the last customer
        foreach (var element in removeThis.Elements())
        {
            element.Value = last.Element(element.Name).Value;
        }
    }
    last.Remove();
}
It effectively substitutes the one to be removed with the last (unless the last should be removed) and thereby eliminates the need for renaming any of the other elements

How do I turn a deserialized XML object (.NET) into a single collection of dot separated named key values?

To start, I am constrained to .NET 2.0 so LINQ is not an option for me (though I would be curious to see a LINQ solution as fodder for pushing to move to .NET 3.5 for the project if it is easy).
I have an XSD that is turned into a set of C# classes via xsd.exe at build time. At runtime, an XML file is loaded and deserialized into the C# classes (validation occurs at this time). I need to then turn that in-memory configuration object (including the default values that were populated during import of the XML file) into a dictionary of key value pairs.
I would like the dictionary key to be a dot separated path to the value. Attribute values and element text would be considered values, everything else along the way a key into that.
As an example, imagine the following XML file:
<rootNode>
<foo enabled="true"/>
<bar enabled="false" myAttribute="5.6">
<baz>Some Text</baz>
<baz>Some other text.</baz>
</bar>
</rootNode>
would turn into a dictionary with keys like:
"rootNode.foo.enabled" = (Boolean)true
"rootNode.bar.enabled" = (Boolean)false
"rootNode.bar.myAttribute" = (Float)5.6
"rootNode.bar.baz" = List<String> { "Some Text", "Some other text." }
Things of note are that rootNode is left off not because it is special but because it had no text or attributes. Also, the dictionary is a dictionary of objects which are typed appropriately (this is already done in deserialization, which is one of the reasons I would like to work with the C# object rather than the XML directly).
Interestingly, the objects created by xsd.exe are already really close to the form I want. The class names are things like rootNodeFoo with a float field on it called myAttribute.
One of the things I have considered but am not sure how to go about are using reflection to iterate over the object tree and using the names of the classes of each object to figure out the name of the node (I may have to tweak the casing a bit). The problem with this is that it feels like the wrong solution since I already have access to a deserializer that should be able to do all of that for me and much faster.
Another option would be using XSLT to serialize the data directly to a format that is how I want. The problem here is that my XSLT knowledge is limited and I believe (correct me if I am wrong) I will lose typing on the way (everything will be a string) so I will have to essentially deserialize once again by hand to get the types back out (and this time without XSD validation that I get when I use the .NET deserializer).
In case it matters, the calls I am using to get the configuration object populated from an XML file is something like this:
var rootNode = new XmlRootAttribute();
rootNode.ElementName = "rootNode";
rootNode.Namespace = "urn:myNamespace";
var serializer = new XmlSerializer(typeof(rootNode), rootNode);
using (var reader = new StringReader(xmlString))
{
var deserializedObject = (rootNode)serializer.Deserialize(reader);
}
First observation: using the object graph is not the best place to start to generate a dot representation. You're talking about nodes which have names and are in a well-defined hierarchy and you want to produce some kind of dot notation from it; the xml DOM seems to be the best place to do this.
There are a few problems with the way you describe the problem.
The first is in the strategy when it comes to handling multiple elements of the same name. You've dodged the problem in your example by making that dictionary value actually a list, but suppose your xml looked like this:
<rootNode>
<foo enabled="true">
<bar enabled="false" myAttribute="5.6" />
<bar enabled="true" myAttribute="3.4" />
</foo>
</rootNode>
Besides foo.enabled = (Boolean)true which should be fairly obvious, what dictionary keys do you propose for the two myAttribute leaves? Or would you have a single entry, foo.bar.myAttribute = List<float> {5.6, 3.4}? So, problem #1, there's no unambiguous way to deal with multiple similarly-named non-leaf nodes.
The second problem is in selecting a data type to do the final conversion at leaf nodes (i.e. attribute or element values). If you're writing to a Dictionary<string, object>, you will probably want to select a type based on the Schema simple type of the element/attribute being read. I don't know how to do that, but suggest looking up the various uses of the System.Convert class.
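As a rough illustration of that conversion step, assuming the target .NET type for each leaf is already known from somewhere (the schema or the generated classes), Convert.ChangeType can turn the raw string into a typed object. LeafConverter and ToTyped are hypothetical names, and the invariant culture is used so that "5.6" parses the same way on every machine.

```csharp
using System;
using System.Globalization;

public static class LeafConverter
{
    // Converts the raw attribute/element text to the requested .NET type.
    // Throws FormatException or InvalidCastException when the text does not fit.
    public static object ToTyped(string raw, Type target)
    {
        return Convert.ChangeType(raw, target, CultureInfo.InvariantCulture);
    }
}
```

For example, LeafConverter.ToTyped("5.6", typeof(float)) returns a boxed float that can be stored in a Dictionary<string, object>. XmlConvert may be a better fit where the text follows XSD lexical rules that differ from .NET parsing.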
Assuming for the moment that problem #1 won't surface, and that you're ok with a Dictionary<string, string> implementation, here's some code to get you started:
using System.Collections.Generic;
using System.Linq;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        var xml = @"
<rootNode>
<foo enabled=""true"">
<bar enabled=""false"" myAttribute=""5.6"" />
<baz>Text!</baz>
</foo>
</rootNode>
";
        var document = new XmlDocument();
        document.LoadXml(xml);
        var retVal = new Dictionary<string, string>();
        // The root starts with an empty token list, so its name is not part of the keys
        Go(retVal, document.DocumentElement, new List<string>());
    }

    private static void Go(Dictionary<string, string> theDict, XmlElement start, List<string> keyTokens)
    {
        // Process simple content
        var textNode = start.ChildNodes.OfType<XmlText>().SingleOrDefault();
        if (textNode != null)
        {
            theDict[string.Join(".", keyTokens.ToArray())] = textNode.Value;
        }
        // Process attributes
        foreach (XmlAttribute att in start.Attributes)
        {
            theDict[string.Join(".", keyTokens.ToArray()) + "." + att.Name] = att.Value;
        }
        // Process child nodes
        foreach (var childNode in start.ChildNodes.OfType<XmlElement>())
        {
            Go(theDict, childNode, new List<string>(keyTokens) { childNode.Name }); // shorthand for .Add
        }
    }
}
And here's the result:
One approach would be to implement a custom formatter and slot it into the standard serialization pattern: create a class that implements IFormatter, e.g. MyDotFormatter
http://msdn.microsoft.com/en-us/library/system.runtime.serialization.iformatter.aspx
and then use it as below:
MyDotFormatter dotFormatter = new MyDotFormatter();
Console.WriteLine("Writing Object Information");
// The using block closes the stream whether or not serialization succeeds
using (Stream stream = File.Open(filename, FileMode.Create))
{
    try
    {
        dotFormatter.Serialize(stream, objectToSerialize);
        // Report success only after Serialize has returned without throwing
        Console.WriteLine("successfully wrote object information");
    }
    catch (SerializationException ex)
    {
        Console.WriteLine("Exception for Serialization data : " + ex.Message);
        throw;
    }
}

LINQ multiple columns

<root>
<data1>
<Element1>Value</Element1>
<Element2>Value</Element2>
<Element3>Value</Element3>
</data1>
<data2>
<Element1>Value</Element1>
<Element2>Value</Element2>
</data2>
</root>
From the above XML I would like to make an XML looking like this:
<root>
<d1e1>value</d1e1>
<d1e2>value</d1e2>
<d2e1>value</d2e1>
</root>
What is the most efficient way to process that?
Foreach or LINQ? In theory LINQ should be faster in most cases, and speed is of the essence for this project.
Any ideas?
The idea was to select just X nodes out of a pool of Y; the example here is simplified to show the problem. In general, I have a multi-level XML that I need to flatten so that it has only one sublevel (i.e. root + level 1), and from the source I only need certain elements that are of interest to me.
Anyway, the issue is solved: I did it with foreach, because I found out that if a schema is specified in the XML but is not accessible, LINQ doesn't work anyway.
The solution was like this:
I made a function:
public System.Xml.XmlElement GetSubElement(XmlElement Parent, string element)
{
    System.Xml.XmlElement ret = null;
    if (Parent == null)
        return ret;
    XmlNodeList ContentNodes = Parent.GetElementsByTagName(element);
    if (ContentNodes.Count > 0)
    {
        XmlNode node = ContentNodes.Item(0);
        ret = (XmlElement)node;
    }
    return ret;
}
I made a foreach loop over the area that was repeating.
I got the elements that were outside the repeating context with the above function.
Anyway, that solved it for me.
Edit:
I don't know how to get this code to display properly, because Ctrl+K doesn't seem to do it.
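For comparison, the flattening in the question's simplified example could also be written with LINQ to XML. This is a sketch under stated assumptions, not the solution used above: Flattener is a made-up name, the new element names are assumed to come from the digits in the source names (data1 + Element2 gives d1e2), and the keep predicate stands in for "only the elements of interest". It says nothing about the schema problem mentioned in the answer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class Flattener
{
    // Returns a new root whose children are the kept second-level elements,
    // renamed to d<group digits>e<element digits>.
    public static XElement Flatten(XElement root, Func<XElement, bool> keep)
    {
        return new XElement(root.Name,
            from data in root.Elements()
            from leaf in data.Elements()
            where keep(leaf)
            select new XElement(
                "d" + Digits(data.Name.LocalName) + "e" + Digits(leaf.Name.LocalName),
                leaf.Value));
    }

    // Extracts the digit characters of a name, e.g. "Element2" gives "2".
    private static string Digits(string name)
    {
        return new string(name.Where(c => char.IsDigit(c)).ToArray());
    }
}
```

Whether this is faster than a foreach loop is not something the sketch settles; LINQ to XML queries are themselves executed as loops, so measuring on real data is the only reliable answer.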
