XDocument: How to find an attribute value in nested xml - c#

Given the working code snippet below, I'm trying to get the target element inside RootChild2.
https://dotnetfiddle.net/eN4xV9
string str =
@"<?xml version=""1.0""?>
<!-- comment at the root level -->
<Root>
<RootChild1>
<Child>Content</Child>
<Child>Content</Child>
<Child>Content</Child>
</RootChild1>
<RootChild2>
<Child>Content</Child>
<Child key='target'>Content</Child>
<Child>Content</Child>
</RootChild2>
</Root>";
XDocument doc = XDocument.Parse(str);
foreach (XElement element in doc.Descendants("RootChild2"))
{
if (element.HasAttributes && element.Element("Child").Attribute("key").Value == "target")
Console.WriteLine("found it");
else
Console.WriteLine("not found");
}

This finds all 3 RootChild2/Child elements, then tests each in turn:
foreach (XElement child in doc.Descendants("RootChild2").Elements("Child"))
{
if ((string)child.Attribute("key") == "target")
Console.WriteLine("found it");
else
Console.WriteLine("not found");
}

There are three problems here:
You're checking if the RootChild2 element has any attributes - and it doesn't
You're only checking the first Child element under each RootChild2 element
You're assuming the attribute is present (by dereferencing the XAttribute)
Here's code which will find all the target elements within a RootChild2:
foreach (XElement element in doc.Descendants("RootChild2"))
{
var targets = element
.Elements("Child")
.Where(child => (string) child.Attribute("key") == "target")
.ToList();
Console.WriteLine($"Found {targets.Count} targets");
foreach (var target in targets)
{
Console.WriteLine($"Target content: {target.Value}");
}
}
Note that the cast of XAttribute to string is a simple way of avoiding null reference issues - because the result of the explicit conversion is null when the source is null. (That's a general pattern in LINQ to XML.)
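Here is a minimal sketch of that conversion pattern, assuming a trimmed copy of the question's RootChild2 markup. The class and method names are hypothetical and used only for illustration. It reads the key attribute of each Child through the explicit string conversion, and shows that the nullable int? conversion behaves the same way for a missing attribute:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeCastDemo
{
    // Reads the "key" attribute of every <Child> under parent. A missing
    // attribute yields null instead of throwing, because the explicit
    // string conversion of a null XAttribute is null.
    public static string[] ReadKeys(XElement parent) =>
        parent.Elements("Child")
              .Select(child => (string)child.Attribute("key"))
              .ToArray();

    public static void Main()
    {
        // Trimmed copy of the question's RootChild2 element (illustrative input).
        var parent = XElement.Parse(
            "<RootChild2><Child/><Child key='target'/><Child/></RootChild2>");

        string[] keys = ReadKeys(parent);
        Console.WriteLine(string.Join(", ", keys.Select(k => k ?? "(null)")));
        // prints: (null), target, (null)

        // Nullable value types follow the same pattern: a missing attribute gives null.
        int? count = (int?)parent.Attribute("count");
        Console.WriteLine(count.HasValue); // prints: False
    }
}
```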

Inside your loop, element refers to the RootChild2 element itself, not to its children.
Take a look at the following version:
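foreach (XElement element in doc.Descendants("RootChild2").Nodes().OfType<XElement>())
{
if ((string)element.Attribute("key") == "target")
Console.WriteLine("found it");
else
Console.WriteLine("not found");
}
Now it loops through the child nodes of RootChild2. OfType<XElement>() skips any comment or text nodes, which cannot be cast to XElement, and the string cast avoids a null reference when a child has no key attribute.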

You should rewrite your foreach loop a little: access the child Elements() collection of RootChild2 and check every element in the enumeration. You can also check that the key attribute is present on the element to prevent a potential null reference exception:
foreach (XElement element in doc.Descendants("RootChild2").Elements())
{
if (element.HasAttributes && element.Attribute("key")?.Value == "target")
Console.WriteLine("found it");
else
Console.WriteLine("not found");
}
Descendants(XName) returns only elements with a matching name, so in your code the loop runs over the single RootChild2 element, never over its Child elements.
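The difference between Descendants and Elements can be seen in a small sketch like the following. The class and method names are hypothetical, and the XML is a shortened version of the question's. It counts what each call yields and then fetches the target element directly with FirstOrDefault instead of a loop:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DescendantsDemo
{
    // Shortened version of the question's XML (illustrative input).
    public const string Xml =
        "<Root>" +
        "<RootChild1><Child>Content</Child><Child>Content</Child></RootChild1>" +
        "<RootChild2><Child>Content</Child><Child key='target'>Content</Child><Child>Content</Child></RootChild2>" +
        "</Root>";

    // Returns the first <Child> under any <RootChild2> whose key attribute
    // is "target", or null when there is none.
    public static XElement FindTarget(XDocument doc) =>
        doc.Descendants("RootChild2")
           .Elements("Child")
           .FirstOrDefault(c => (string)c.Attribute("key") == "target");

    public static void Main()
    {
        XDocument doc = XDocument.Parse(Xml);

        // Descendants matches by name, so this yields only the RootChild2 element...
        Console.WriteLine(doc.Descendants("RootChild2").Count());                   // prints: 1
        // ...while Elements("Child") on that sequence yields its three children.
        Console.WriteLine(doc.Descendants("RootChild2").Elements("Child").Count()); // prints: 3

        XElement target = FindTarget(doc);
        Console.WriteLine(target != null ? "found it" : "not found");               // prints: found it
    }
}
```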

Related

How to check if first XML node exist?

I have the following XML:
<Root><Node1>node1 value</Node1><Node2>node2 value</Node2></Root>
I'd like to check if Root is the first node. If so, I then want to get the values for the two child nodes.
This XML is inside of an XElement. I've tried this:
xml.Element("Root")
but that returns null. If Root exists, shouldn't it return a non-null value?
string xml = @"<Root><Node1>node1 value</Node1><Node2>node2 value</Node2></Root>";
XDocument doc = XDocument.Parse(xml);
var root = doc.Root;
if(root.Name == "Root")
{
foreach(var el in root.Descendants())
{
string nodeValue = el.Value;
}
}
You can check the name of the root element with doc.Root.Name, then loop over all elements in the root using doc.Root.Descendants().
Since xml is an instance of XElement, it already references the root element, which in this case is named Root. xml.Element("Root") returns a result only if the <Root> element itself has a child <Root>.
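A short sketch of that difference, using the XML from the question (the class name is hypothetical): Element("Root") succeeds on an XDocument, whose child is <Root>, but returns null on an XElement that already is <Root>:

```csharp
using System;
using System.Xml.Linq;

public static class RootLookupDemo
{
    public const string Raw = "<Root><Node1>node1 value</Node1><Node2>node2 value</Node2></Root>";

    public static void Main()
    {
        // An XDocument has <Root> as its child, so Element("Root") finds it.
        XDocument doc = XDocument.Parse(Raw);
        Console.WriteLine(doc.Element("Root") != null);  // prints: True

        // An XElement parsed from the same text *is* <Root>, so Element("Root")
        // looks for a <Root> child of <Root> and returns null.
        XElement xml = XElement.Parse(Raw);
        Console.WriteLine(xml.Element("Root") != null);  // prints: False

        // With an XElement, check the name directly, then read the children.
        if (xml.Name == "Root")
        {
            Console.WriteLine((string)xml.Element("Node1")); // prints: node1 value
            Console.WriteLine((string)xml.Element("Node2")); // prints: node2 value
        }
    }
}
```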
I'd like to check if Root is the first node.
You can simply check the Name of the root element:
var raw = "<Root><Node1>node1 value</Node1><Node2>node2 value</Node2></Root>";
var xml = XElement.Parse(raw);
if (xml.Name.ToString() == "Root")
Console.WriteLine("Success");
else
Console.WriteLine("Fail");
Try this solution
XDocument doc = XDocument.Parse(@"<Root><Node1>node1value</Node1><Node2>node2value</Node2></Root>");
if(doc!=null)
{
if (doc.Root.Name.LocalName == "Root")
{
foreach (var i in doc.Descendants())
Console.WriteLine(i.Value);
}
}

Add null as value when element does not exist in XML file C#

In C# I'm using the following to get some elements from an XML file:
var TestCaseDescriptions = doc.SelectNodes("//testcase/htmlcomment");
This works fine and gets the correct information but when my testcase has no htmlcomment it won't add any entry in the XmlNodeList TestCaseDescriptions.
When there's no htmlcomment, I would like to have the string "null" in TestCaseDescriptions. So in the end I would have an XmlNodeList like
htmlcomment
htmlcomment
null
htmlcomment
htmlcomment
Can anyone describe or create a sample how to make this happen?
var TestCaseDescriptions = doc.SelectNodes("//testcase/htmlcomment");
When there's no htmlcomment, I would like to have the string "null" in TestCaseDescriptions.
Your problem comes from the fact that if there is no htmlcomment, the number of selected nodes will be one less. The current answer shows what to do when the htmlcomment element is present but empty; I think you need this instead, if the whole htmlcomment element is indeed missing:
var testCases = doc.SelectNodes("//testcase");
foreach (XmlElement element in testCases)
{
var description = element.SelectSingleNode("child::htmlcomment");
// XmlElement.Value is always null; InnerText returns the element's text
string results = description == null ? "null" : description.InnerText;
}
In the code above, you go over each test case and select its htmlcomment child. If none is found, SelectSingleNode returns null, so the last line returns "null" in that case, or the node's content otherwise.
To turn this result into a node, you have to create the node as a child of the current node. You said you want an XmlNodeList, so perhaps this works for you:
var testCaseDescriptions = doc.SelectNodes("//testcase");
foreach (XmlElement element in testCaseDescriptions)
{
var comment = element.SelectSingleNode("child::htmlcomment");
if (comment == null)
{
// AppendChild returns the node it appended, so build the element first,
// then give it its text and attach it to the test case
var newComment = doc.CreateElement("htmlcomment");
newComment.AppendChild(doc.CreateTextNode("null"));
element.AppendChild(newComment);
}
}
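After this, every testcase has an htmlcomment child, so running doc.SelectNodes("//testcase/htmlcomment") again returns one node per test case.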
Note: the OP reports that element.SelectSingleNode("child::htmlcomment"); does not work, but element.SelectSingleNode("./htmlcomment"); does, even though technically these are equivalent XPath expressions and both should work according to Microsoft's documentation.
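The first loop in this answer computes results but discards it on each iteration. A sketch that collects one entry per test case into a List<string> might look like the following. The class and method names and the sample XML are assumptions for illustration, since the question does not show the actual file:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TestCaseDescriptions
{
    // Returns one entry per <testcase>: the text of its <htmlcomment> child,
    // or the string "null" when that child is missing.
    public static List<string> Collect(XmlDocument doc)
    {
        var results = new List<string>();
        foreach (XmlElement testCase in doc.SelectNodes("//testcase"))
        {
            XmlNode comment = testCase.SelectSingleNode("htmlcomment");
            // XmlElement.Value is always null, so read InnerText for the text.
            results.Add(comment == null ? "null" : comment.InnerText);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample input; the real file layout is not shown in the question.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<tests>" +
            "<testcase><htmlcomment>first</htmlcomment></testcase>" +
            "<testcase/>" +
            "<testcase><htmlcomment>third</htmlcomment></testcase>" +
            "</tests>");

        foreach (string description in Collect(doc))
            Console.WriteLine(description);
        // prints "first", "null", "third", one per line
    }
}
```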
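Try this
XmlDocument doc = new XmlDocument();
// Load the XML first with doc.Load(...) or doc.LoadXml(...)
var TestCaseDescriptions = doc.SelectNodes("//testcase/htmlcomment");
foreach (XmlElement element in TestCaseDescriptions)
{
// XmlElement.Value is always null, so use InnerText; it is "" for an empty element
string results = element.InnerText;
}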

How can I remove duplicate, invalid, child nodes from an XML document using Linq to XML?

I'm creating XML from JSON retrieved from an HttpWebRequest call, using JsonConvert. The JSON I'm getting back sometimes has duplicate nodes, creating duplicate nodes in the XML after conversion, which I then have to remove.
The processing of the JSON to XML conversion is being done in a generic service call wrapper that has no knowledge of the underlying data structure and so can't do any XPath queries based on a named node. The duplicates could be at any level within the XML.
I've got to the stage where I have a list of the names of duplicate nodes at each level but am not sure of the Linq query to use this to remove all but the first node with that name.
My code:
protected virtual void RemoveDuplicateChildren(XmlNode node)
{
if (node.NodeType != XmlNodeType.Element || !node.HasChildNodes)
{
return;
}
var xNode = XElement.Load(node.CreateNavigator().ReadSubtree());
var duplicateNames = new List<string>();
foreach (XmlNode child in node.ChildNodes)
{
var isBottom = this.IsBottomElement(child); // Has no XmlNodeType.Element type children
if (!isBottom)
{
this.RemoveDuplicateChildren(child);
}
else
{
var count = xNode.Elements(child.Name).Count();
if (count > 1 && !duplicateNames.Contains(child.Name))
{
duplicateNames.Add(child.Name);
}
}
}
if (duplicateNames.Count > 0)
{
foreach (var duplicate in duplicateNames)
{
xNode.Elements(duplicate).SelectMany(d => d.Skip(1)).Remove();
}
}
}
The final line of code obviously isn't correct but I can't find an example of how to rework it to retrieve and then remove all but the first matching element.
UPDATE:
I have found two ways of doing this now, one using the XElement and one the XmlNode, but neither actually removes the nodes.
Method 1:-
foreach (var duplicate in duplicateNames)
{
xNode.Elements(duplicate).Skip(1).Remove();
}
Method 2:-
foreach (var duplicate in duplicateNames)
{
var nodeList = node.SelectNodes(duplicate);
if (nodeList.Count > 1)
{
for (int i=1; i<nodeList.Count; i++)
{
node.RemoveChild(nodeList[i]);
}
}
}
What am I missing?
If you don't want any duplicate names anywhere in the document (assuming no namespaces):
XElement root = XElement.Load(file); // or XElement.Parse(string)
List<string> names = root.Descendants().Select(x => x.Name.LocalName).Distinct().ToList();
names.ForEach(name => root.Descendants(name).Skip(1).Remove());
root.Save(file); // or root.ToString()
You might be trying to solve the problem at the wrong level. In XML it is perfectly valid to have multiple nodes with the same name, whereas JSON objects with duplicate property names should be treated as invalid. Try to do this sanitization at the JSON level, not after the data has already been converted to XML.
For the XML cleanup, this might be a starting point (GroupBy on the node name stands in for a custom name comparer; it needs System.Linq):
foreach (XmlNode child in node.ChildNodes.Cast<XmlNode>()
.GroupBy(n => n.Name)
.Select(g => g.First()))
{
.....
}
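If duplicates should be removed only among siblings, keeping the first element of each name at every level, a recursive pass over an XElement tree is one option. This is a sketch, assuming that "duplicate" means the same element name under the same parent (content is not compared) and that the caller edits the XElement tree itself. In the question, XElement.Load(node.CreateNavigator().ReadSubtree()) builds a separate copy, so removing elements from xNode leaves node unchanged. The class name is hypothetical:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateRemover
{
    // Recursively removes every child element whose name repeats an earlier
    // sibling's name, keeping the first occurrence at each level.
    // Assumption: "duplicate" = same element name under the same parent;
    // element content is not compared.
    public static void RemoveDuplicateChildren(XElement element)
    {
        var duplicates = element.Elements()
            .GroupBy(e => e.Name)
            .SelectMany(g => g.Skip(1))
            .ToList();               // materialize before modifying the tree
        foreach (XElement d in duplicates)
            d.Remove();

        // Recurse into the surviving children (duplicates at any level).
        foreach (XElement child in element.Elements())
            RemoveDuplicateChildren(child);
    }

    public static void Main()
    {
        var root = XElement.Parse(
            "<root><a>1</a><a>2</a><b><c>x</c><c>y</c></b><b>z</b></root>");
        RemoveDuplicateChildren(root);
        Console.WriteLine(root.ToString(SaveOptions.DisableFormatting));
        // prints: <root><a>1</a><b><c>x</c></b></root>
    }
}
```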

How to write xpath for this XDocument?

<tags>
<data mode="add" name="ttt" oldindex="-1" index="-1" oldnumber="1" number="1" type="VAR_INT" value="72" />
<data mode="delete" name="test3d" oldindex="-1" index="-1" oldnumber="1" number="1" type="VAR_INT" value="72" />
</tags>
I want to check whether "mode" is present in xml or not
xdDiffData.XPathSelectElement("//tags[@mode='add']") != null && xdDiffData.XPathSelectElement("//tags[@mode='delete']") != null
This always returns false. How can I do this?
If you want to make sure that the mode attribute is present on every data element, you are better off iterating over all the data elements and checking each one for the mode attribute:
XDocument doc = XDocument.Load("XmlFile.xml");
var nodes = doc.Descendants("data");
foreach (var node in nodes)
{
var attrMode = node.Attribute("mode");
if (attrMode == null)
{
// mode attribute is not available for this data element
}
}
Using Linq:
if (nodes.Where(c => c.Attribute("mode") == null).Count() == 0)
{
var result = nodes.All(e =>
e.Attribute("mode").Value.Equals("add") ||
e.Attribute("mode").Value.Equals("delete"));
}
else
{
// 'mode' attribute is missing for one or more 'data' element(s)
}
If result equals to true, then it means all the data elements have mode attribute either set to value "add" or "delete".
You are missing the 'data' element. Try
xdDiffData.XPathSelectElement("//tags/data[@mode='add']") != null && xdDiffData.XPathSelectElement("//tags/data[@mode='delete']") != null
xdDiffData.XPathSelectElement("/tags/data[@mode='add']") != null
I want to check whether "mode" is present in xml or not
Use:
//@mode
If this XPath expression selects a node, an attribute named mode is present in the XML document.
Or you could use:
boolean(//@mode)
which produces a boolean value: true() or false().
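With LINQ to XML, XPathSelectElement returns elements only, so an expression that yields a boolean, such as boolean(//@mode), needs XPathEvaluate (from System.Xml.XPath) instead. A sketch, assuming a trimmed copy of the question's XML and hypothetical helper names:

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ModeCheck
{
    // Trimmed copy of the question's XML (illustrative input).
    public const string Xml =
        "<tags>" +
        "<data mode='add' name='ttt' value='72' />" +
        "<data mode='delete' name='test3d' value='72' />" +
        "</tags>";

    // True when some element anywhere in the document carries a mode attribute.
    // The boolean() expression is evaluated with XPathEvaluate, which returns a boxed bool.
    public static bool HasModeAttribute(XDocument doc) =>
        (bool)doc.XPathEvaluate("boolean(//@mode)");

    // True when both an "add" and a "delete" <data> element exist under <tags>.
    public static bool HasAddAndDelete(XDocument doc) =>
        doc.XPathSelectElement("/tags/data[@mode='add']") != null &&
        doc.XPathSelectElement("/tags/data[@mode='delete']") != null;

    public static void Main()
    {
        XDocument doc = XDocument.Parse(Xml);
        Console.WriteLine(HasModeAttribute(doc)); // prints: True
        Console.WriteLine(HasAddAndDelete(doc));  // prints: True
    }
}
```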

What is the difference between Node.SelectNodes(/*) and Node.childNodes?

string XML1 = "<Root><InsertHere></InsertHere></Root>";
string XML2 = "<Root><child1><childnodes>data</childnodes><childnodes>data1</childnodes></child1><child2><childnodes>data</childnodes><childnodes>data1</childnodes></child2></Root>";
Of the two code samples below, the one using ChildNodes doesn't copy all the child nodes from XML2; only <child1> is copied.
string strXpath = "/Root/InsertHere";
XmlDocument xdxmlChildDoc = new XmlDocument();
XmlDocument ParentDoc = new XmlDocument();
ParentDoc.LoadXml(XML1);
xdxmlChildDoc.LoadXml(XML2);
XmlNode xnNewNode = ParentDoc.ImportNode(xdxmlChildDoc.DocumentElement.SelectSingleNode("/Root"), true);
if (xnNewNode != null)
{
XmlNodeList xnChildNodes = xnNewNode.SelectNodes("/*");
if (xnChildNodes != null)
{
foreach (XmlNode xnNode in xnChildNodes)
{
if (xnNode != null)
{
ParentDoc.DocumentElement.SelectSingleNode(strXpath).AppendChild(xnNode);
}
}
}
}
code2:
if (xnNewNode != null)
{
XmlNodeList xnChildNodes = xnNewNode.ChildNodes;
if (xnChildNodes != null)
{
foreach (XmlNode xnNode in xnChildNodes)
{
if (xnNode != null)
{
ParentDoc.DocumentElement.SelectSingleNode(strXpath).AppendChild(xnNode);
}
}
}
}
ParentDoc.OuterXml after executing the first code sample:
<Root>
<InsertHere>
<child1>
<childnodes>data</childnodes>
<childnodes>data1</childnodes>
</child1>
<child2>
<childnodes>data</childnodes>
<childnodes>data1</childnodes>
</child2>
</InsertHere>
</Root>
ParentDoc.OuterXml after executing the second code sample:
<Root>
<InsertHere>
<child1>
<childnodes>data</childnodes>
<childnodes>data1</childnodes>
</child1>
</InsertHere>
</Root>
I have done some debugging of the code, and it shows that xnNewNode.ChildNodes initially also returns 2 child nodes. After one iteration in the loop, the first child is however removed from ChildNodes, and therefore the loop ends prematurely.
If you want to use the ChildNodes property, one workaround is to "transfer" the child node references to an array or list, like this:
var xnChildNodes = xnNewNode.ChildNodes.Cast<XmlNode>().ToArray();
UPDATE
As Tomer W pointed out in his answer, when using XmlNode.AppendChild the inserted node is also removed from its original location. As stated in the MSDN documentation:
If the newChild is already in the tree, it is removed from its original position and added to its target position.
With SelectNodes you have already created a new node collection, but with ChildNodes you are accessing the original collection.
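The snapshot workaround can be packaged as a small helper. This sketch uses hypothetical class and method names and shortened versions of XML1 and XML2. It copies ChildNodes into an array before appending, so removing each node from its old parent cannot shorten the list being iterated:

```csharp
using System;
using System.Linq;
using System.Xml;

public static class MoveChildren
{
    // Moves every child of source under target. ChildNodes is a live list and
    // AppendChild removes each node from its old parent, so the children are
    // copied into an array first; otherwise the loop stops after the first node.
    public static void MoveAll(XmlNode source, XmlNode target)
    {
        XmlNode[] snapshot = source.ChildNodes.Cast<XmlNode>().ToArray();
        foreach (XmlNode child in snapshot)
            target.AppendChild(child);
    }

    public static void Main()
    {
        var parentDoc = new XmlDocument();
        parentDoc.LoadXml("<Root><InsertHere></InsertHere></Root>");
        var childDoc = new XmlDocument();
        childDoc.LoadXml("<Root><child1>a</child1><child2>b</child2></Root>");

        // Deep-import the other document's root so its nodes belong to parentDoc.
        XmlNode imported = parentDoc.ImportNode(childDoc.DocumentElement, true);
        MoveAll(imported, parentDoc.SelectSingleNode("/Root/InsertHere"));

        Console.WriteLine(parentDoc.OuterXml);
        // prints: <Root><InsertHere><child1>a</child1><child2>b</child2></InsertHere></Root>
    }
}
```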
This is a clarification of what Anders G posted, with a more thorough explanation.
I am surprised the foreach does not fail (throw an exception) in this situation, but it doesn't.
In code1:
1. Create a NEW COLLECTION of nodes.
2. Select the nodes into it.
3. Append each node to the other node => it is removed from its original parent, but not from the newly created collection.
4. Since the node you are appending is not removed from the new collection, the loop visits every node.
In code2:
1. Reference the ORIGINAL node collection
{child1, child2}
2. Append the 1st node to another parent => it is removed from the original collection
{child2}
3. Now, when the foreach moves to index 1, it sees that it has passed the end of the collection, and exits.
This happens a lot when changing a collection that is being iterated over,
but most of the time the IEnumerator throws an exception when that happens.
Hope I made it all clear.
I had the same problem and observed that whitespace nodes seem to have a value attached to them, which is not the case for other nodes (at least in my application). This method returns a new list of the node's children without the whitespace nodes (strictly, it skips every child whose Value is non-null, which also covers text, comment, and CDATA nodes):
private List<XmlNode> findChildnodes(XmlNode node)
{
List<XmlNode> result = new List<XmlNode>();
foreach (XmlNode childnode in node.ChildNodes)
{
if(childnode.Value == null)
{
result.Add(childnode);
}
}
return result;
}
In answer to your question, Node.ChildNodes is all of the child nodes, whereas Node.SelectNodes("/*") is all of the nodes that match /*. Only XML elements match /*, so text nodes, CDATA nodes, comments, etc. are excluded. (Strictly, /* is an absolute path evaluated from the root of the tree; the relative expression * selects the element children of Node itself.)
Nevertheless, the problem occurs because you are changing the collection of nodes while iterating over it, which you cannot safely do. SelectNodes returns a separate list of references to the nodes, which is why that version works.
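To see the difference between the two directly, a sketch like the following counts both on a root element with mixed content. The class name and sample XML are hypothetical, and it uses the relative expression * so that the selection is evaluated against the root element itself:

```csharp
using System;
using System.Xml;

public static class ChildNodesVsSelect
{
    // Returns (number of ChildNodes, number of nodes matched by SelectNodes("*"))
    // for the document element of the given XML.
    public static (int childNodes, int elements) Count(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        // ChildNodes returns every child node: elements, text, comments, CDATA, ...
        // SelectNodes("*") (relative to root) returns only the element children.
        return (root.ChildNodes.Count, root.SelectNodes("*").Count);
    }

    public static void Main()
    {
        var (all, elements) = Count("<Root>text<child1/><!-- note --><child2/></Root>");
        Console.WriteLine($"{all} {elements}"); // prints: 4 2
    }
}
```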
