I am trying to parse an XML file to get the content of each node (if it's not empty). However, I have a problem: I get the same value twice.
Here is the relevant part of my XML:
<para>
<emphasis role="bold">foobar</emphasis>
</para>
When I get the InnerText of <para> it gives me "foobar", and when I get the content of <emphasis> it gives me "foobar" too.
I am using C# like this:
//[foreach loop ...]
if (node.Name == "para" || node.Name == "emphasis")
{
    if (!String.IsNullOrWhiteSpace(node.InnerText))
    {
        Debug.WriteLine(node.Name + " - " + node.InnerText);
    }
}
How can I get only the content of the requested node, and not all the text located in its subnodes?
Thanks
The InnerText property of a node with subnodes is always the node's own text concatenated with the InnerText of each of its subnodes. That's not what you want.
<para>
<emphasis role="bold">foobar</emphasis>
<subject role="bold">Barbar</subject>
</para>
I changed your XML a bit (see above); maybe you want something like this:
XmlNode node = doc.DocumentElement.SelectSingleNode("/para");
Console.WriteLine(node.Name);
foreach (XmlNode n in node.ChildNodes)
{
if (n.Name == "para" || n.Name == "emphasis" || n.Name == "subject")
{
if (!String.IsNullOrWhiteSpace(n.InnerText))
{
Console.WriteLine(n.Name + " - " + n.InnerText);
}
}
}
Then I got this:
para
emphasis - foobar
subject - Barbar
To sum it up, you never read InnerText from the parent node, just from its children. And there are a bunch of different ways to do it too.
Hope this one helps!
Source: I just tested it on a Console App.
Obs: The doc object is the loaded xml document btw.
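If the goal is the text that belongs to a node itself, excluding anything inside its subnodes, one option is to concatenate only the text and CDATA children. The following is a minimal sketch, not the answer's original method; the class name OwnTextDemo and the helper GetOwnText are made up for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class OwnTextDemo
{
    // Hypothetical helper: concatenates only the Text and CDATA nodes
    // that are direct children of the given node.
    public static string GetOwnText(XmlNode node)
    {
        var sb = new StringBuilder();
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text || child.NodeType == XmlNodeType.CDATA)
            {
                sb.Append(child.Value);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<para>intro<emphasis role=\"bold\">foobar</emphasis></para>");

        XmlNode para = doc.DocumentElement;
        Console.WriteLine(para.InnerText);                // text of para and all its descendants
        Console.WriteLine(GetOwnText(para));              // text directly inside para only
        Console.WriteLine(GetOwnText(para.LastChild));    // text directly inside emphasis only
    }
}
```

With the XML from the question, where <para> contains nothing but the <emphasis> element, GetOwnText returns an empty string for <para>, so the IsNullOrWhiteSpace check would skip it.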
I closed my last question because it was commented that not enough research had been done. The more I research, the more confused I am getting. What I think should work, based on my understanding and on posts here and elsewhere, is not working.
XML sample
<?xml version="1.0" encoding="UTF-8" ?>
<multistatus xmlns="DAV:">
  <schemata>
    <classschema name="Space">
      <base_class>Space</base_class>
    </classschema>
    <classschema name="Tapestry">
      <base_class>File</base_class>
    </classschema>
    <classschema name="Document">
      <base_class>File</base_class>
    </classschema>
  </schemata>
</multistatus>
I am trying to get the name attribute of the classschema nodes that have base_class children with the value 'File', so the result should be 'Tapestry' and 'Document'.
I can easily return all classschema nodes with
foreach (XmlNode node in xmlDoc.SelectNodes("//DAV:schemata/DAV:classschema", nsmgr))
{
strNode = node.Attributes["name"].Value;
responseString += strNode +" ";
}
return responseString;
And I can get the base_class values equal to 'File' by looping through all the base_class nodes like this:
foreach (XmlNode node in xmlDoc.SelectNodes("//DAV:schemata/DAV:classschema/DAV:base_class", nsmgr))
{
if (node.InnerText == "File")
{
strNode = node.InnerText;
responseString += strNode +" ";
}
}
return responseString;
But if I try to filter, or use an axis to reference the parent node from the child, I fail.
Examples of my filtering attempts, based on the SelectNodes method:
foreach (XmlNode node in xmlDoc.SelectNodes("//DAV:schemata/DAV:classschema[/DAV:base_class(contains,'File')]", nsmgr)) or
foreach (XmlNode node in xmlDoc.SelectNodes("//DAV:schemata/DAV:classschema[/DAV:base_class=='File']", nsmgr))
along with many, many other variations. In the examples I have seen it is hard to tell whether they use LINQ to XML or XDocument, and with PHP and other languages mixed in, where the language isn't always specified, I am now jumbled.
My next attempt will be SelectNodes("//DAV:schemata/DAV:classschemata[/DAV:baseclass(contains,'File')]"@name,nsmgr);
and variations on that.
Several other examples looked like exactly what I wanted, but when I implemented them they did not work, for reasons I cannot explain.
This should give you the result you are after. Basically it looks for all classschema elements that have a first element with Value == "File" and then selects their name attributes Value. Please note I also used the string.Join method (pretty handy for stuff like this) to turn the result into a space delimited string which is what you need.
var xmlDoc = XDocument.Load("YourFile.xml");
var result = xmlDoc.Descendants("{DAV:}classschema")
.Where(x => x.Elements().First().Value == "File")
.Attributes("name")
.Select(x => x.Value);
string spaceDelimited = string.Join(" ", result);
This is what I am using, and it now works. I would swear on a stack of Bibles I had done this in earlier testing and it failed. But it does work. In posting the code I see that the inner loop I was using is now commented out, and that was the problem. I was looping over something that did not need to be looped, so it was returning empty.
foreach (XmlNode node in xmlDoc.SelectNodes("//DAV:schemata/DAV:classschema[DAV:base_class='File']", nsmgr))
{
//strNode = node.Attributes["name"].Value;
//if (node.InnerText == "File")
// {
strNode = node.Attributes["name"].Value;
//strNode = node.InnerText;
responseString += strNode +" ";
// }
}
return responseString;
Thank you for the help.
My next step will be to take each of the returned nodes, get all of its base_class elements, and filter so that only some of them are returned. I am not sure what I am looking for yet; I need to evaluate the XML and look for something unique to capture what I want.
Meaning: now that I have only the classschema nodes with child elements containing 'File', I want to get some of the sibling child elements, but not all. A challenge for another day.
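For reference, the thread never shows how nsmgr is set up, so here is a self-contained sketch of the working query. The prefix "DAV" and the helper name NamesWithBaseClass are assumptions for illustration; the prefix only has to be mapped to the document's namespace URI "DAV:". Note that the path inside the predicate is relative (no leading slash), which is the difference from the failing attempts.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SchemaFilterDemo
{
    // Hypothetical helper: returns the name attribute of every classschema
    // element that has a base_class child equal to baseClass.
    // Assumes baseClass contains no quote characters.
    public static List<string> NamesWithBaseClass(XmlDocument doc, string baseClass)
    {
        // Map an arbitrary prefix to the default namespace used by the document.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("DAV", "DAV:");

        // The predicate path is relative to each classschema element.
        string xpath = "//DAV:schemata/DAV:classschema[DAV:base_class='" + baseClass + "']";

        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath, nsmgr))
        {
            names.Add(node.Attributes["name"].Value);
        }
        return names;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<multistatus xmlns=\"DAV:\"><schemata>" +
            "<classschema name=\"Space\"><base_class>Space</base_class></classschema>" +
            "<classschema name=\"Tapestry\"><base_class>File</base_class></classschema>" +
            "<classschema name=\"Document\"><base_class>File</base_class></classschema>" +
            "</schemata></multistatus>");

        // Prints: Tapestry Document
        Console.WriteLine(string.Join(" ", NamesWithBaseClass(doc, "File")));
    }
}
```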
In C# I'm using the following to get some elements from an XML file:
var TestCaseDescriptions = doc.SelectNodes("//testcase/htmlcomment");
This works fine and gets the correct information, but when a testcase has no htmlcomment, no entry is added to the XmlNodeList TestCaseDescriptions.
When there's no htmlcomment I would like to have the string value "null" in TestCaseDescriptions. So in the end I would have an XmlNodeList like
htmlcomment
htmlcomment
null
htmlcomment
htmlcomment
Can anyone describe how to make this happen, or provide a sample?
var TestCaseDescriptions = doc.SelectNodes("//testcase/htmlcomment");
When there's not htmlcomment I would like to have the value "null" as string the TestCaseDescriptions.
Your problem comes from the fact that if there is no htmlcomment, the number of selected nodes will be one less. The current answer shows what to do when the htmlcomment element is present but empty, but I think you need this instead, if indeed the whole htmlcomment element is missing:
var testCases = doc.SelectNodes("//testcase");
foreach (XmlElement element in testCases)
{
    var description = element.SelectSingleNode("child::htmlcomment");
    // XmlNode.Value is always null for element nodes, so read InnerText instead.
    string results = description == null ? "null" : description.InnerText;
}
In the code above, you go over each test case and select its htmlcomment child node. If it is not found, SelectSingleNode returns null, so the last line checks the result and yields "null" in that case, or the node's text otherwise.
To change this result into a node, you will have to create the node as a child to the current node. You said you want an XmlNodeList, so perhaps this works for you:
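var testCaseDescriptions = doc.SelectNodes("//testcase");
foreach (XmlElement element in testCaseDescriptions)
{
    var comment = element.SelectSingleNode("child::htmlcomment");
    if (comment == null)
    {
        // AppendChild returns the node that was appended, so chaining the calls
        // would attach only the text node. Build the element first, then append it.
        var placeholder = doc.CreateElement("htmlcomment");
        placeholder.AppendChild(doc.CreateTextNode("none"));
        element.AppendChild(placeholder);
    }
}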
After this, every testcase has an htmlcomment child, so selecting "//testcase/htmlcomment" again returns one node per test case.
Note: apparently, the OP mentions that element.SelectSingleNode("child::htmlcomment"); does not work, but element.SelectSingleNode("./htmlcomment"); does, even though technically, these are equal expressions from the point of XPath, and should work according to Microsoft's documentation.
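If a plain list of strings is enough and the document should not be modified, the same idea can be sketched as below. This is only an illustration under that assumption; the class name DescriptionDemo and the helper GetDescriptions are made up here, and the sample XML in Main is invented for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DescriptionDemo
{
    // Hypothetical helper: one entry per testcase, with the string "null"
    // standing in for test cases that have no htmlcomment child.
    public static List<string> GetDescriptions(XmlDocument doc)
    {
        var descriptions = new List<string>();
        foreach (XmlElement testCase in doc.SelectNodes("//testcase"))
        {
            XmlNode comment = testCase.SelectSingleNode("htmlcomment");
            descriptions.Add(comment == null ? "null" : comment.InnerText);
        }
        return descriptions;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<suite>" +
            "<testcase><htmlcomment>first</htmlcomment></testcase>" +
            "<testcase />" +
            "<testcase><htmlcomment>third</htmlcomment></testcase>" +
            "</suite>");

        foreach (string description in GetDescriptions(doc))
        {
            Console.WriteLine(description);
        }
    }
}
```

The resulting list keeps the same order and length as the test cases, so index i always refers to test case i.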
Try this
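XmlDocument doc = new XmlDocument();
// Load the XML first, e.g. with doc.Load(...) or doc.LoadXml(...).
var TestCaseDescriptions = doc.SelectNodes("//testcase/htmlcomment");
foreach (XmlElement element in TestCaseDescriptions)
{
    // XmlElement.Value is always null; InnerText is "" when the element is empty.
    string results = element.InnerText;
}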
Here's some fantastic example XML:
<root>
<section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>
I'd like to just grab the text from the section node and all sub-nodes as strings. BUT, note that there may or may not be spaces around the sub-nodes, so I want to pad the sub-nodes and append a space.
Here's a more precise example of what input might look like, and what I'd like output to be:
<root>
<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>
I'd like the output to be:
A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Note that the child nodes don't have spaces around them, so I need to pad them otherwise the words run together.
I was attempting to use this sample code:
XDocument doc = XDocument.Parse(xml);
foreach(var node in doc.Root.Elements("section"))
{
output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}
But the output includes the child tags, and is not going to work out.
Any suggestions here?
TL;DR: Was given node soup xml and want to stringify it with padding around child nodes.
In case you have nested tags to an unknown level (e.g. <date>a <i>long</i> time ago</date>), you might also want to recurse so that the formatting is applied consistently throughout. For example:
private static string Parse(XElement root)
{
return root
.Nodes()
.Select(a => a.NodeType == XmlNodeType.Text ? ((XText)a).Value : Parse((XElement)a))
.Aggregate((a, b) => String.Concat(a.Trim(), b.StartsWith(".") ? String.Empty : " ", b.Trim()));
}
You could try using xpath to extract what you need
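// XPathDocument(string) expects a file path or URI; wrap raw XML in a StringReader (System.IO).
var docNav = new XPathDocument(new StringReader(xml));
// Create a navigator to query with XPath.
var nav = docNav.CreateNavigator();
// Find the text of every element under the root node
var expression = "/root//*/text()";
// Execute the XPath expression; a node-set query yields an XPathNodeIterator
var textNodes = (XPathNodeIterator)nav.Evaluate(expression);
// Do some stuff with textNodes, e.g. read textNodes.Current.Value while MoveNext() returns true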
....
References:
Querying XML, XPath syntax
Here is a possible solution following your initial code:
private string extractSectionContents(XElement section)
{
string output = "";
foreach(var node in section.Nodes())
{
if(node.NodeType == System.Xml.XmlNodeType.Text)
{
output += string.Format("{0}", node);
}
else if(node.NodeType == System.Xml.XmlNodeType.Element)
{
output += string.Format(" {0} ", ((XElement)node).Value);
}
}
return output;
}
A problem with your logic is that periods will be preceded by a space when placed right after an element.
You are looking at "mixed content" nodes. There is nothing particularly special about them: just get all child nodes (text nodes are nodes too) and join their values with a space.
Something like
var result = String.Join(" ",
    root.Nodes().Select(x => x is XText ? ((XText)x).Value : ((XElement)x).Value));
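Joining with a space leaves a stray space before a period that directly follows an element, as noted in the previous answer. One possible way around that, sketched here as an assumption rather than as anyone's posted solution, is to normalize the joined string with two regular expressions. The names MixedContentDemo and Flatten are hypothetical, and nested elements are flattened through XElement.Value without extra padding.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class MixedContentDemo
{
    // Hypothetical helper: joins the direct child nodes with spaces,
    // then tidies up the whitespace.
    public static string Flatten(XElement element)
    {
        // XText covers text and CDATA nodes; other node types (comments etc.) are skipped.
        var parts = element.Nodes()
            .Select(n => n is XText t ? t.Value : n is XElement e ? e.Value : string.Empty);

        string joined = string.Join(" ", parts);
        joined = Regex.Replace(joined, @"\s+", " ");            // collapse runs of whitespace
        joined = Regex.Replace(joined, @"\s+([.,;:!?])", "$1"); // drop the space before punctuation
        return joined.Trim();
    }

    public static void Main()
    {
        var sample = XElement.Parse(
            "<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. " +
            "It was published<date>a long time ago</date>. " +
            "I usually read at<time>9pm</time>.</sample>");

        Console.WriteLine(Flatten(sample));
    }
}
```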
//Get and translate interface configs
var interfaces = CurrentXML
.Descendants("interface-list")
.Elements("interface")
.Select(i => new { NAMEIF = i.Element("name").Value ,
DESC = i.Element("description").Value ,
NOSHUT = i.Element("if-item-list")
.Element("item")
.Element("physical-if")
.Element("enabled")
.Value
}
) ;
//Build ASA Configuration and display to user.
ASAconfig.Append( "<br />" + deviceconf.HOSTNAME.ToString() ) ;
foreach ( var el in interfaces )
{
ASAconfig.Append(
string.Format("<br />nameif {0}<br /> description {1}<br /> {2}" ,
el.NAMEIF != null ? el.NAMEIF.ToString() : string.Empty ,
el.DESC != null ? el.DESC.ToString() : string.Empty ,
el.NOSHUT.ToString() == "1" ? "no shut" : string.Empty
)
) ;
}
I'm sorry if this was not formatted correctly, this is my first post.
I am creating a website with ASP.NET and C# to parse an XML file and translate certain element values and append to an arbitrary string. The problem I'm having is that there are "interface" elements within the XML file which do not contain an "enabled" descendant (would be element of "physical-if" that also does not exist for virtual interfaces).
I don't want to perform the selections from the XML file if this descendant does not exist, and as you can see I've played with a "where" clause but have struck out thus far. Part of an example XML file is pasted below showing the difference that I'm talking about. Any advice is greatly appreciated.
Thank you for your time.
<interface>
<name>SSL-VPN</name>
<description>SSL VPN</description>
<property>2</property>
<if-item-list>
<item>
<item-type>5</item-type>
<sslvpn>SSL-VPN</sslvpn>
</item>
</if-item-list>
</interface>
<interface>
<name>DMZ</name>
<description>DMZ</description>
<property>0</property>
<if-item-list>
<item>
<item-type>1</item-type>
<physical-if>
<if-num>2</if-num>
<enabled>1</enabled>
<if-property>3</if-property>
<ip>10.21.2.1</ip>
<netmask>255.255.0.0</netmask>
<mtu>1500</mtu>
<auto-negotiation>1</auto-negotiation>
<link-speed>100</link-speed>
<mac-address-enable>0</mac-address-enable>
<mac-address />
<full-duplex>1</full-duplex>
<secondary-ip-list />
<anti-spoof>2</anti-spoof>
<anti-scan>0</anti-scan>
<block-notification>0</block-notification>
<dos-prevention>1</dos-prevention>
<intra-inspection>0</intra-inspection>
<dhcp-server>
<server-type>0</server-type>
</dhcp-server>
<vpn-df-bit>0</vpn-df-bit>
<qos>
<max-link-bandwidth>0</max-link-bandwidth>
<qos-marking>
<marking-field>2</marking-field>
<marking-method>
<marking-type>0</marking-type>
</marking-method>
<priority-method>0</priority-method>
</qos-marking>
</qos>
<static-mac-ip-binds>
<restrict-traffic>0</restrict-traffic>
</static-mac-ip-binds>
<static-mac-acl>
<enable>0</enable>
</static-mac-acl>
</physical-if>
</item>
</if-item-list>
...
Try adding this Where:
.Elements("interface")
.Where(i => null != i.Descendants("enabled").FirstOrDefault())
.Select // ...
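A slightly stricter variant of the filter above follows the exact path to the enabled element instead of searching all descendants, and uses the explicit string cast on XElement so that a missing name yields null rather than an exception. This is a sketch under the assumption that the XML has the shape shown in the question; the names InterfaceDemo and PhysicalInterfaceNames are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class InterfaceDemo
{
    // Hypothetical helper: names of the interfaces that have
    // if-item-list/item/physical-if/enabled.
    public static string[] PhysicalInterfaceNames(XElement interfaceList)
    {
        return interfaceList
            .Elements("interface")
            .Where(i => i.Elements("if-item-list")
                         .Elements("item")
                         .Elements("physical-if")
                         .Elements("enabled")
                         .Any())
            // (string)element returns null for a missing element instead of throwing.
            .Select(i => (string)i.Element("name") ?? string.Empty)
            .ToArray();
    }

    public static void Main()
    {
        var list = XElement.Parse(
            "<interface-list>" +
            "<interface><name>SSL-VPN</name><if-item-list><item>" +
            "<sslvpn>SSL-VPN</sslvpn></item></if-item-list></interface>" +
            "<interface><name>DMZ</name><if-item-list><item>" +
            "<physical-if><enabled>1</enabled></physical-if></item></if-item-list></interface>" +
            "</interface-list>");

        // Prints: DMZ
        Console.WriteLine(string.Join(", ", PhysicalInterfaceNames(list)));
    }
}
```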
There's nothing magical about Linq (outside of the fact that it's the current silver bullet du jour). It would seem to me that what you want to do would be easier and possibly clearer if you used XPath to select your element set.
Can XPath return only nodes that have a child of X?
XPath find all elements with specific child node
Using XPath, it's as simple as:
XmlDocument xmldoc = new XmlDocument() ;
xmldoc.LoadXml(rawXml) ; // load your XML string here
XmlNodeList selectedNodes = xmldoc.SelectNodes( "/interface-list/interface[if-item-list/item/physical-if/enabled]" ) ;
It should be noted that an XmlNodeList is IEnumerable rather than IEnumerable<T>, so to do any sort of Linq magick, you'll need to cast it like so:
selectedNodes
.Cast<XmlNode>()
.Select( ... )
;
I am attempting to use XML for some simple formatting and embedded links. I'm trying to parse the XML using Linq to Xml, but I'm struggling with parsing a text "Value" with embedded elements in it. For example, this might be a piece of XML I want to parse:
<description>A plain <link ID="1">table</link> with a green hat on it.</description>
Essentially, I want to enumerate through the "Runs" in the Value of the description node. In the above example, there would be a text node with a value of "A plain ", followed by a "link" element, whose value is "table", followed by another text node whose value is " with a green hat on it.".
How do I do this? I tried enumerating the root XElement's Elements() enumeration, but that only returned the link element, as did Descendants(). DescendantNodes() did return all the nodes, but it also returned the subnodes of the link elements. In this case, a text node containing "table", in addition to the element that contained it.
You'll need to access the Nodes() method, check the XmlNodeType, and cast as appropriate to access each object's properties and methods.
For example:
var xml = XElement.Parse(@"<description>A plain <link ID=""1"">table</link> with a green hat on it.</description>");
foreach (var node in xml.Nodes())
{
Console.WriteLine("Type: " + node.NodeType);
Console.WriteLine("Object: " + node);
if (node.NodeType == XmlNodeType.Element)
{
var e = (XElement)node;
Console.WriteLine("Name: " + e.Name);
Console.WriteLine("Value: " + e.Value);
}
else if (node.NodeType == XmlNodeType.Text)
{
var t = (XText)node;
Console.WriteLine(t.Value);
}
Console.WriteLine();
}
XElement.Nodes() will enumerate only the top level child nodes.
Just use the Nodes() method on your description element.
var xmlStr = @"<description>A plain <link ID=""1"">table</link> with a green hat on it.</description>";
var descriptionElement = XElement.Parse(xmlStr);
var nodes = descriptionElement.Nodes();
foreach (var node in nodes)
Console.WriteLine("{0}\t\"{1}\"", node.NodeType, node);
Yields:
Text "A plain "
Element "<link ID="1">table</link>"
Text " with a green hat on it."