I am attempting to use XML for some simple formatting and embedded links. I'm trying to parse the XML using Linq to Xml, but I'm struggling with parsing a text "Value" with embedded elements in it. For example, this might be a piece of XML I want to parse:
<description>A plain <link ID="1">table</link> with a green hat on it.</description>
Essentially, I want to enumerate through the "Runs" in the Value of the description node. In the above example, there would be a text node with a value of "A plain ", followed by a "link" element whose value is "table", followed by another text node whose value is " with a green hat on it.".
How do I do this? I tried enumerating the root XElement's Elements() enumeration, but that only returned the link element, as did Descendants(). DescendantNodes() did return all the nodes, but it also returned the subnodes of the link elements. In this case, a text node containing "table", in addition to the element that contained it.
You'll need to access the Nodes() method, check the XmlNodeType, and cast as appropriate to access each object's properties and methods.
For example:
var xml = XElement.Parse(@"<description>A plain <link ID=""1"">table</link> with a green hat on it.</description>");
foreach (var node in xml.Nodes())
{
Console.WriteLine("Type: " + node.NodeType);
Console.WriteLine("Object: " + node);
if (node.NodeType == XmlNodeType.Element)
{
var e = (XElement)node;
Console.WriteLine("Name: " + e.Name);
Console.WriteLine("Value: " + e.Value);
}
else if (node.NodeType == XmlNodeType.Text)
{
var t = (XText)node;
Console.WriteLine(t.Value);
}
Console.WriteLine();
}
XElement.Nodes() will enumerate only the top level child nodes.
Just use the Nodes() method on your description element.
var xmlStr = @"<description>A plain <link ID=""1"">table</link> with a green hat on it.</description>";
var descriptionElement = XElement.Parse(xmlStr);
var nodes = descriptionElement.Nodes();
foreach (var node in nodes)
Console.WriteLine("{0}\t\"{1}\"", node.NodeType, node);
Yields:
Text "A plain "
Element "<link ID="1">table</link>"
Text " with a green hat on it."
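If the goal is to hand the "runs" to other code rather than print them, one option is to project each top-level node into a small object. This is only a sketch: the `Run` type and the `GetRuns` helper are hypothetical names introduced for illustration, not part of LINQ to XML, and it assumes only text and element nodes matter.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RunDemo
{
    // Hypothetical helper type: one "run" of mixed content.
    public sealed class Run
    {
        public bool IsElement;  // true when the run came from a child element
        public string Text;     // text of the run (the element's Value for elements)
    }

    // Walks only the direct children of the element; comments and other
    // node types are skipped.
    public static List<Run> GetRuns(XElement element)
    {
        var runs = new List<Run>();
        foreach (XNode node in element.Nodes())
        {
            if (node is XText text)
                runs.Add(new Run { IsElement = false, Text = text.Value });
            else if (node is XElement child)
                runs.Add(new Run { IsElement = true, Text = child.Value });
        }
        return runs;
    }

    public static void Main()
    {
        var description = XElement.Parse(
            @"<description>A plain <link ID=""1"">table</link> with a green hat on it.</description>");
        foreach (var run in GetRuns(description))
            Console.WriteLine((run.IsElement ? "element: " : "text: ") + "\"" + run.Text + "\"");
    }
}
```

Because `Nodes()` does not descend, the text inside the link shows up once, as the Value of the element run, and not as a separate text run.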
I am trying to parse an XML file to get the content of each node (if it's not empty). However, I have a problem: I get the same value twice.
To let you understand here is my XML part:
<para>
<emphasis role="bold">foobar</emphasis>
</para>
When I get the innerText of <para> it gives me "foobar" and when I get the content of <emphasis> it gives me foobar too.
I am using C# in this way
//[foreach loop ...]
if (node.Name == "para" || node.Name == "emphasis" )
{
if (!String.IsNullOrWhiteSpace(node.InnerText))
{
Debug.WriteLine(node.Name+ " - " + node.InnerText);
}
}
How can I get only the content of the requested node, and not all the text located in its subnodes?
Thanks
The InnerText property of a node is the concatenated text of the node and all of its descendants, so a parent's InnerText always includes the text of its subnodes. That's not what you want.
<para>
<emphasis role="bold">foobar</emphasis>
<subject role="bold">Barbar</subject>
</para>
Changed your xml a bit, maybe you'll want something like this:
XmlNode node = doc.DocumentElement.SelectSingleNode("/para");
Console.WriteLine(node.Name);
foreach (XmlNode n in node.ChildNodes)
{
if (n.Name == "para" || n.Name == "emphasis" || n.Name == "subject")
{
if (!String.IsNullOrWhiteSpace(n.InnerText))
{
Console.WriteLine(n.Name + " - " + n.InnerText);
}
}
}
Then I got this:
para
emphasis - foobar
subject - Barbar
To sum it up: don't read InnerText from the parent node, only from its children. There are several other ways to do this too.
Hope this one helps!
Source: I just tested it on a Console App.
Obs: The doc object is the loaded xml document btw.
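If the parent itself can carry text next to its subnodes, another way is to concatenate only the text children of the node. The following is a minimal sketch; `GetOwnText` is a hypothetical helper name, and the sample XML adds some text to para purely for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class OwnTextDemo
{
    // Concatenates only the Text and CDATA children of the node itself,
    // ignoring any text that belongs to child elements.
    public static string GetOwnText(XmlNode node)
    {
        var sb = new StringBuilder();
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text || child.NodeType == XmlNodeType.CDATA)
                sb.Append(child.Value);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<para>intro <emphasis role=\"bold\">foobar</emphasis></para>");
        XmlNode para = doc.DocumentElement;
        XmlNode emphasis = para.SelectSingleNode("emphasis");

        Console.WriteLine("para: \"" + GetOwnText(para) + "\"");
        Console.WriteLine("emphasis: \"" + GetOwnText(emphasis) + "\"");
    }
}
```

With this approach "foobar" is reported for emphasis only, because para's own text children do not include it.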
In my code I iterate through an XElement and have it return the value of each node within that element, e.g.
foreach (XElement n in doc.Descendants("element_name"))
{
    Console.WriteLine("Searching: " + n.Value);
}
My problem is that both <Directory> elements are returned in the same string:
Searching: C:\Users\215358\OneDrive\MusicC:\Users\215358\Dropbox\Music
My XML file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Directories>
<Directory>C:\Users\215358\OneDrive\Music</Directory>
<Directory>C:\Users\215358\Dropbox\Music</Directory>
</Directories>
I expect it to output only the second <Directory> element, like this:
C:\Users\215358\Dropbox\Music
Why is this happening?
XElement.Value gets the concatenated text contents of an element. This includes the text of child elements which is not always very helpful. If you just want the text from the current element, you can find the text node in its child nodes.
foreach (XElement n in doc.Descendants("Directory"))
{
    // Take the first text node that is a direct child of this element.
    var text = n.Nodes().OfType<XText>().FirstOrDefault();
    if (text != null)
    {
        Console.WriteLine("Searching: " + text.Value);
    }
    else
    {
        Console.WriteLine("No text node found");
    }
}
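The concatenation is easy to reproduce by reading Value on the parent and then on each child. This is a small sketch using the XML from the question; `ChildValues` is a hypothetical helper name used only here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ValueDemo
{
    // Returns the Value of each direct child element with the given name.
    public static string[] ChildValues(XElement parent, string name)
    {
        return parent.Elements(name).Select(e => e.Value).ToArray();
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            @"<Directories>
  <Directory>C:\Users\215358\OneDrive\Music</Directory>
  <Directory>C:\Users\215358\Dropbox\Music</Directory>
</Directories>");

        // Value on the parent concatenates the text of every descendant.
        Console.WriteLine("Parent: " + doc.Root.Value);

        // Value on each child yields one path per element.
        foreach (string path in ChildValues(doc.Root, "Directory"))
            Console.WriteLine("Child: " + path);
    }
}
```

If the loop in the question is querying the parent element name rather than "Directory", that would explain seeing both paths run together in one string.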
Since you want to iterate through each entry and search element value, you could do something like this.
foreach (var element in doc.Descendants("Directory"))
{
if((string)element.Value == "searchstring")
{
// your logic
}
}
If you are looking for the second element in the XML, you can use the Skip extension to skip the specified number of elements and then take the next one.
var secondelement = doc.Descendants("Directory").Skip(1).FirstOrDefault(); // Skip the first element, take the second
Or, if you are looking for the last element, you can use the Last or LastOrDefault extension.
var lastelement = doc.Descendants("Directory").LastOrDefault();
Here's some fantastic example XML:
<root>
<section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>
I'd like to just grab the text from the section node and all sub-nodes as strings. BUT, note that there may or may not be spaces around the sub-nodes, so I want to pad the sub-nodes with a space on each side.
Here's a more precise example of what input might look like, and what I'd like output to be:
<root>
<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>
I'd like the output to be:
A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Note that the child nodes don't have spaces around them, so I need to pad them otherwise the words run together.
I was attempting to use this sample code:
XDocument doc = XDocument.Parse(xml);
foreach(var node in doc.Root.Elements("section"))
{
output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}
But the output includes the child tags, and is not going to work out.
Any suggestions here?
TL;DR: Was given node soup xml and want to stringify it with padding around child nodes.
In case you have tags nested to an unknown level (e.g. <date>a <i>long</i> time ago</date>), you might also want to recurse so that the formatting is applied consistently throughout. For example:
private static string Parse(XElement root)
{
return root
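        .Nodes()
        .Select(a => a.NodeType == XmlNodeType.Text ? ((XText)a).Value : Parse((XElement)a))
        // Aggregate throws on an empty sequence, e.g. for an empty element such as <not attribute="be" />.
        .DefaultIfEmpty(String.Empty)
        .Aggregate((a, b) => String.Concat(a.Trim(), b.StartsWith(".") ? String.Empty : " ", b.Trim()));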
}
You could try using xpath to extract what you need
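// XPathDocument(string) expects a URI, so wrap the XML text in a reader.
var docNav = new XPathDocument(new StringReader(xml));
// Create a navigator to query with XPath.
var nav = docNav.CreateNavigator();
// Find the text of every element under the root node
var expression = "/root//*/text()";
// Execute the XPath expression; a node-set query returns an iterator.
XPathNodeIterator textNodes = nav.Select(expression);
// Do some stuff with textNodes (textNodes.Current.Value holds one text node after each MoveNext())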
....
References:
Querying XML, XPath syntax
Here is a possible solution following your initial code:
private string extractSectionContents(XElement section)
{
string output = "";
foreach(var node in section.Nodes())
{
if(node.NodeType == System.Xml.XmlNodeType.Text)
{
output += string.Format("{0}", node);
}
else if(node.NodeType == System.Xml.XmlNodeType.Element)
{
output += string.Format(" {0} ", ((XElement)node).Value);
}
}
return output;
}
A problem with your logic is that periods will be preceded by a space when placed right after an element.
You are looking at "mixed content" nodes. There is nothing particularly special about them: just get all child nodes (text nodes are nodes too) and join their values with a space.
Something like
var result = String.Join(" ",
root.Nodes().Select(x => x is XText ? ((XText)x).Value : ((XElement)x).Value));
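Joining with a space leaves a stray space before punctuation that follows an element, and any existing spaces get doubled. One possible clean-up, sketched below, is to collapse whitespace and then remove the space in front of punctuation. `Flatten` is a hypothetical helper name, the punctuation set is an assumption, and nested child elements are not padded here (their Value is taken as is).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FlattenDemo
{
    public static string Flatten(XElement element)
    {
        // Take the text of each direct child node and pad the pieces with a space.
        string joined = string.Join(" ",
            element.Nodes().Select(n => n is XText t ? t.Value
                                      : n is XElement e ? e.Value
                                      : string.Empty));

        // Collapse runs of whitespace into a single space.
        string collapsed = Regex.Replace(joined, @"\s+", " ").Trim();

        // Remove the space that the padding introduced before punctuation.
        return Regex.Replace(collapsed, @" ([.,;:!?])", "$1");
    }

    public static void Main()
    {
        var sample = XElement.Parse(
            "<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. " +
            "It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>");
        Console.WriteLine(Flatten(sample));
    }
}
```

For the sample element from the question this should produce the sentence with single spaces around the former tags and no space before the periods.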
I am trying to get the elements title and runtime (siblings) where the runtime value is larger than the input value. My C# code with the XPath expression is:
ElementValue = 140;
nodeList = root.SelectNodes(@"/moviedb/movie[./runtime>'" + ElementValue + "'/title | /moviedb/movie[./runtime>'" + ElementValue + "']/runtime");
This XPath expression is not returning anything.
My XML file:
<moviedb>
<movie>
<imdbid>tt0120689</imdbid>
<genres>Crime,Drama,Fantasy,Mystery</genres>
<languages>English,French</languages>
<country>USA</country>
<rating>8.5</rating>
<runtime>189</runtime>
<title lang="english">The Green Mile</title>
<year>1999</year>
</movie>
<movie>
<imdbid>tt0415800</imdbid>
<genres>Action,Animation,Drama,Thriller</genres>
<languages>English</languages>
<country>USA</country>
<rating>4.5</rating>
<runtime>139</runtime>
<title lang="english">Fight Club</title>
<year>2004</year>
</movie>
</moviedb>
You can instead use linq2xml
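var doc = XDocument.Load(path);
// The movie elements are children of the root <moviedb> element, not of the document itself.
var movies = doc.Root.Elements("movie")
    .Where(x => (int)x.Element("runtime") > input)
    .Select(x => new
    {
        Title = x.Element("title").Value,
        Runtime = (int)x.Element("runtime")
    });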
You can now iterate over movies
foreach (var movie in movies)
{
    Console.WriteLine(movie.Title);
    Console.WriteLine(movie.Runtime);
}
You seem to be applying the values you want off the node as a filter criteria, which won't work. I would go about this another way, first finding the nodes which meet the criteria:
nodeList = root.SelectNodes(@"/moviedb/movie[runtime > " + ElementValue + "]");
And then grabbing the child elements from each:
foreach (XmlNode node in nodeList) // XmlNodeList is non-generic, so "var" would type node as object
{
Debug.WriteLine(node.SelectSingleNode("title").InnerText);
Debug.WriteLine(node.SelectSingleNode("runtime").InnerText);
}
You can do this using a single XPath expression by performing a union i.e. the | operator. As mentioned in other answers here, you had your select inside your predicate which would not result in the correct answer for you anyway.
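Note, if you want to see if a number is bigger than another number, unless you are using a schema-driven, data-type-aware XQuery engine you will need to cast the text() to a number before performing the comparison. In this instance I have assumed that xs:int will be suitable for you. Also, you can use the atomic value comparison gt as opposed to >, which may be more efficient. Be aware that xs:int, gt and the parenthesised final step are XPath 2.0 syntax; the SelectNodes method in System.Xml implements only XPath 1.0, so this expression needs an XPath 2.0 capable engine.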
ElementValue = 140;
nodeList = root.SelectNodes(@"/moviedb/movie[xs:int(runtime) gt " + ElementValue + "]/(title | runtime)");
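With the XPath 1.0 engine built into System.Xml, the same idea can be expressed as a union of two complete paths, each with the predicate closed before the final step. A minimal sketch, assuming a trimmed-down version of the movie XML from the question; `SelectLongMovies` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class MovieQueryDemo
{
    // Selects the title and runtime elements of every movie whose runtime
    // exceeds minRuntime, using XPath 1.0 only.
    public static XmlNodeList SelectLongMovies(XmlDocument doc, int minRuntime)
    {
        string n = minRuntime.ToString(CultureInfo.InvariantCulture);
        // Unquoted number: XPath 1.0 converts the runtime text to a number for ">".
        string xpath = "/moviedb/movie[runtime > " + n + "]/title"
                     + " | /moviedb/movie[runtime > " + n + "]/runtime";
        return doc.SelectNodes(xpath);
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<moviedb>" +
            "<movie><runtime>189</runtime><title lang=\"english\">The Green Mile</title></movie>" +
            "<movie><runtime>139</runtime><title lang=\"english\">Fight Club</title></movie>" +
            "</moviedb>");

        foreach (XmlNode node in SelectLongMovies(doc, 140))
            Console.WriteLine(node.Name + ": " + node.InnerText);
    }
}
```

The result is a flat list of title and runtime nodes, so if the pairing per movie matters, selecting the movie nodes first and reading their children is usually easier to work with.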
I would like to traverse every element and attribute in an XML document and grab the name and value without knowing the names of the elements in advance. I even have a book on LINQ to XML with C#, and it only tells me how to query for the value of elements when I already know the name of the element.
The code below only gives me the most high level element information. I need to also reach all of the descending elements.
XElement reportElements = null;
reportElements = XElement.Load(filePathName.ToString());
foreach (XElement xe in reportElements.Elements())
{
MessageBox.Show(xe.ToString());
}
Elements only walks one level; Descendants walks the entire DOM for elements, and you can then (per-element) check the attributes:
foreach (var el in doc.Descendants()) {
Console.WriteLine(el.Name);
foreach (var attrib in el.Attributes()) {
Console.WriteLine("> " + attrib.Name + " = " + attrib.Value);
}
}
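To also get values without repeating text, one option is to print a value only for leaf elements, since a parent's Value would include the text of all its children. A minimal sketch; `Describe` is a hypothetical helper name and the output format is just an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class WalkDemo
{
    // Lists every element (including the root) with its attributes.
    public static List<string> Describe(XElement root)
    {
        var lines = new List<string>();
        foreach (XElement el in root.DescendantsAndSelf())
        {
            // Only leaf elements get a value; a parent's Value repeats its children's text.
            lines.Add(el.HasElements
                ? el.Name.LocalName
                : el.Name.LocalName + " = " + el.Value);

            foreach (XAttribute attr in el.Attributes())
                lines.Add("  @" + attr.Name.LocalName + " = " + attr.Value);
        }
        return lines;
    }

    public static void Main()
    {
        var root = XElement.Parse("<report id=\"7\"><title>Sales</title><row n=\"1\">42</row></report>");
        foreach (string line in Describe(root))
            Console.WriteLine(line);
    }
}
```

DescendantsAndSelf is used so that the root element and its attributes are included; Descendants alone would start at the children.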
You should try:
reportElements.Descendants()