In an XML document such as the following:
<root>
<fish value="Start"/>
<pointlessContainer>
<anotherNode value="Pick me!"/>
<anotherNode value="Pick me too!"/>
<fish value="End"/>
</pointlessContainer>
</root>
How can I use the wonder of LINQ to XML to find any nodes completely contained by the fish nodes? Note that in this example, I have deliberately placed the fish nodes at different levels in the document, as I anticipate this scenario will occur in the wild.
Obviously, in this example, I would be looking to get the two anotherNode nodes, but not the pointlessContainer node.
NB: the two 'delimiting' nodes may have the same type (e.g. fish) as other non-delimiting nodes in the document, but they would have unique attributes and therefore be easy to identify.
For your sample, the following should do it:
XDocument doc = XDocument.Load(@"..\..\XMLFile2.xml");
XElement start = doc.Descendants("fish").First(f => f.Attribute("value").Value == "Start");
XElement end = doc.Descendants("fish").First(f => f.Attribute("value").Value == "End");
foreach (XElement el in
doc
.Descendants()
.Where(d =>
XNode.CompareDocumentOrder(d, end) == -1
&& XNode.CompareDocumentOrder(d, start) == 1
&& !end.Ancestors().Contains(d)))
{
Console.WriteLine(el);
}
But I haven't tested or thoroughly pondered whether it works for other cases. Maybe you can check it against some of your sample data and report back whether it works.
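If you want to reuse this, the same three conditions can be wrapped in a helper. A minimal sketch: `DelimitedNodes.ElementsBetween` is a hypothetical name, it uses the documented "less than zero / greater than zero" meaning of `CompareDocumentOrder`, and it computes the ancestors of the end node once instead of per candidate. Like the answer, it does not exclude children of the start element itself, which is fine while the delimiters are empty elements as in the sample.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DelimitedNodes
{
    // Returns every element of doc that comes after start and before end in
    // document order, skipping the ancestors of end (they are only partly
    // inside the range, like pointlessContainer in the sample).
    public static List<XElement> ElementsBetween(XDocument doc, XElement start, XElement end)
    {
        // XElement uses reference equality, so this set identifies the exact ancestor nodes.
        var endAncestors = new HashSet<XElement>(end.Ancestors());
        return doc.Descendants()
            .Where(e => XNode.CompareDocumentOrder(e, start) > 0
                     && XNode.CompareDocumentOrder(e, end) < 0
                     && !endAncestors.Contains(e))
            .ToList();
    }
}
```

With the sample document this picks the two `anotherNode` elements and leaves out `pointlessContainer`.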
I'm working on this using C# (.NET) in VS 2013.
I have a scenario with the following structure:
<td>
<text text="abc">abc
<tspan text = "bcd">bcd
<tspan text = "def">def
<tspan text = "gef">gef
</tspan>
</tspan>
</tspan>
</text>
</td>
As shown above, I don't know how many tspan nodes there will be: currently there are 3, but I may get 4 or more.
Once I have found the text node, I get its value with
labelNode.Attributes["text"].Value
To get the tspan node nested inside it, I have to use
labelNode.FirstChild.Attributes["text"].Value
To get the tspan node nested inside that one, I have to use
labelNode.FirstChild.FirstChild.Attributes["text"].Value
And so on.
Now my question is: if I know that I have 5 tags, is there any way to dynamically apply "FirstChild" 5 times to "labelNode" so that I can get the text value of the last node, like this
labelNode.FirstChild.FirstChild.FirstChild.FirstChild.FirstChild.Attributes["text"].Value
If I need the 2nd value I need to apply it 2 times; if I need the 3rd, then 3 times.
Please let me know if there is any solution for this.
Please ask if anything in my question is unclear.
Thank you all in advance.
Rather than adding FirstChild dynamically, I think this would be a simpler solution:
static XmlNode GetFirstChildNested(XmlNode node, int level) {
XmlNode ret = node;
while (level > 0 && ret != null) {
ret = ret.FirstChild;
level--;
}
return ret;
}
Then you could use this function like this:
var firstChild5 = GetFirstChildNested(labelNode, 5);
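One thing worth checking: in the sample as posted, each element starts with text (`abc`, `bcd`, ...) before its nested `<tspan>`, so with `XmlDocument` the `FirstChild` of `<text>` is an `XmlText` node, and `XmlText.Attributes` is null. A minimal sketch of the same loop that follows only element children; `GetFirstElementNested` and `FirstElementChild` are hypothetical names:

```csharp
using System.Xml;

public static class NestedElements
{
    // Follows the first *element* child `level` times, skipping text,
    // whitespace and comment nodes. Returns null if the chain is shorter.
    public static XmlElement GetFirstElementNested(XmlElement node, int level)
    {
        XmlElement current = node;
        while (level > 0 && current != null)
        {
            current = FirstElementChild(current);
            level--;
        }
        return current;
    }

    // The first child node that is an element, or null if there is none.
    private static XmlElement FirstElementChild(XmlNode parent)
    {
        foreach (XmlNode child in parent.ChildNodes)
        {
            if (child is XmlElement element)
                return element;
        }
        return null;
    }
}
```

`GetFirstElementNested(labelNode, 3).GetAttribute("text")` would then give `gef` for the sample.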
I would suggest using LINQ to XML, which has a cleaner way of parsing XML.
Using XElement (or XDocument), you could flatten the hierarchy by calling the Descendants method and then run whatever queries you need.
For example:
XElement doc= XElement.Load(filepath);
var results =doc.Descendants()
.Select(x=>(string)x.Attribute("text"));
//which returns
abc,
bcd,
def,
gef
If you want to get the last child, you could simply use:
XElement doc= XElement.Load(filepath);
string last = doc.Descendants()
.Last() // get last element in hierarchy.
.Attribute("text").Value;
If you want to get third element, you could do something like this.
XElement doc= XElement.Load(filepath);
string third = doc.Descendants()
.Skip(2) // Skip first two.
.First()
.Attribute("text").Value;
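A variation on the Skip/First approach: `ElementAtOrDefault` returns null instead of throwing when there are fewer elements, and the explicit `(string)` conversion returns null for a missing attribute. A sketch (the `NthTextAttribute` name is hypothetical); note that `Descendants()` is in document order, which matches nesting depth only while each tspan has a single element child, as in the sample.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NestedText
{
    // Returns the "text" attribute of the nth element (0-based) below root in
    // document order, or null when there is no such element or attribute.
    public static string NthTextAttribute(XElement root, int n)
    {
        XElement element = root.Descendants().ElementAtOrDefault(n);
        return element == null ? null : (string)element.Attribute("text");
    }
}
```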
I have to extract values belonging to certain elements in an XML file and this is what I ended up with.
XDocument doc = XDocument.Load("request.xml");
var year = (string)doc.Descendants("year").FirstOrDefault();
var id = (string)doc.Descendants("id").FirstOrDefault();
I'm guessing that for each statement I'm iterating through the entire file looking for the first occurrence of the element called year/id. Is this the correct way to do this? It seems like there has to be a way where one would avoid unnecessary iterations. I know what I'm looking for and I know that the elements are going to be there even if the values may be null.
I'm thinking along the lines of a select statement with both "year" and "id" as conditions.
For clarity, I'm looking for certain elements and their respective values. There will most likely be multiple occurrences of the same element, but FirstOrDefault() is fine for that.
Further clarification:
As requested by the legend Jon Skeet, I'll try to clarify further. The XML document contains fields such as <year>2015</year> and <id>123032</id> and I need the values. I know which elements I'm looking for, and that they're going to be there. In the sample XML below, I would like to get 2015, The Emperor, something and 30.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<make>Apple</make>
<year>2015</year>
<customer>
<name>The Emperor</name>
<level2>
<information>something</information>
</level2>
<age>30</age>
</customer>
</documents>
Code that doesn't walk the whole XML twice could look like this:
XDocument doc = XDocument.Load("request.xml");
string year = null;
string id = null;
bool yearFound = false, idFound = false;
foreach (XElement ele in doc.Descendants())
{
if (!yearFound && ele.Name == "year")
{
year = (string)ele;
yearFound = true;
}
else if (!idFound && ele.Name == "id")
{
id = (string)ele;
idFound = true;
}
if (yearFound && idFound)
{
break;
}
}
As you can see, you are trading lines of code for speed :-) I do feel the code is still quite readable.
If you really need to optimize down to the last line of code, you could put the names of the elements in two variables (otherwise every comparison converts the string literal to an XName again, which costs a name-table lookup each time):
// before the foreach
XName yearName = "year";
XName idName = "id";
and then
if (!yearFound && ele.Name == yearName)
...
if (!idFound && ele.Name == idName)
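The same single-pass idea generalizes to any number of element names. A sketch, assuming only local names matter (namespaces are ignored) and that an element that never appears should simply be absent from the result; `FirstValues.Find` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class FirstValues
{
    // Walks the document once and records the value of the first element found
    // for each requested local name; stops as soon as every name has been seen.
    public static Dictionary<string, string> Find(XDocument doc, params string[] names)
    {
        var wanted = new HashSet<string>(names);
        var found = new Dictionary<string, string>();
        foreach (XElement element in doc.Descendants())
        {
            string name = element.Name.LocalName;
            if (wanted.Contains(name) && !found.ContainsKey(name))
            {
                found[name] = element.Value;
                if (found.Count == wanted.Count)
                    break; // everything requested has been found
            }
        }
        return found;
    }
}
```

For the sample XML, `Find(doc, "year", "name", "information", "age")` would return 2015, The Emperor, something and 30.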
Here's some fantastic example XML:
<root>
<section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>
I'd like to just grab the text from the section node and all sub-nodes as strings. BUT, note that there may or may not be spaces around the sub-nodes, so I want to pad the sub-nodes and append a space.
Here's a more precise example of what input might look like, and what I'd like output to be:
<root>
<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>
I'd like the output to be:
A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Note that the child nodes don't have spaces around them, so I need to pad them otherwise the words run together.
I was attempting to use this sample code:
XDocument doc = XDocument.Parse(xml);
foreach(var node in doc.Root.Elements("section"))
{
output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}
But the output includes the child tags, and is not going to work out.
Any suggestions here?
TL;DR: Was given node soup xml and want to stringify it with padding around child nodes.
In case you have tags nested to an unknown depth (e.g. <date>a <i>long</i> time ago</date>), you might also want to recurse so that the formatting is applied consistently throughout. For example:
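private static string Parse(XElement root)
{
return root
.Nodes()
// Keep text (including CDATA, which derives from XText) and elements; skip comments and processing instructions.
.Where(a => a is XText || a is XElement)
// Text contributes its value; child elements are flattened recursively.
.Select(a => a is XText ? ((XText)a).Value : Parse((XElement)a))
// An empty element such as <not attribute="be" /> yields "" instead of making Aggregate throw.
.DefaultIfEmpty(String.Empty)
// Join the pieces with a single space, except before a leading period.
.Aggregate((a, b) => String.Concat(a.Trim(), b.StartsWith(".") ? String.Empty : " ", b.Trim()));
}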
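You could try using XPath to extract what you need:
// xml holds the document text (as in the question), so read it through a StringReader;
// XPathDocument(string) would treat the string as a URI. Needs System.IO and System.Xml.XPath.
var docNav = new XPathDocument(new StringReader(xml));
// Create a navigator to query with XPath.
var nav = docNav.CreateNavigator();
// Find the text of every element under the root node
var expression = "/root//*/text()";
// Execute the XPath expression; a node-set result comes back as an XPathNodeIterator
XPathNodeIterator textNodes = nav.Select(expression);
// Do some stuff with textNodes
....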
References:
Querying XML, XPath syntax
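Putting the XPath idea together end to end: a sketch that runs an expression over the XML text and joins the selected text nodes, adding a space between pieces except before a leading period (the same rule the recursive Parse answer uses). The `XPathText.JoinText` helper is hypothetical, and the usage below passes `/root/descendant::text()`, a variant of the expression above that selects every text node under the root.

```csharp
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class XPathText
{
    // Evaluates an XPath expression that selects text nodes and joins their
    // trimmed values with single spaces, except before a leading period.
    public static string JoinText(string xml, string expression)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var builder = new StringBuilder();
        foreach (XPathNavigator node in nav.Select(expression))
        {
            string piece = node.Value.Trim();
            if (piece.Length == 0)
                continue; // ignore whitespace-only text
            if (builder.Length > 0 && piece[0] != '.')
                builder.Append(' ');
            builder.Append(piece);
        }
        return builder.ToString();
    }
}
```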
Here is a possible solution following your initial code:
private string extractSectionContents(XElement section)
{
string output = "";
foreach(var node in section.Nodes())
{
if(node.NodeType == System.Xml.XmlNodeType.Text)
{
output += ((XText)node).Value; // Value is the unescaped text; ToString() would re-escape & and <
}
else if(node.NodeType == System.Xml.XmlNodeType.Element)
{
output += string.Format(" {0} ", ((XElement)node).Value);
}
}
return output;
}
A problem with your logic is that periods will be preceded by a space when placed right after an element.
You are looking at "mixed content" nodes. There is nothing particularly special about them: just get all child nodes (text nodes are nodes too) and join their values with a space.
Something like
var result = String.Join(" ",
root.Nodes().Select(x => x is XText ? ((XText)x).Value : ((XElement)x).Value));
I have a simple XML
<AllBands>
<Band>
<Beatles ID="1234" started="1962">greatest Band<![CDATA[lalala]]></Beatles>
<Last>1</Last>
<Salary>2</Salary>
</Band>
<Band>
<Doors ID="222" started="1968">regular Band<![CDATA[lalala]]></Doors>
<Last>1</Last>
<Salary>2</Salary>
</Band>
</AllBands>
However, when I want to reach the "Doors" band to change its ID:
using (var stream = new StringReader(result))
{
XDocument xmlFile = XDocument.Load(stream);
var query = from c in xmlFile.Elements("Band")
select c;
...
query has no results.
But if I write xmlFile.Elements().Elements("Band"), it does find it.
What is the problem?
Is the full path from the root needed?
And if so, why did it work without specifying AllBands?
Does XDocument navigation require me to know the full level structure down to the required element?
Elements() will only check direct children - which in the first case is the root element, in the second case children of the root element, hence you get a match in the second case. If you just want any matching descendant use Descendants() instead:
var query = from c in xmlFile.Descendants("Band") select c;
Also I would suggest you restructure your XML: the band name should be an attribute or element value, not the element name itself. Using it as the element name makes querying (and schema validation, for that matter) much harder. I.e. something like this:
<Band>
<BandProperties Name="Doors" ID="222" started="1968" />
<Description>regular Band<![CDATA[lalala]]></Description>
<Last>1</Last>
<Salary>2</Salary>
</Band>
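With that structure, finding the Doors band and changing its ID no longer depends on element names. A sketch, assuming the restructured `<Band>` elements sit under some root such as `<AllBands>`; `BandEditor.SetBandId` is a hypothetical helper:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class BandEditor
{
    // Finds the BandProperties element whose Name attribute matches bandName,
    // anywhere in the document, and sets its ID. Returns false if not found.
    public static bool SetBandId(XDocument doc, string bandName, string newId)
    {
        XElement props = doc.Descendants("BandProperties")
            .FirstOrDefault(p => (string)p.Attribute("Name") == bandName);
        if (props == null)
            return false;
        props.SetAttributeValue("ID", newId);
        return true;
    }
}
```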
You can do it this way:
xml.Descendants().SingleOrDefault(p => p.Name.LocalName == "Name of the node to find")
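where xml is an XDocument.
Be aware that the Name property returns an XName, which has a LocalName and a Namespace. That's why you use Name.LocalName if you want to compare by name regardless of namespace. Also note that SingleOrDefault throws if more than one element matches (the sample has two Band elements); use FirstOrDefault or Where in that case.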
You should use Root to refer to the root element:
xmlFile.Root.Elements("Band")
If you want to find elements anywhere in the document use Descendants instead:
xmlFile.Descendants("Band")
The problem is that Elements only takes the direct child elements of whatever you call it on. If you want all descendants, use the Descendants method:
var query = from c in xmlFile.Descendants("Band") select c;
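To tie this back to the original goal, here is a sketch that changes the Doors ID in the document exactly as posted, going through Root first so that Elements("Band") looks one level below AllBands. `BandQueries.ChangeId` is a hypothetical helper; the chained `.Elements(bandName)` is the LINQ to XML extension that works on a sequence of elements.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class BandQueries
{
    // xmlFile.Elements("Band") only sees the document's single child (<AllBands>),
    // so start from Root, step into each <Band>, then pick the named child.
    public static bool ChangeId(XDocument xmlFile, string bandName, string newId)
    {
        XElement band = xmlFile.Root
            .Elements("Band")
            .Elements(bandName)
            .FirstOrDefault();
        if (band == null)
            return false;
        band.SetAttributeValue("ID", newId);
        return true;
    }
}
```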
My experience when working with large & complicated XML files is that sometimes neither Elements nor Descendants seem to work in retrieving a specific Element (and I still do not know why).
In such cases, I found that a much safer option is to manually search for the Element, as described by the following MSDN post:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/3d457c3b-292c-49e1-9fd4-9b6a950f9010/how-to-get-tag-name-of-xml-by-using-xdocument?forum=csharpgeneral
In short, you can create a GetElement function:
private XElement GetElement(XDocument doc,string elementName)
{
foreach (XNode node in doc.DescendantNodes())
{
if (node is XElement)
{
XElement element = (XElement)node;
if (element.Name.LocalName.Equals(elementName))
return element;
}
}
return null;
}
Which you can then call like this:
XElement element = GetElement(doc,"Band");
Note that this will return null if no matching element is found.
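For comparison, the same local-name search can be written as a LINQ query over Descendants(), which also returns null when nothing matches. A sketch; the `LocalNameSearch` class name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LocalNameSearch
{
    // First element anywhere in the document whose local name matches,
    // ignoring its namespace; null if there is none.
    public static XElement GetElement(XDocument doc, string elementName) =>
        doc.Descendants().FirstOrDefault(e => e.Name.LocalName == elementName);

    // All such elements, in document order.
    public static List<XElement> GetElements(XDocument doc, string elementName) =>
        doc.Descendants().Where(e => e.Name.LocalName == elementName).ToList();
}
```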
The Elements() method returns an IEnumerable<XElement> containing all child elements of the current node. For an XDocument, that collection only contains the Root element. Therefore the following is required:
var query = from c in xmlFile.Root.Elements("Band")
select c;
Sebastian's answer was the only answer that worked for me while examining a xaml document. If, like me, you'd like a list of all the elements then the method would look a lot like Sebastian's answer above but just returning a list...
private static List<XElement> GetElements(XDocument doc, string elementName)
{
List<XElement> elements = new List<XElement>();
foreach (XNode node in doc.DescendantNodes())
{
if (node is XElement)
{
XElement element = (XElement)node;
if (element.Name.LocalName.Equals(elementName))
elements.Add(element);
}
}
return elements;
}
Call it thus:
var elements = GetElements(xamlFile, "Band");
or in the case of my xaml doc where I wanted all the TextBlocks, call it thus:
var elements = GetElements(xamlFile, "TextBlock");
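A likely reason a plain `Descendants("TextBlock")` finds nothing in a XAML file is that WPF XAML root elements typically declare a default namespace (`xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"`), so every element name carries that namespace and a bare name without it does not match. Comparing LocalName sidesteps this; another option is to build the name from the root's namespace. A sketch, assuming the elements you want share the root element's namespace (`NamespaceAwareSearch` is a hypothetical name):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceAwareSearch
{
    // Looks up elements by local name in the root element's namespace.
    // A bare XName such as "TextBlock" has no namespace, so it would not match
    // elements that inherit a default xmlns from the root.
    public static XElement[] ElementsInRootNamespace(XDocument doc, string localName)
    {
        XNamespace ns = doc.Root.Name.Namespace;
        return doc.Descendants(ns + localName).ToArray();
    }
}
```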
I’m trying to use LINQ to XML to select a number of nodes and their children, but I'm getting terribly confused!
In the example XML below I need to pull out all the <MostWanted> and all the Wanted nodes with their child nodes, but without the other nodes in between the MostWanted and Wanted nodes.
This is because each MostWanted can be followed by any number of Wanted, and the Wanted relate to the preceding MostWanted.
I’m even confusing myself typing this up!!!
How can I do this in C#??
<root>
<top>
<NotWanted3>
</NotWanted3>
<MostWanted>
<UniqueKey>1</UniqueKey>
<QuoteNum>1</QuoteNum>
</MostWanted>
<NotWanted2>
<UniqueKey>1</UniqueKey>
<QuoteNum>1</QuoteNum>
</NotWanted2>
<NotWanted1>
<UniqueKey>0001</UniqueKey>
</NotWanted1>
<Wanted>
<Seg>
<SegNum>1</SegNum>
</Seg>
</Wanted>
<Wanted>
<Seg>
<SegNum>2</SegNum>
</Seg>
</Wanted>
<NotWanted>
<V>x</V>
</NotWanted>
<NotWanted3>
</NotWanted3>
<MostWanted>
<UniqueKey>1</UniqueKey>
<QuoteNum>1</QuoteNum>
</MostWanted>
<NotWanted2>
<UniqueKey>1</UniqueKey>
<QuoteNum>1</QuoteNum>
</NotWanted2>
<NotWanted1>
<UniqueKey>0002</UniqueKey>
</NotWanted1>
<Wanted>
<Seg>
<SegNum>3</SegNum>
</Seg>
</Wanted>
<Wanted>
<Seg>
<SegNum>4</SegNum>
</Seg>
</Wanted>
<NotWanted>
<V>x</V>
</NotWanted>
</top>
</root>
Why don't you just use:
XName wanted = "Wanted";
XName mostWanted = "MostWanted";
var nodes = doc.Descendants()
.Where(x => x.Name == wanted || x.Name == mostWanted);
That will retrieve every element called "Wanted" or "MostWanted". From each of those elements you can get to the child elements etc.
If this isn't what you're after, please clarify your question.
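Since each Wanted belongs to the MostWanted before it, document order matters, and a flat Descendants() filter loses that grouping. A sketch that walks the children of `<top>` in order and attaches each Wanted to the most recent MostWanted; `WantedGroups.Group` is a hypothetical helper, other elements are skipped, and any Wanted before the first MostWanted is ignored (an assumption about what you want in that case).

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class WantedGroups
{
    // Each MostWanted starts a new group; every following Wanted sibling is
    // added to the group of the most recent MostWanted.
    public static List<KeyValuePair<XElement, List<XElement>>> Group(XElement container)
    {
        var groups = new List<KeyValuePair<XElement, List<XElement>>>();
        List<XElement> current = null;
        foreach (XElement child in container.Elements())
        {
            if (child.Name == "MostWanted")
            {
                current = new List<XElement>();
                groups.Add(new KeyValuePair<XElement, List<XElement>>(child, current));
            }
            else if (child.Name == "Wanted" && current != null)
            {
                current.Add(child);
            }
        }
        return groups;
    }
}
```

Usage would be `WantedGroups.Group(doc.Root.Element("top"))`, where each pair's Key is a MostWanted and its Value the Wanted elements that follow it.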