HtmlAgilityPack HtmlDocument CreateNavigator Select XPath statement not working - c#

Lets say i have this XML
<root>
<myEntity> // list 1
<name>Test 1</name>
<entityNew> // list 2
<newName/>
<newName/>
</entityNew>
</myEntity>
<myEntity>
<name>Test 2</name>
<entityNew>
<newName/>
<newName/>
</entityNew>
</myEntity>
</root>
I want to get the entityNew list.
What I tried was this
//myEntity[1]/entityNew but it's not working, same as //myEntity[position()=1]/entityNew or something like this //myEntity[1]//entityNew
but when i tried it in an online XPath tester, it worked, this is the site i used http://www.freeformatter.com/xpath-tester.html

After some fiddling I noticed that HtmlAgilityPack is treating the node names as all lowercase and XPath is case sensitive so the queries you have tried are returning null.
I tried the following sample application (notice all node names are lowercase):
string xml = File.ReadAllText("XMLFile1.xml");
var doc = new HtmlDocument();
doc.LoadHtml(xml);
var navigator = doc.CreateNavigator();
var iterator = navigator.Select("root//myentity[1]/entitynew");
iterator.MoveNext();
Console.WriteLine(iterator.Current.OuterXml);
and the output is this:
<entitynew> // list 2
<newname />
<newname />
</entitynew>

Related

How can i get a collection of child elements - query returns null using Xdocument

Can anybody please tell me why this Xdocument query is returning null when there are elements / attributes that i'm trying to grab.
I'm trying to get a collection of the <version> elements so i can read the attributes of them. Example XML:
<dmodule>
<idstatus>
<dmaddres>
<dmc>Some DMC</dmc>
<dmtitle><techname>My techname</techname><infoname>My infoname</infoname></dmtitle>
<issno issno="004" type="revised">
<issdate year="2016" month="11" day="30"></dmaddres>
<status>
<security class="2">
<rpc>RPC1</rpc>
<orig>ORIG1</orig>
<applic>
<model model="2093">
**<version version="BASE"></version>
<version version="RNWB"></version>**</model></applic>
<techstd>
<autandtp>
<authblk></authblk>
<tpbase>-</tpbase></autandtp>
<authex></authex>
<notes></notes></techstd>
<qa>
<firstver type="tabtop"></qa></status>
</idstatus>
<dmodule>
And this is how i'm trying to get the <version> elements:
XDocument doc = XDocument.Load(sgmlReader);
List<string> applicabilityList = null;
//this doesn't work
//var applics = doc.XPathSelectElements("dmodule/idstatus/status/applic/model/version");
//nor does this
var applics = doc.Descendants("idstatus").Descendants("applic").Elements("version");
foreach (var applic in applics)
{
string applicVersion = applic.Attribute("version").ToString();
applicabilityList.Add(applicVersion);
}
return applicabilityList;
Either query as shown above returns no results. Cleary a silly mistake in my query but i'm out of practice.
That's because you are missing the model element.
...
<applic>
<model model="2093">
<version version="BASE"></
...
If all you are interested are the version elements you can simply do:
var versions = doc.Descendants("version");
This is the working code
var applics = doc.Descendants("dmodule")
.Descendants("idstatus")
.Descendants("status")
.Descendants("security")
.Descendants("applic")
.Descendants("model")
.Elements("version");

Searching through XML to find a list of items

I have an xml doc as such:
<?xml version="1.0" encoding="utf-8" ?>
<Categories>
<Category>
<Name>Fruit</Name>
<Items>
<Item>Apple</Item>
<Item>Banana</Item>
<Item>Peach</Item>
<Item>Strawberry</Item>
</Items>
</Category>
<Category>
<Name>Vegetable</Name>
<Items>
<Item>Carrots</Item>
<Item>Beets</Item>
<Item>Green Beans</Item>
<Item>Bell Pepper</Item>
</Items>
</Category>
<Category>
<Name>Seafood</Name>
<Items>
<Item>Crab</Item>
<Item>Lobster</Item>
<Item>Shrimp</Item>
<Item>Oysters</Item>
<Item>Salmon</Item>
</Items>
</Category>
</Categories>
I would like to be able to search on a term such as Category.Name = Fruit and get back the list of the Fruit Items.
Here is the incomplete code I've started so far:
string localPath = Server.MapPath("~/App_Data/Foods.xml");
XmlDocument doc = new XmlDocument();
doc.Load(localPath);
XmlNodeList list = doc.SelectNodes("Categories");
//Do something here to search the category names and get back the list of items.
This is my first attempt at parsing through XML so I'm a bit lost. Note: the application I am working on uses .Net 2.0
I'd suggest to read about XPath as you're limited to .NET 2.0, moreover XPath is very useful to work with XML even in more general context (not limited to .NET platform only).
In this particular case XPath become useful because SelectNodes() and SelectSingleNode() method accept XPath string as parameter. For example, to get all <Item> that corresponds to category name "Fruit" :
string localPath = Server.MapPath("~/App_Data/Foods.xml");
XmlDocument doc = new XmlDocument();
doc.Load(localPath);
XmlNodeList items = doc.SelectNodes("Categories/Category[Name='Fruit']/Items/Item");
foreach(XmlNode item in items)
{
Console.WriteLine(item.InnerText);
}
You can see XPath as a path, similar to file path in windows explorer. I'd try to explain only the bit that is different from common path expression in the above sample, particularly this bit :
...\Category[Name='Fruit']\...
the expression within square-brackets is a filter which say search for <Category> node having child node <Name> equals "Fruit".
You are on the right path. However, you would need to load the 'Categories' node first, then you can get it's child nodes.
I have added a filter to return only nodes where the name is "Fruit".
XmlNode cat = doc.SelectSingleNode("Categories");
var list = cat.SelectNodes("Category").Cast<XmlNode>()
.Where(c => c.SelectSingleNode("Name").InnerText == "Fruit");
foreach ( XmlNode item in list )
{
// process each node here
}

Getting an XElement with a namespace via XPathSelectElements

I have an XML e.g.
<?xml version="1.0" encoding="utf-8"?>
<A1>
<B2>
<C3 id="1">
<D7>
<E5 id="abc" />
</D7>
<D4 id="1">
<E5 id="abc" />
</D4>
<D4 id="2">
<E5 id="abc" />
</D4>
</C3>
</B2>
</A1>
This is may sample code:
var xDoc = XDocument.Load("Test.xml");
string xPath = "//B2/C3/D4";
//or string xPath = "//B2/C3/D4[#id='1']";
var eleList = xDoc.XPathSelectElements(xPath).ToList();
foreach (var xElement in eleList)
{
Console.WriteLine(xElement);
}
It works perfectly, but if I add a namespace to the root node A1, this code doesn't work.
Upon searching for solutions, I found this one, but it uses the Descendants() method to query the XML. From my understanding, this solution would fail if I was searching for <E5> because the same tag exists for <D7>, <D4 id="1"> and <D4 id="2">
My requirement is to search if a node exists at a particular XPath. If there is a way of doing this using Descendants, I'd be delighted to use it. If not, please guide me on how to search using the name space.
My apologies in case this is a duplicate.
To keep using XPath, you can use something link this:
var xDoc = XDocument.Parse(#"<?xml version='1.0' encoding='utf-8'?>
<A1 xmlns='urn:sample'>
<B2>
<C3 id='1'>
<D7><E5 id='abc' /></D7>
<D4 id='1'><E5 id='abc' /></D4>
<D4 id='2'><E5 id='abc' /></D4>
</C3>
</B2>
</A1>");
// Notice this
XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
nsmgr.AddNamespace("sample", "urn:sample");
string xPath = "//sample:B2/sample:C3/sample:D4";
var eleList = xDoc.XPathSelectElements(xPath, nsmgr).ToList();
foreach (var xElement in eleList)
{
Console.WriteLine(xElement);
}
but it uses the Descendants() method to query the XML. From my understanding, this solution would fail if I was searching for because the same tag exists for , and
I'm pretty sure you're not quite understanding how that works. From the MSDN documentation:
Returns a filtered collection of the descendant elements for this document or element, in document order. Only elements that have a matching XName are included in the collection.
So in your case, just do this:
xDoc.RootNode
.Descendants("E5")
.Where(n => n.Parent.Name.LocalName == "B4");
Try this
var xDoc = XDocument.Parse("<A1><B2><C3 id=\"1\"><D7><E5 id=\"abc\" /></D7><D4 id=\"1\"><E5 id=\"abc\" /></D4><D4 id=\"2\"><E5 id=\"abc\" /></D4></C3></B2></A1>");
foreach (XElement item in xDoc.Element("A1").Elements("B2").Elements("C3").Elements("D4"))
{
Console.WriteLine(item.Element("E5").Value);//to get the value of E5
Console.WriteLine(item.Element("E5").Attribute("id").Value);//to get the value of attribute
}

Inserting and saving xml using Linq to XML

If i have an XML file settings.xml like below
<Root>
<First>
</First>
</Root>
I Load the XML first using XDocument settings = XDocument.Load("settings.xml")
How should I insert a XML node inside the node First and save it using LINQ-to-XML?
First you need to find the First element. Then you can add other elements and attributes to it.
There are more than one way to find an element in the xml: Elements, Descendants, XPathSelectElement, etc.
var firstElement = settings.Descendants("First").Single();
firstElement.Add(new XElement("NewElement"));
settings.Save(fileName);
// or
var newXml = settings.ToString();
Output:
<Root>
<First>
<NewElement />
</First>
</Root>
Or element with attribute:
firstElement.Add(
new XElement("NewElement", new XAttribute("NewAttribute", "TestValue")));
Output:
<Root>
<First>
<NewElement NewAttribute="TestValue" />
</First>
</Root>
[Edit] The answer to the bonus question. What to do if the first element does not exist and I want to create it:
var root = settings.Element("Root");
var firstElement = root.Element("First");
if (firstElement == null)
{
firstElement = new XElement("First");
root.Add(firstElement);
}
firstElement.Add(new XElement("NewElement"));

Why doesn't this XPath query returns any nodes?

I'm querying Sharepoint server-side and getting back results as Xml. I want to slim down the Xml into something more lightweight before sending it to jQuery through a WebMethod.
However my XPath query isn't working. I thought the following code would return all Document nodes, but it returns nothing. I've used XPath a little before, I thought //Document do the trick.
C# XPath query
XmlDocument xmlResults = new XmlDocument();
xmlResults.LoadXml(xml); // XML is a string containing the XML source shown below
XmlNodeList results = xmlResults.SelectNodes("//Document");
XML being queried
<ResponsePacket xmlns="urn:Microsoft.Search.Response">
<Response domain="QDomain">
<Range>
<StartAt>1</StartAt>
<Count>2</Count>
<TotalAvailable>2</TotalAvailable>
<Results>
<Document relevance="126" xmlns="urn:Microsoft.Search.Response.Document">
<Title>Example 1.doc</Title>
<Action>
<LinkUrl size="32256" fileExt="doc">http://hqiis99/Mercury/Mercury documents/Example 1.doc</LinkUrl>
</Action>
<Description />
<Date>2010-08-19T14:44:56+01:00</Date>
</Document>
<Document relevance="31" xmlns="urn:Microsoft.Search.Response.Document">
<Title>Mercury documents</Title>
<Action>
<LinkUrl size="0" fileExt="aspx">http://hqiis99/mercury/Mercury documents/Forms/AllItems.aspx</LinkUrl>
</Action>
<Description />
<Date>2010-08-19T14:49:39+01:00</Date>
</Document>
</Results>
</Range>
<Status>SUCCESS</Status>
</Response>
</ResponsePacket>
You're trying to select Document elements which don't have a namespace... whereas the default namespace is actually "urn:Microsoft.Search.Response" here.
I think you want something like this:
XmlDocument xmlResults = new XmlDocument();
xmlResults.LoadXml(xml);
XmlNamespaceManager manager = new XmlNamespaceManager(xmlResults.NameTable);
manager.AddNamespace("ns", "urn:Microsoft.Search.Response.Document");
XmlNodeList results = xmlResults.SelectNodes("//ns:Document", manager);
This finds two elements.
If you can use LINQ to XML instead, it makes it all somewhat easier:
XDocument results = XDocument.Parse(xml);
XNamespace ns = "urn:Microsoft.Search.Response.Document";
var documents = results.Descendants(ns + "Document");
I love LINQ to XML's namespace handling :)
Alternatively, you could try the following and ignore the namespaces:
XmlDocument xmlResults = new XmlDocument();
xmlResults.LoadXml(xmlString);
XmlNodeList results = xmlResults.SelectNodes("//*[local-name()='Document']");

Categories