Xpath where parent and child both have attribute - htmlagilitypack - c#

I have something like this,
<div class="rightpanel">
.
.
<ul..>...</ul>
<ul class="side-panel categories"></ul>
.
</div>
I am trying to select the ul having class side-panel categories with the htmlagilitypack.
I have tried,
HtmlDocument.DocumentNode.SelectNodes("//div[@class='right-panel']/ul[@class='side-panel categories']")
but it's giving me a NullReferenceException.
Please help.

In the HTML, the div has the class rightpanel, but in your XPath it's right-panel. Please try this:
HtmlDocument.DocumentNode
.SelectNodes("//div[@class='rightpanel']/ul[@class='side-panel categories']")
And if the <ul> is not a direct child of the <div> (i.e. if its parent is some other element nested inside the <div>), you would need this:
HtmlDocument.DocumentNode
.SelectNodes("//div[@class='rightpanel']//ul[@class='side-panel categories']")
with a double-slash before the ul.
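To make the difference between / and // concrete, here is a minimal sketch. It uses XmlDocument from System.Xml as a stand-in for HtmlAgilityPack so that it runs without extra packages (that swap is my assumption; the XPath strings are the same), and the markup and the AxisDemo and Count names are made up for illustration. Keep in mind that HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so a mistyped class name shows up as a NullReferenceException as soon as the result is used.

```csharp
using System;
using System.Xml;

public static class AxisDemo
{
    // Hypothetical markup: the target <ul> is wrapped in an extra <section>,
    // so it is a descendant of the <div> but not a direct child.
    public const string Markup =
        "<div class='rightpanel'><section>" +
        "<ul class='side-panel categories'/>" +
        "</section></div>";

    // Returns how many nodes the XPath expression selects in Markup.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "/" only looks at direct children of the div.
        Console.WriteLine(Count("//div[@class='rightpanel']/ul[@class='side-panel categories']"));
        // "//" looks at descendants at any depth.
        Console.WriteLine(Count("//div[@class='rightpanel']//ul[@class='side-panel categories']"));
    }
}
```

With this markup the first query selects nothing and the second selects the one ul.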

Related

Scraping from a div

I am experimenting with web scraping and I am having trouble scraping a particular value out of some nested div classes. I am using the .NET HtmlAgilityPack class library in a .NET Framework C# Console App. Here is the div code:
<div class="ds-nearby-schools-list">
<div class="ds-school-row">
<div class="ds-school-rating">
<div class="ds-gs-rating-8">
<span class="ds-hero-headline ds-schools-display-rating">8</span>
<span class="ds-rating-denominator ds-legal">/10</span>
</div>
</div>
<div class="ds-nearby-schools-info-section">
<a class="ds-school-name ds-standard-label notranslate" href="https://www.greatschools.org/school?id=00870&state=MD" rel="nofollow noopener noreferrer" target="_blank">Candlewood Elementary School</a>
<ul class="ds-school-info-section">
<li class="ds-school-info">
<span class="ds-school-key ds-body-small">Grades:</span>
<span class="ds-school-value ds-body-small">K-5</span>
</li>
<li class="ds-school-info">
<span class="ds-school-key ds-body-small">Distance:</span>
<span class="ds-school-value ds-body-small">0.8 mi</span>
</li>
</ul>
</div>
</div>
</div>
I want to scrape the "8" from the ds-hero-headline ds-schools-display-rating class. I am having trouble formulating the selector for the SelectNodes method on the DocumentNode object of the HtmlNode.HtmlDocument class.
I guess you are having trouble writing the XPath to select the node. Try //*[contains(@class, 'ds-hero-headline') and contains(@class, 'ds-schools-display-rating')] with the SelectNodes method.
However, this XPath has a problem if the page you're targeting also has a class name like ds-hero-headline-content, which contains(@class, 'ds-hero-headline') would also match. In that case, see the solution in How can I find an element by CSS class with XPath?
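The usual way around that partial-match problem is to pad the class attribute with spaces and test for a whole token. A minimal sketch, again using System.Xml's XmlDocument as a package-free stand-in for HtmlAgilityPack (my assumption); the markup and the ClassTokenDemo, HasClass and Count names are hypothetical:

```csharp
using System;
using System.Xml;

public static class ClassTokenDemo
{
    // Hypothetical markup: the second span only shares a prefix with the class we want.
    public const string Markup =
        "<div>" +
        "<span class='ds-hero-headline ds-schools-display-rating'>8</span>" +
        "<span class='ds-hero-headline-content'>other</span>" +
        "</div>";

    // Builds an XPath predicate that matches a whole class token, not a substring.
    public static string HasClass(string name)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    // Returns how many nodes the XPath expression selects in Markup.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Substring test: also matches 'ds-hero-headline-content'.
        Console.WriteLine(Count("//*[contains(@class, 'ds-hero-headline')]"));
        // Token test: matches only the element that really has the class.
        Console.WriteLine(Count("//*[" + HasClass("ds-hero-headline") + "]"));
    }
}
```

The substring version selects both spans here, while the token version selects only the first.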
I would use this to extract 0.8 mi
//div[@class='ds-nearby-schools-list']/div[@class='ds-school-row']/div[@class='ds-nearby-schools-info-section']/ul[@class='ds-school-info-section']/li[@class='ds-school-info']/span[@class='ds-school-value ds-body-small' and preceding-sibling::span[@class='ds-school-key ds-body-small' and text()='Distance:']]/text()
Then this regex to group data:
^[0-9\.]+ (.*)$
At the end you can use some kind of conversion to save distance to an object.
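As a sketch of that last conversion step: the regex above captures only the text after the number, so the variant below adds a second group around the number itself. That change, and the DistanceParser and TryParse names, are my assumptions about what is wanted, not part of the answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DistanceParser
{
    // Splits text such as "0.8 mi" into a number and a unit.
    // Returns false when the text does not have that shape.
    public static bool TryParse(string text, out double value, out string unit)
    {
        value = 0;
        unit = "";
        Match m = Regex.Match(text.Trim(), @"^([0-9.]+) (.*)$");
        if (!m.Success)
        {
            return false;
        }
        unit = m.Groups[2].Value;
        // Invariant culture so "0.8" parses the same on every machine.
        return double.TryParse(m.Groups[1].Value, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        if (TryParse("0.8 mi", out double miles, out string unit))
        {
            Console.WriteLine(miles.ToString(CultureInfo.InvariantCulture) + " " + unit);
        }
    }
}
```

Parsing with the invariant culture matters because the page uses a dot as the decimal separator regardless of the machine's locale.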
Have you tried the following to get the 8? You can search for a specific span element by its class name and read its inner text.
Note: I used a text file to load the HTML from your question.
using System;
using System.IO;
using HtmlAgilityPack;
string htmlFile = File.ReadAllText(@"TempFile.html");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlFile);
HtmlNode htmlDoc = doc.DocumentNode;
HtmlNode node = htmlDoc.SelectSingleNode("//span[@class='ds-hero-headline ds-schools-display-rating']");
Console.WriteLine(node.InnerText);
// output: 8
Alternate:
Another way is to specify the path that you want the value from, starting from the div element.
HtmlNode node2 = htmlDoc.SelectSingleNode("//div[@class='ds-gs-rating-8']//span[@class='ds-hero-headline ds-schools-display-rating']");
Console.WriteLine(node2.InnerText);
// output: 8

XPath select all child nodes of the first child or an element?

I'm trying to get a collection of li elements (child nodes) with Selenium using XPath, but it's returning null.
JavaScript pseudo code:
document.getElementsByClassName("jSC57 _6xe7A")[0].firstChild.childNodes
I want to express that in XPath, but this returns null:
document.DocumentNode.SelectNodes("//*[@class='jSC57 _6xe7A']//div//li");
HTML sample:
<div class="isgrP">
<ul class="jSC57 _6xe7A">
<div class>
<li class="wo9IH">
<div class="li-container">
testing
</div>
</li>
</div>
</ul>
</div>
Based on your statement document.getElementsByClassName("jSC57 _6xe7A")[0].firstChild.childNodes, I am making an assumption about what you want to retrieve here. getElementsByClassName("jSC57 _6xe7A") will return a list of ul elements. Then, your [0] gets the first ul. firstChild will grab the first div element, and childNodes refers to the li elements that appear under that div.
To get the li elements you are trying to retrieve from your tree, I might not use those class names -- they appear to be dynamic and might change each time you load the page.
I would use this XPath instead:
document.DocumentNode.SelectNodes("//ul/div//li");
This will get all of the li nodes that appear anywhere under a div that is a direct child of a ul.
But this might be too general for your purposes. If you want to use the class name, you can modify it as such:
document.DocumentNode.SelectNodes("//ul[@class='jSC57 _6xe7A']/div/li");

How to iterate sub elements of <ul> in Selenium with C#

I am trying to iterate over the <li> items of a <ul> element in Selenium with C#. I am very new to it and trying to find an easy way to write the code for my test project.
here is my sample html code, I would like to access those links below...
<ul class="list-menu">
<li>
<ul>
<li><a class="head" href="/seramik-banyo-urunleri">Seramik Banyo Ürünleri</a></li>
<li>Lavabo</li>
<li>Klozet</li>
my c# code is like below
IList<IWebElement> results = driver.FindElements(By.XPath("//div[@class='list-menu']/li/lu/li"));
but it doesn't get the links. what can I do to fix this?
You have a typo in the XPath expression (lu instead of ul). Also, in the HTML you posted the list-menu class is on a ul, not a div. Replace:
//div[@class='list-menu']/li/lu/li
with:
//ul[@class='list-menu']/li/ul/li
Or, you can use a more compact CSS selector instead:
driver.FindElements(By.CssSelector(".list-menu > li > ul > li"));
where > means a direct parent-child relationship.

Find element in Selenium using XPATH or CSS Selector

I'm trying to find the element "import" using Selenium WebDriver in C#. I have tried the following code, but none of it finds the element.
driver.FindElement(By.XPath("//*[@class='menu_bg']/ul/li[3]")).Click();
driver.FindElement(By.XPath("//*[@id='import']/a")).Click();
driver.FindElement(By.CssSelector("#import>a")).Click();
driver.FindElement(By.XPath("//*[@class='menu_bg']/ul/li[3]/a")).Click();
driver.FindElement(By.CssSelector("ul[@class='menu_bg']>li[value='3']")).Click();
Please help me out. Design page looks like below:
<body>
<div class="header_bg"></div>
<div class="menu_bg">
<ul class="menu">
<li id="retrieve"></li>
<li id="scan" class="test"></li>
<li id="import">
<a target="main" href="import/import.aspx" onclick="clickme(this,'import')">Import</a>
</li>
<li id="admin"></li>
<li id="help"></li>
<li style="float: right;"></li>
</ul>
</div>
</body>
All the time I got the error as below:
unable to find the element
XPath indexers are 1-based, as opposed to most other languages, where they are 0-based.
This means you are actually targeting the 2nd li element, which has no anchor element.
So:
//*[@class='menu_bg']/ul/li[3]/a
However, this XPath query is not great and is too strict on position - thus although this newly fixed XPath above should work, I'd advise you to think of something else.
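A small sketch of the 1-based indexing, and of selecting by id instead of by position. It runs a trimmed-down copy of the menu through System.Xml's XmlDocument rather than Selenium (my assumption, so that it needs no browser); IndexDemo and IdOf are hypothetical names.

```csharp
using System;
using System.Xml;

public static class IndexDemo
{
    // Trimmed-down copy of the menu from the question.
    public const string Markup =
        "<div class='menu_bg'><ul class='menu'>" +
        "<li id='retrieve'/><li id='scan'/>" +
        "<li id='import'><a href='import/import.aspx'>Import</a></li>" +
        "<li id='admin'/>" +
        "</ul></div>";

    // Returns the id attribute of the first node the XPath selects, or null.
    public static string IdOf(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        var element = doc.SelectSingleNode(xpath) as XmlElement;
        return element == null ? null : element.GetAttribute("id");
    }

    public static void Main()
    {
        Console.WriteLine(IdOf("//*[@class='menu_bg']/ul/li[1]")); // first li
        Console.WriteLine(IdOf("//*[@class='menu_bg']/ul/li[3]")); // third li
        Console.WriteLine(IdOf("//li[@id='import']"));             // by id, no position needed
    }
}
```

Selecting by id keeps working if menu items are added or reordered, which is why it is usually preferable to a positional index.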
By reviewing this link (thanks to @Arran), the above issue was fixed. 'Switching' to the current IFrame directs Selenium to show any requests to that frame instead.
driver.SwitchTo().Frame()
You can do this by chaining Selenium 'FindElement' like so;
driver.FindElement(By.Id("import")).FindElement(By.TagName("a"));
which will give you the child of the element with ID that has a tag of 'a'.
Another way you could do this is by casting your driver to an IJavaScriptExecutor and executing JavaScript directly in the browser using a jQuery selector (this assumes the page loads jQuery). I find this better for more complex Selenium lookups:
((IJavaScriptExecutor)Driver).ExecuteScript("$(\"a[target='main'][href='import/import.aspx']\").click();");

How to find div Element without an id attribute with Selenium Webdriver

I'm using C# and Selenium Webdriver and I'm trying to find a div Element in my html code which looks like this:
<div class="x-grid-cell-inner" style="text-align: left;" unselectable="on">
phys_tag_desc
</div>
I can't find a method to search for the value of the div element with Selenium WebDriver. I already searched this site and checked the Selenium WebDriver documentation, but couldn't find anything.
Well if text value is unique, then solution is simple. Try the xpath below:
//div[text()='phys_tag_desc']
If the text is not an exact match, try the following:
//div[contains(text(),'phys_tag_desc')]
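One thing to watch for: if the page source really has the text on its own line, as in the snippet in the question, the text node includes the surrounding whitespace and the exact text()= comparison will not match. A sketch comparing the two expressions above with a normalize-space() variant, using System.Xml's XmlDocument in place of a browser (my assumption); the markup and the TextMatchDemo and Count names are hypothetical:

```csharp
using System;
using System.Xml;

public static class TextMatchDemo
{
    // Hypothetical markup: as in the question, the text sits on its own line,
    // so the text node carries leading and trailing whitespace.
    public const string Markup =
        "<root><div class='x-grid-cell-inner'>\n  phys_tag_desc\n</div></root>";

    // Returns how many nodes the XPath expression selects in Markup.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Exact comparison fails because of the surrounding whitespace.
        Console.WriteLine(Count("//div[text()='phys_tag_desc']"));
        // Substring comparison ignores what surrounds the text.
        Console.WriteLine(Count("//div[contains(text(),'phys_tag_desc')]"));
        // normalize-space() trims the text first, so an exact comparison works.
        Console.WriteLine(Count("//div[normalize-space(.)='phys_tag_desc']"));
    }
}
```

The normalize-space() form keeps the match exact while tolerating indentation in the markup.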
I have two ways.
Way 2 is more complex but more effective.
Way 1:
You can loop over all divs and look for one whose attributes match.
Example:
HtmlElement yourElement = null;
foreach (HtmlElement o in webbrowser.Document.GetElementsByTagName("div"))
{
    // Match on both the class and the inline style of the target div.
    if (o.GetAttribute("class") == "x-grid-cell-inner" && o.GetAttribute("style") == "text-align: left;")
    {
        yourElement = o;
        break;
    }
}
// Only use the element if the loop actually found one.
if (yourElement != null)
{
    DoSomethingWith(yourElement);
}
The other way is to follow the element's path.
You can find the closest element that has an ID.
Example:
<div id="element">
<div>content..</div>
<div>
<div class="x-grid-cell-inner" style="text-align: left;" unselectable="on">
phys_tag_desc
</div>
</div>
</div>
The closest element that has an id in this example is
<div id="element">
Your element's parent is the second child of the id="element" div.
You can get it and follow the path like this:
yourElement = webbrowser.Document.GetElementById("element").Children[1].Children[0];
In other situations you can use the XPath Boolean operators.
Try the XPath below:
By.XPath("//div[contains(@class,'x-grid-cell-inner') and contains(@unselectable, 'on') and contains(text(),'phys_tag_desc')]")
Bye
