Why is my xpath able to find the element at all?

I want to print the text of each div whose class contains "Name". The code below prints Name1 three times instead of Name1, Name2 and Name3.
Why does my code print Name1 three times?
Why is dateInput.FindElement even able to find the Root div at all? The Root div is at a completely different level than the date element. And since I'm using //div..., which means "find the div in the current node" (right?), dateInput.FindElement should NOT even find the Root div, right?
CODE
var dateInput = driver.FindElement(By.Id("date"));
var rootElement = dateInput.FindElement(By.XPath("//div[contains(@class,'Root')]"));
var boxes = rootElement.FindElements(By.XPath("//div[contains(@class,'Box')]"));
foreach (var box in boxes)
{
    var nameElement = box.FindElement(By.XPath("//div[contains(@class,'Name')]"));
    Console.WriteLine(nameElement.Text);
}
HTML
<div>
  <div>
    <input id="date">
  </div>
  <div class="__Root">
    <div>
      <div class="__Box">
        <div class="__Name">Name1</div>
      </div>
      <div class="__Box">
        <div class="__Name">Name2</div>
      </div>
      <div class="__Box">
        <div class="__Name">Name3</div>
      </div>
    </div>
  </div>
</div>

You are evaluating an XPath expression that starts with // relative to a particular context node, but a leading // means "search the whole document, starting from the document's root": the context node is ignored altogether (except, of course, that it supplies the document being searched). That is also why dateInput.FindElement finds the Root div. So you execute the same query three times. Each time, your expression matches all 3 Name div elements in the document, but because FindElement is defined to return a single element, it returns the first match each time.
To search within the subtree rooted at the context node, start the expression with .// instead, for example box.FindElement(By.XPath(".//div[contains(@class,'Name')]")).
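To see the difference without a browser, here is a minimal sketch using .NET's built-in System.Xml, which follows the same XPath 1.0 rules for // and .//. It assumes the question's markup is loaded as XML (so <input> is self-closed) and uses XmlNode.SelectSingleNode in place of Selenium's FindElement.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// The question's markup, as XML so System.Xml can evaluate XPath against it.
var doc = new XmlDocument();
doc.LoadXml(
    "<div><div><input id=\"date\"/></div>" +
    "<div class=\"__Root\"><div>" +
    "<div class=\"__Box\"><div class=\"__Name\">Name1</div></div>" +
    "<div class=\"__Box\"><div class=\"__Name\">Name2</div></div>" +
    "<div class=\"__Box\"><div class=\"__Name\">Name3</div></div>" +
    "</div></div></div>");

var absolute = new List<string>(); // "//..." ignores the context node
var relative = new List<string>(); // ".//..." searches below the context node

foreach (XmlNode box in doc.SelectNodes("//div[contains(@class,'Box')]"))
{
    // Same absolute query every time: always the first Name div in the document.
    absolute.Add(box.SelectSingleNode("//div[contains(@class,'Name')]").InnerText);
    // Relative query: the Name div inside this particular box.
    relative.Add(box.SelectSingleNode(".//div[contains(@class,'Name')]").InnerText);
}

Console.WriteLine(string.Join(", ", absolute)); // Name1, Name1, Name1
Console.WriteLine(string.Join(", ", relative)); // Name1, Name2, Name3
```

Switching "//" to ".//" in the Selenium loop should behave the same way, since the browser evaluates the XPath with the box element as the context node.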
Secondly, you could search directly for the "Name" div elements with a single XPath expression (broken onto multiple lines here for readability) and simplify your C# code drastically:
//div[contains(@class,'Root')]
//div[contains(@class,'Box')]
//div[contains(@class,'Name')]
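A minimal sketch of that single-expression approach, again using System.Xml's SelectNodes as a stand-in for the browser. With Selenium, the same string would be passed to driver.FindElements(By.XPath(xpath)) and each element's Text read in the loop; that part is not run here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<div class=\"__Root\"><div>" +
    "<div class=\"__Box\"><div class=\"__Name\">Name1</div></div>" +
    "<div class=\"__Box\"><div class=\"__Name\">Name2</div></div>" +
    "<div class=\"__Box\"><div class=\"__Name\">Name3</div></div>" +
    "</div></div>");

// One absolute path: the Root div, any Box below it, any Name below that.
const string xpath =
    "//div[contains(@class,'Root')]" +
    "//div[contains(@class,'Box')]" +
    "//div[contains(@class,'Name')]";

var names = new List<string>();
foreach (XmlNode name in doc.SelectNodes(xpath))
{
    names.Add(name.InnerText); // matches come back in document order
}

Console.WriteLine(string.Join(", ", names)); // Name1, Name2, Name3
```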

Related

Scraping from a div

I am experimenting with web scraping and I am having trouble scraping a particular value out of some nested div classes. I am using the .NET HtmlAgilityPack class library in a .NET Framework C# Console App. Here is the div code:
<div class="ds-nearby-schools-list">
<div class="ds-school-row">
<div class="ds-school-rating">
<div class="ds-gs-rating-8">
<span class="ds-hero-headline ds-schools-display-rating">8</span>
<span class="ds-rating-denominator ds-legal">/10</span>
</div>
</div>
<div class="ds-nearby-schools-info-section">
<a class="ds-school-name ds-standard-label notranslate" href="https://www.greatschools.org/school?id=00870&state=MD" rel="nofollow noopener noreferrer" target="_blank">Candlewood Elementary School</a>
<ul class="ds-school-info-section">
<li class="ds-school-info">
<span class="ds-school-key ds-body-small">Grades:</span>
<span class="ds-school-value ds-body-small">K-5</span>
</li>
<li class="ds-school-info">
<span class="ds-school-key ds-body-small">Distance:</span>
<span class="ds-school-value ds-body-small">0.8 mi</span>
</li>
</ul>
</div>
</div>
</div>
I want to scrape the "8" from the span with the ds-hero-headline ds-schools-display-rating classes. I am having trouble formulating the XPath for the SelectNodes method on the DocumentNode property of the HtmlAgilityPack HtmlDocument class.

I guess you are having trouble writing the XPath to select the node. Try //*[contains(@class, 'ds-hero-headline') and contains(@class, 'ds-schools-display-rating')] with the SelectNodes method.
However, this XPath could return extra nodes if the page you're targeting also has a class name such as ds-hero-headline-content, because contains() also matches ds-hero-headline as a substring of it. In that case, see the solution in How can I find an element by CSS class with XPath?
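The sketch below illustrates that pitfall and a widely used XPath 1.0 idiom for whole-token class matching (pad the normalized class list with spaces, then look for " token "). The decoy span is hypothetical, and System.Xml stands in for HtmlAgilityPack here; HtmlAgilityPack's SelectNodes should accept the same XPath 1.0 expression.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<div>" +
    "<span class=\"ds-hero-headline-content\">decoy</span>" + // hypothetical decoy
    "<span class=\"ds-hero-headline ds-schools-display-rating\">8</span>" +
    "</div>");

// Substring test: also matches the decoy, because
// "ds-hero-headline-content" contains "ds-hero-headline".
XmlNodeList loose = doc.SelectNodes("//span[contains(@class, 'ds-hero-headline')]");

// Whole-token test: " ds-hero-headline " only occurs if the class
// list contains that exact class name.
XmlNodeList exact = doc.SelectNodes(
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' ds-hero-headline ')" +
    " and contains(concat(' ', normalize-space(@class), ' '), ' ds-schools-display-rating ')]");

Console.WriteLine(loose.Count);        // 2
Console.WriteLine(exact.Count);        // 1
Console.WriteLine(exact[0].InnerText); // 8
```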

I would use this to extract 0.8 mi:
//div[@class='ds-nearby-schools-list']/div[@class='ds-school-row']/div[@class='ds-nearby-schools-info-section']/ul[@class='ds-school-info-section']/li[@class='ds-school-info']/span[@class='ds-school-value ds-body-small' and preceding-sibling::span[@class='ds-school-key ds-body-small' and text()='Distance:']]/text()
Then this regex to split it into the number and the unit:
^([0-9.]+) (.*)$
At the end, you can convert the captured values and save the distance to an object.
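A minimal sketch of that last step, assuming the XPath has already returned the text "0.8 mi". The regex captures both the number and the unit; the value is parsed with the invariant culture so "0.8" is read the same way on machines whose locale uses ',' as the decimal separator. The tuple is just one possible target "object".

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Text assumed to come from the XPath above for the "Distance:" row.
string raw = "0.8 mi";

// Group 1 = the number, group 2 = the unit.
Match m = Regex.Match(raw.Trim(), @"^([0-9.]+) (.*)$");
if (!m.Success)
{
    throw new FormatException($"Unexpected distance format: '{raw}'");
}

double distance = double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
string unit = m.Groups[2].Value;
var parsed = (Value: distance, Unit: unit);

Console.WriteLine(parsed.Value.ToString(CultureInfo.InvariantCulture) + " " + parsed.Unit); // 0.8 mi
```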

Have you tried the following to get the 8? You can search for the span element by its class name and read its inner text.
Note: I used a text file to load the HTML from your question.
using System;
using System.IO;
using HtmlAgilityPack;

string htmlFile = File.ReadAllText(@"TempFile.html");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlFile);
HtmlNode htmlDoc = doc.DocumentNode;
// SelectSingleNode returns null if nothing matches
HtmlNode node = htmlDoc.SelectSingleNode("//span[@class='ds-hero-headline ds-schools-display-rating']");
Console.WriteLine(node.InnerText);
// output: 8
Alternate:
Another way is to specify the path that you want the value from, starting from the div element.
HtmlNode node2 = htmlDoc.SelectSingleNode("//div[@class='ds-gs-rating-8']//span[@class='ds-hero-headline ds-schools-display-rating']");
Console.WriteLine(node2.InnerText);
output
8

XPath select all child nodes of the first child or an element?

I'm trying to get a collection of li elements (child nodes) with Selenium using XPath, but it's returning null.
JavaScript pseudocode:
document.getElementsByClassName("jSC57 _6xe7A")[0].firstChild.childNodes
I want to get that with XPath, but it's returning null:
document.DocumentNode.SelectNodes("//*[@class='jSC57 _6xe7A']//div//li");
HTML sample:
<div class="isgrP">
<ul class="jSC57 _6xe7A">
<div class>
<li class="wo9IH">
<div class="li-container">
testing
</div>
</li>
</div>
</ul>
</div>

Based on your statement document.getElementsByClassName("jSC57 _6xe7A")[0].firstChild.childNodes, I am making an assumption about what you want to retrieve here. getElementsByClassName("jSC57 _6xe7A") returns a collection of the matching ul elements. Then your [0] gets the first ul, firstChild grabs its first div element, and childNodes refers to the li elements that appear under that div.
To get the li elements you are trying to retrieve from your tree, I might not use those class names -- they appear to be dynamic and might change each time you load the page.
I would use this XPath instead:
document.DocumentNode.SelectNodes("//ul/div//li");
This will get all of the li nodes that appear anywhere below a div that is a direct child of a ul element.
But this might be too general for your purposes. If you want to use the class name, you can modify it as follows:
document.DocumentNode.SelectNodes("//ul[@class='jSC57 _6xe7A']/div/li");
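For a closer translation of the JavaScript expression, the sketch below indexes the first matching ul and then its first element child. Note that in the DOM, firstChild and childNodes also include text nodes (such as whitespace between tags), while XPath's * selects only elements, so this corresponds more exactly to firstElementChild and children. It uses System.Xml as a stand-in and changes the sample's `<div class>` to `<div>`, because XML does not allow an attribute without a value.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<div class=\"isgrP\">" +
    "<ul class=\"jSC57 _6xe7A\">" +
    "<div><li class=\"wo9IH\"><div class=\"li-container\">testing</div></li></div>" +
    "</ul></div>");

// (...)[1] -> first matching ul, like [0] in JavaScript (XPath positions are 1-based)
// /*[1]    -> its first element child, like firstElementChild
// /*       -> all element children of that, like .children
XmlNodeList items = doc.SelectNodes("(//ul[@class='jSC57 _6xe7A'])[1]/*[1]/*");

foreach (XmlNode li in items)
{
    Console.WriteLine($"{li.Name}: {li.InnerText}"); // li: testing
}
```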

c# selenium finding element using xpath

I am trying to find an element which is a div inside a div...
Here is an example of the HTML:
<div class="col-md-4">
<div style="display: none;" id="multiplier-win" class="label label-success multiplier">2X</div>
<div style="display: block;" id="multiplier-lose" class="label label-danger multiplier">0X</div>
<div style="display: none;" id="multiplier-tie" class="label label-warning multiplier">1X</div>
</div>
I want to find the element with class="label label-success multiplier" and check whether its style is "display: none".
How do I write this in C#?
Please help me
thank you!

In your case, the elements have a unique ID. So instead of finding them by class name (which could lead to multiple or inaccurate results), you should use By.Id(...). It is also easier to write by hand than XPath.
Let's say your IWebDriver instance is called driver. The code looks like this:
IWebElement element = driver.FindElement(By.Id("multiplier-win"));
string style = element.GetAttribute("style");
...
I don't want to offend you, but you should probably search on Google before you post here. This is very basic code that you will find in many Selenium tutorials.
Edit: In case you are looking for multiple elements of a class:
ReadOnlyCollection<IWebElement> elements = driver.FindElements(By.ClassName("..."));
foreach (IWebElement el in elements)
{
...
}
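One way to fill in the step after reading the style attribute is a small helper that checks whether the inline style declares display: none. This is a hypothetical helper (IsDisplayNone is not part of Selenium); it tolerates spacing, case and a trailing semicolon. It only sees the inline style attribute, whereas the Displayed property mentioned in the next answer reflects whether the element is actually rendered.

```csharp
using System;

// Hypothetical helper: true if an inline style string such as the one
// returned by element.GetAttribute("style") declares "display: none".
static bool IsDisplayNone(string style)
{
    if (string.IsNullOrEmpty(style)) return false;

    foreach (string declaration in style.Split(';'))
    {
        // Split "property: value" on the first colon only.
        string[] parts = declaration.Split(':', 2);
        if (parts.Length != 2) continue;

        string property = parts[0].Trim();
        string value = parts[1].Trim();
        if (property.Equals("display", StringComparison.OrdinalIgnoreCase) &&
            value.Equals("none", StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }
    }
    return false;
}

Console.WriteLine(IsDisplayNone("display: none;"));  // True
Console.WriteLine(IsDisplayNone("display: block;")); // False
Console.WriteLine(IsDisplayNone("display:none"));    // True
```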

To find the element:
IWebElement element = driver.FindElement(By.XPath("//div[@class='label label-success multiplier']"));
To check if an element is displayed, use the Displayed property, which returns a bool (true if displayed, false if not). If you go with philn's element list code, you can put this line into his foreach statement and it will tell you which ones are displayed.
bool isDisplayed = el.Displayed;

How to find div Element without an id attribute with Selenium Webdriver

I'm using C# and Selenium Webdriver and I'm trying to find a div Element in my html code which looks like this:
<div class="x-grid-cell-inner" style="text-align: left;" unselectable="on">
phys_tag_desc
</div>
I can't find a method to search for the div element by its text with Selenium WebDriver. I already searched this site and checked the Selenium WebDriver documentation, but couldn't find anything.

If the text value is unique, the solution is simple. Try the XPath below:
//div[text()='phys_tag_desc']
If the text is not an exact match, try the following:
//div[contains(text(),'phys_tag_desc')]
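If the page's markup really has line breaks and indentation around the text, as in the question, those characters are part of the text node and the exact text() comparison will not match. The sketch below (System.Xml standing in for the browser) compares the two XPaths above with normalize-space(), which trims and collapses whitespace before comparing.

```csharp
using System;
using System.Xml;

// The question's div, including the whitespace around the text.
var doc = new XmlDocument();
doc.LoadXml(
    "<div class=\"x-grid-cell-inner\" style=\"text-align: left;\" unselectable=\"on\">\n" +
    "    phys_tag_desc\n" +
    "</div>");

// Exact comparison fails: the text node is "\n    phys_tag_desc\n".
var exact = doc.SelectSingleNode("//div[text()='phys_tag_desc']");

// normalize-space() strips the surrounding whitespace first.
var normalized = doc.SelectSingleNode("//div[normalize-space()='phys_tag_desc']");

// contains() only needs the substring to appear somewhere.
var partial = doc.SelectSingleNode("//div[contains(text(),'phys_tag_desc')]");

Console.WriteLine(exact == null);      // True
Console.WriteLine(normalized != null); // True
Console.WriteLine(partial != null);    // True
```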

I have two ways.
Way 2 is more complex but more effective.
Way 1:
You can loop over all divs and compare their attributes.
Example:
HtmlElement yourElement = null;
foreach (HtmlElement o in webbrowser.Document.GetElementsByTagName("div"))
{
    // The WebBrowser control exposes the class attribute as "className"
    if (o.GetAttribute("className") == "x-grid-cell-inner" && o.GetAttribute("style") == "text-align: left;")
    {
        yourElement = o;
        break;
    }
}
if (yourElement != null)
{
    DoSomethingWith(yourElement);
}
The other way is to follow the element's path:
You can find the closest element that has an ID.
Example:
<div id="element">
<div>content..</div>
<div>
<div class="x-grid-cell-inner" style="text-align: left;" unselectable="on">
phys_tag_desc
</div>
</div>
</div>
The closest element with an id in this example is
<div id="element">
Your element's parent is the 2nd child of the id="element" div (index 1, since Children is zero-based).
You can get that div and follow the path like this:
yourElement = webbrowser.Document.GetElementById("element").Children[1].Children[0];
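The same path can also be written as a single XPath from the element with the ID, which is then usable with Selenium's By.XPath as well. A minimal sketch with System.Xml standing in for the page; note that Children[1].Children[0] (zero-based) becomes div[2]/div[1] (XPath positions are one-based).

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<div id=\"element\">" +
    "<div>content..</div>" +
    "<div>" +
    "<div class=\"x-grid-cell-inner\" style=\"text-align: left;\" unselectable=\"on\">phys_tag_desc</div>" +
    "</div>" +
    "</div>");

// Start at the nearest element with an id, then walk down by position.
var target = (XmlElement)doc.SelectSingleNode("//div[@id='element']/div[2]/div[1]");

Console.WriteLine(target.GetAttribute("class")); // x-grid-cell-inner
```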

In other situations, you can combine conditions with the XPath Boolean operators, such as and.
Try the XPath below (contains() takes a plain substring, so no * wildcards are needed):
By.XPath("//div[contains(@class,'x-grid-cell-inner') and contains(@unselectable, 'on') and contains(text(),'phys_tag_desc')]")
Bye
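Inside contains(), an asterisk is a literal character, not a wildcard, so a pattern wrapped in * looks for actual asterisks in the attribute. This sketch (System.Xml as a stand-in for the browser) compares such a pattern with the plain one.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<div class=\"x-grid-cell-inner\" style=\"text-align: left;\" unselectable=\"on\">phys_tag_desc</div>");

// Looks for the literal text "*x-grid-cell-inner*", which is not there.
var withStars = doc.SelectSingleNode(
    "//div[contains(@class,'*x-grid-cell-inner*') and contains(@unselectable,'*on*')]");

// Plain substrings: each contains() matches and 'and' combines them.
var plain = doc.SelectSingleNode(
    "//div[contains(@class,'x-grid-cell-inner') and contains(@unselectable,'on')" +
    " and contains(text(),'phys_tag_desc')]");

Console.WriteLine(withStars == null); // True
Console.WriteLine(plain != null);     // True
```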

Get data from HTML child class

I’m attempting to create a tool, in C#, which gathers and analyses data from a web page/form. There are basically 2 different types of data. Data entered by a user and data created by the system (I don’t have access to).
The data created by the user is kept in fields and the form uses IDs - so GetElementByID is used.
The problem I’m running into is obtaining the data created by the system. It shows on the form, but isn’t associated with an ID. I may be reading/interpreting the HTML incorrectly, but it appears to be a child class (I don’t have much HTML experience). I’m attempting to get the “Date Submitted” data (near the bottom of the code). Sample of the HTML code:
<div class="bottomSpace">
<div class="importfromanotherorder">
<div class="level2Panel" >
<div class="left">
<span id="if error" class="error"></span>
</div>
<div class="right">
Enter Submission ID
<input name="Submission$ID" type="text" id="Submission_ID" class="textbox" />
<input type="submit" name="SumbitButton" value="Import" id="SubmitButton" />
</div>
</div>
</div>
</div>
<div class="bottomSpace">
<div class="detailsinfo">
<div class="level2Panel" >
<div class="left">
<h5>Product ID</h5>
1234567
<h5>Sub ID</h5>
Not available
<h5>Product Type</h5>
Type 1
</div>
<div class="right">
<h5>Order Number</h5>
0987654
<h5>Status</h5>
Ordered
<h5>Date Submitted</h5>
7 17 2012 5 45 09 AM
</div>
</div>
</div>
</div>
Using GetElementsByTagName (searching for “div”) and then using GetAttribute(“className”) (searching for “right”) generates some results, but as there are 2 “right” classes, it’s not working as intended.
I’ve tried searching by className = “detailsinfo”, which I can find, but I’m not sure how to go about getting down to the “right” class. I tried sibling and children, but they don't appear to work. Another possible problem is that the date data appears to be text belonging to the “right” div rather than to the “Date Submitted” element.
So basically, I'm curious as to how the best approach would be to get the data I'm looking for. Would I need to get all of the class “right” text and then try and extract the date string?
Apologies if there is too much info or not enough of the required info :) Thanks in advance!
EDIT: Added how GetElementsByTagName is called using C# - per Icarus's comment.
HtmlDocument doc = webBrowser1.Document;
HtmlElementCollection elemColl = doc.GetElementsByTagName("div");

This will do it if the 'right' instance you want is the 2nd. Two approaches are given:
The commented-out approach is zero-based, so it uses index 1.
The second approach is XPath, which is one-based, so it uses position 2.
private string ReadHTML(string html)
{
    // LoadXml only accepts well-formed XML with a single root element,
    // so the input must be valid XHTML (wrap fragments in one root element)
    System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
    doc.LoadXml(html);
    System.Xml.XmlElement element = doc.DocumentElement;
    //This commented-out approach works and might be preferred if you want to iterate
    //over a node set instead of choosing just one node
    //string key = "//div[@class='right']";
    //System.Xml.XmlNodeList setting = element.SelectNodes(key);
    //return setting[1].LastChild.InnerText.Trim();
    // This XPath approach will let you select exactly one node:
    string key = "((//div[@class='right'])[2])/child::text()[last()]";
    System.Xml.XmlNode setting = element.SelectSingleNode(key);
    // Trim the line breaks and indentation around the date text
    return setting.InnerText.Trim();
}
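If only the date is needed, the XPath can also anchor on the "detailsinfo" panel and the "Date Submitted" heading instead of counting "right" divs, which addresses the concern that the date is text after the heading rather than inside it. A minimal sketch, assuming a trimmed copy of the question's markup; the two top-level "bottomSpace" divs are wrapped in a root element because XmlDocument.LoadXml requires exactly one.

```csharp
using System;
using System.Xml;

// Trimmed copy of the question's markup, wrapped in a single root element.
string html =
    "<root>" +
    "<div class=\"bottomSpace\"><div class=\"importfromanotherorder\"><div class=\"level2Panel\">" +
    "<div class=\"right\">Enter Submission ID <input name=\"Submission$ID\" type=\"text\" id=\"Submission_ID\" class=\"textbox\" /></div>" +
    "</div></div></div>" +
    "<div class=\"bottomSpace\"><div class=\"detailsinfo\"><div class=\"level2Panel\">" +
    "<div class=\"right\">" +
    "<h5>Order Number</h5>\n0987654\n" +
    "<h5>Status</h5>\nOrdered\n" +
    "<h5>Date Submitted</h5>\n7 17 2012 5 45 09 AM\n" +
    "</div>" +
    "</div></div></div>" +
    "</root>";

var doc = new XmlDocument();
doc.LoadXml(html);

// Anchor on the "detailsinfo" panel so the other "right" div is never used,
// then take the first text node that follows the "Date Submitted" heading.
var dateText = doc.SelectSingleNode(
    "//div[@class='detailsinfo']//div[@class='right']" +
    "/h5[normalize-space()='Date Submitted']/following-sibling::text()[1]");

var dateSubmitted = dateText?.Value.Trim();
Console.WriteLine(dateSubmitted); // 7 17 2012 5 45 09 AM
```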
