Can anyone help me derive the XPath for the span element whose label is GP (in the second div)?
<div class="ohg-patient-banner-suppl-info-section ohg-patient-banner-suppl-info-custom pure-u-1-5">
<div class="ohg-patient-banner-suppl-info-component-container ohg-patient-banner-suppl-info-component-left-bordered ohg-patient-banner-suppl-info-custom" aria-label="Visit Info" role="group"><div class="ohg-patient-banner-suppl-info-section-summary" aria-hidden="false">
</div>
<div class="ohg-patient-banner-suppl-info-section-detail" aria-label="" aria-hidden="true">
<div class="ohg-patient-banner-suppl-info-custom-field">
<span class=" " aria-hidden="true"></span>
<span class=" ohp-metadata-label">Location</span>
<span class="ohg-patient-banner-suppl-info-custom-row-value-icon " aria-hidden="true"></span>
<span class=" ohg-patient-banner-suppl-info-value">Tauranga Hospital - Assmt Plan Unit TAU - </span>
</div>
</div>
</div>
</div>
<div class="ohg-patient-banner-suppl-info-section ohg-patient-banner-suppl-info-custom pure-u-1-5">
<div class="ohg-patient-banner-suppl-info-component-container ohg-patient-banner-suppl-info-component-left-bordered ohg-patient-banner-suppl-info-custom" aria-label="GP Info" role="group"><div class="ohg-patient-banner-suppl-info-section-summary" aria-hidden="false">
</div>
<div class="ohg-patient-banner-suppl-info-section-detail" aria-label="" aria-hidden="true">
<div class="ohg-patient-banner-suppl-info-custom-field">
<span class=" " aria-hidden="true"></span>
<span class=" ohp-metadata-label">GP</span>
<span class="ohg-patient-banner-suppl-info-custom-row-value-icon " aria-hidden="true"></span>
<span class=" ohg-patient-banner-suppl-info-value">-</span>
</div>
</div>
</div>
</div>
The XPath I wrote works for the first div and returns the value Location:
.//*[@class='ohg-patient-banner-suppl-info-section-detail']/div[@class='ohg-patient-banner-suppl-info-custom-field']/span[@class=' ohp-metadata-label']
XPath:
//*[@class='ohg-patient-banner-suppl-info-custom-field']//span[2]
Then you can use getText() to get GP as text.
Try this XPath-1.0 expression:
//*[@class='ohg-patient-banner-suppl-info-section-detail']/div[@class='ohg-patient-banner-suppl-info-custom-field' and span[@class=' ohg-patient-banner-suppl-info-value']='-']/span[@class=' ohp-metadata-label']
Its result is:
GP
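The same idea can be tried outside the browser. This is only a minimal sketch, assuming Python's standard-library xml.etree.ElementTree and a trimmed, well-formed copy of the markup from the question. ElementTree supports just a subset of XPath 1.0 (no "and"), so the two conditions are written as chained predicates, and the value check applies to any child span rather than only the one with the value class:

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the two banner sections from the question.
BANNER = """
<root>
  <div class="ohg-patient-banner-suppl-info-section-detail">
    <div class="ohg-patient-banner-suppl-info-custom-field">
      <span class=" ohp-metadata-label">Location</span>
      <span class=" ohg-patient-banner-suppl-info-value">Tauranga Hospital - Assmt Plan Unit TAU - </span>
    </div>
  </div>
  <div class="ohg-patient-banner-suppl-info-section-detail">
    <div class="ohg-patient-banner-suppl-info-custom-field">
      <span class=" ohp-metadata-label">GP</span>
      <span class=" ohg-patient-banner-suppl-info-value">-</span>
    </div>
  </div>
</root>
"""

root = ET.fromstring(BANNER)

# Chained predicates: the field div must have the right class
# and must have a child span whose full text is exactly "-".
path = (
    ".//div[@class='ohg-patient-banner-suppl-info-custom-field']"
    "[span='-']"
    "/span[@class=' ohp-metadata-label']"
)
labels = [span.text for span in root.findall(path)]
print(labels)
```

Keep in mind this only works while the GP value really is "-"; once a GP is recorded the predicate no longer matches.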
Programmatic solution:
Your XPath actually does return both elements.
Your Selenium library most likely gives you a list of matches, and you can select the second element of it.
Select the second element:
If it is always the second element, adding [2] to the XPath helps,
e.g. "(.//*[@class='ohg-patient-banner-suppl-info-section-detail'])[2]/div[@class='ohg-patient-banner-suppl-info-custom-field']/span[@class=' ohp-metadata-label']"
By a fixed text:
If some text in the page is fixed, e.g. Location, you can use it as a reference point and navigate with the ancestor and sibling axes. In the HTML shown, the two section-detail divs are not siblings of each other, but the outer section divs are, so climb to the outer div first:
".//span[.='Location']/ancestor::div[contains(@class,'pure-u-1-5')]/following-sibling::div[contains(@class,'pure-u-1-5')][1]//span[@class=' ohp-metadata-label']"
Since you don't have other unique identifiers and the class name is shared by two spans, the shortest way is to identify the span by its index:
XPath: (//span[@class=' ohp-metadata-label'])[2]
Now you can read the text with Selenium's getText() method.
Note that index 2 is what identifies the element here; if the HTML changes and similar spans are added before this one, you will need to change the index.
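To see why the index matters, here is a small sketch in Python, assuming the standard-library xml.etree.ElementTree and a cut-down copy of the question's markup (not the Selenium API). ElementTree has no parenthesised (…)[2] form, so the second match is taken from the returned list instead:

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the two custom-field blocks from the question.
PAGE = """
<root>
  <div class="ohg-patient-banner-suppl-info-custom-field">
    <span class=" ohp-metadata-label">Location</span>
  </div>
  <div class="ohg-patient-banner-suppl-info-custom-field">
    <span class=" ohp-metadata-label">GP</span>
  </div>
</root>
"""

root = ET.fromstring(PAGE)

# The class-based locator matches both label spans, in document order.
matches = root.findall(".//span[@class=' ohp-metadata-label']")
all_labels = [m.text for m in matches]

# XPath positions are 1-based, Python lists are 0-based:
# XPath [2] corresponds to list index 1.
second_label = matches[1].text
print(all_labels, second_label)
```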
Before I start: in the source code included in this post, I have replaced any sensitive data with a string of "x".
<div class="pod productPod icon-life clearFix nonwrap"
id="xxxxxxxxxx"
internalid="xxxxxxxxxx"
data-properties-type="xxxxxxxxxx"
data-properties-code="xxxxxxxxxx"
data-properties-loaded="False"
data-properties-url="xxxxxxxxxx"
data-properties-showmaintenanceerrormessage="False"
data-properties-product-type="xxxxxxxxxx">
<span class="podTitle">Your policies</span>
<div id="1" class="productPodInner" data-qa-productpod="">
<span class="notificationBubble fullProdLarge" aria-hidden="true"></span>
<h2><a data-feedback="" class="clearFix roundelLink" href="#productId0" data-roundel-id="0">
<span class="productIconWrapper">
<span class="productIcon">
</span>
</span>
<span class="productData">
<span class="productName" style="min-height: auto;">Protection</span>
<span class="notificationBubble fullProd"></span>
</span>
</a></h2>
<div class="productDetails podContent" id="#productId0" style="width: 1903px; left: -306.703px;">
<div class="productDetailsInner clearFix">
<h3>xxxxxxxxxx</h3>
<!-- TODO:: Display this in the 2nd column-->
<dl class="initialPolicyContainer detailsList" style="display: none;">
<dt>Policy Number</dt>
<dd class="initialPolicyNumber" data-qa-text="xxxxxxxxxx">xxxxxxxxxx</dd>
</dl>
<div class="clearFix variant3Container" style=""><div class="group-1-2 clearFix">
<div class="column">
<dl class="detailsList">
<dt>Policy number</dt>
<dd class="policyNumberVal">xxxxxxxxxx</dd>
<dt>Term</dt>
<dd>xxxxxxxxxx</dd>
</dl>
So my goal here is to be able to select the top parent using an Xpath locator and then click on that with selenium.
On the website there is a pod with multiple roundel sections each containing its own policy. Normally I can select a roundel using an Xpath query that finds the roundel by the policyType(text) but if there are 2 policies with the same policyType then this Xpath locator will only find the first one.
Each roundel has a unique policyNumberVal (line 29):
<dd class="policyNumberVal">xxxxxxxxxx</dd>
I can locate that using this XPath:
//dd[@class='policyNumberVal' and text() = '{policyNumber}']
What I would like to know is: once I have located the 'policyNumberVal', how do I then step up and select the parent?
Please let me know if you need any more information than this.
You can use the parent axis,
//dd[@class='policyNumberVal' and text() = '{policyNumber}']/parent::*
its abbreviation,
//dd[@class='policyNumberVal' and text() = '{policyNumber}']/..
or, elevate the predicate and select the parent directly:
//dl[dd[@class='policyNumberVal' and text() = '{policyNumber}']]
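A quick way to check that the parent step and the elevated predicate land on the same element is a sketch like the one below. It assumes Python 3.7+ with the standard-library xml.etree.ElementTree, a reduced copy of the pod markup, and made-up policy numbers (A-111 and B-222 are placeholders, not real data). ElementTree has no "and" or text(), so the conditions are chained predicates:

```python
import xml.etree.ElementTree as ET

# Reduced copy of the pod markup; the policy numbers are hypothetical.
POD = """
<div class="productDetailsInner">
  <dl class="detailsList">
    <dt>Policy number</dt>
    <dd class="policyNumberVal">A-111</dd>
  </dl>
  <dl class="detailsList">
    <dt>Policy number</dt>
    <dd class="policyNumberVal">B-222</dd>
  </dl>
</div>
"""

root = ET.fromstring(POD)
policy_number = "B-222"  # hypothetical value to look up

# Variant 1: find the dd, then step up to its parent with "..".
via_parent_step = root.findall(
    f".//dd[@class='policyNumberVal'][.='{policy_number}']/.."
)

# Variant 2: select the dl directly, filtered by the text of a child dd.
via_predicate = root.findall(f".//dl[dd='{policy_number}']")

print(via_parent_step[0].tag, via_parent_step[0] is via_predicate[0])
```

Variant 2 here does not check the class of the dd, so in a real page the first form is the closer match to the XPath in the answer.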
I have this HTML:
<div class="form-wrapper">
<div class="clearfix">
<div class="row">
<div class="time-wrapper col-xs-6">
<div class="row">
<div class="text-left col-md-6 cols-sm-12">
<input type="radio" id="flight-return-1" name="flight-return" data-default-meal="X">
<div class="">
<div class="date pad-left-large-md no-padding-left-xs white-space-nowrap">Za. 06 May. 2017</div>
</div>
</div>
<div class="flight date text-right-md text-left-xs col-md-6 cols-sm-12 pad-right-large">
<span>
bet </span>
<span class="time">
12:10 </span>
</div>
</div>
</div>
<div class="time-wrapper col-xs-6">
<div class="row">
<div class="flight date text-md-left text-sm-right no-padding-left col-md-7 cols-sm-12">
<span class="time">
14:25 </span>
<span>
zeb </span>
</div>
<div class="price-wrapper col-md-5 cols-sm-12">
<div class="price text-right white-space-nowrap">
<span class="currency symbol">€</span> <span class="integer-part">69</span><span class="decimal-part">,99</span> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
Please note that I have multiple <div class="row"> elements inside one .
I want to get all the data there.
I'm using this C# code:
var node_1 = Doc.DocumentNode.SelectNodes("//div[@class='form-wrapper']").First();
var ITEM = node_1.SelectNodes("//div[@class='clearfix']");
foreach (var Node in node_1.SelectNodes("//div[@class='clearfix']"))
{
Console.WriteLine(Node.SelectNodes("//span[@class='time']")[1].InnerText.Trim());
}
I'm only trying to get all the times (there are about 4 elements with class clearfix),
so I'm expecting times like:
14:25
18:25
17:50
13:20
but for some reason I only get:
14:25
14:25
14:25
14:25
It keeps repeating this value, and I'm stuck.
Thanks in advance.
The double forward slash in the XPATH of your Console.WriteLine statement ("//span[....") is running out of your current node context and returning the first instance in the whole document that matches your XPATH.
Try to make your second XPATH relative (best way is to debug the code and examine what was returned into the Node variable in the loop)
You could also just iterate the spans directly:
foreach (var spanNode in node_1.SelectNodes("//span[@class='time']"))
{
Console.WriteLine(spanNode.InnerText.Trim());
}
You are passing the index statically, and combined with the document-wide "//" it always picks the same node:
Node.SelectNodes("//span[@class='time']")[1].InnerText.Trim() // [1] is hard-coded here
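The difference between a document-wide and a context-relative query can be sketched as follows. This is not Html Agility Pack; it assumes Python's standard-library xml.etree.ElementTree and a shortened, well-formed copy of the form markup (the 16:00 time is a made-up placeholder). ElementTree only accepts relative paths on an element, so each lookup inside the loop starts with "." and stays inside the current block:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the form markup with two clearfix blocks.
FORM = """
<div class="form-wrapper">
  <div class="clearfix">
    <div class="flight"><span>bet</span><span class="time">12:10</span></div>
    <div class="flight"><span class="time">14:25</span><span>zeb</span></div>
  </div>
  <div class="clearfix">
    <div class="flight"><span>bet</span><span class="time">16:00</span></div>
    <div class="flight"><span class="time">18:25</span><span>zeb</span></div>
  </div>
</div>
"""

form = ET.fromstring(FORM)

second_times = []
for block in form.findall("./div[@class='clearfix']"):
    # ".//" searches only below the current block, not the whole document.
    times = block.findall(".//span[@class='time']")
    second_times.append(times[1].text.strip())

print(second_times)
```

In the C# code from the question, the equivalent change would be to start the inner expression with ".//" instead of "//".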
I need to get the next (sibling) element of the one with "Yes" as its text. I can use the text "Yes", the CSS class and part of the id, but the number (e.g. 106) unfortunately cannot be relied on. For the same reason I can't locate that sibling directly. Here is a part of the HTML code:
<a style="right: auto;" class="x-btn x-box-item x-toolbar-item" id="button-106">
<span id="button-106-btnWrap" role="presentation" class="x-btn-wrap" unselectable="on">
<span id="button-106-btnEl" class="x-btn-button" role="presentation">
<span id="button-106-btnInnerEl" class="x-btn-inner x-btn-inner-center">Yes
</span>
<span role="presentation" id="button-106-btnIconEl" class="x-btn-icon-el">
</span>
</span>
</span>
</a>
I came up with this query, but it doesn't seem to work:
By.XPath(".//*[text() = 'Yes' and contains(id(), '-btnInnerEl')/following-sibling::*]")
How can I alter this query so I can get the next element?
To select the span with id button-106-btnIconEl:
//span[contains(@id,'-btnInnerEl')][normalize-space(text())='Yes']/following-sibling::span
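For environments without the following-sibling axis, the same lookup can be emulated by hand. The sketch below assumes Python's standard-library xml.etree.ElementTree (which has neither contains() nor sibling axes) and a well-formed copy of the button markup; next_sibling_of_yes is a hypothetical helper name, not part of any library:

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the button markup from the question.
BUTTON = """
<a class="x-btn x-box-item x-toolbar-item" id="button-106">
  <span id="button-106-btnWrap" class="x-btn-wrap">
    <span id="button-106-btnEl" class="x-btn-button">
      <span id="button-106-btnInnerEl" class="x-btn-inner x-btn-inner-center">Yes
      </span>
      <span id="button-106-btnIconEl" class="x-btn-icon-el">
      </span>
    </span>
  </span>
</a>
"""


def next_sibling_of_yes(root):
    """Return the element right after the '-btnInnerEl' element whose text is 'Yes'."""
    for parent in root.iter():
        children = list(parent)
        # The last child has no following sibling, so it is skipped.
        for position, child in enumerate(children[:-1]):
            id_matches = "-btnInnerEl" in child.get("id", "")
            text_matches = (child.text or "").strip() == "Yes"
            if id_matches and text_matches:
                return children[position + 1]
    return None


root = ET.fromstring(BUTTON)
sibling = next_sibling_of_yes(root)
print(sibling.get("id"))
```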
I'm using HTMLAgilityPack to get text inside a list of certain nodes.
Basically I'm reading out a HTML page with the following HTML:
<div class="video-overview yt-grid-fluid">
<h3 class="video-title-container">
<span class="yt-badge-std">
WATCHED
</span>
<a href="/watch?v=S5FCdx7Dn0o&list=FLArRQZAMoAgECBIOn08gNeA&index=1" title="Bob Marley - Buffalo soldier" class="yt-uix-tile-link yt-uix-sessionlink" data-sessionlink="ei=c9tWUb3eJaOzhgG6kYHYAg&feature=plpp_video">
<span class="title video-title" dir="ltr">Bob Marley - Buffalo soldier</span>
</a>
</h3>
<p class="video-details">
<span class="video-owner">
by <span class="yt-user-name " dir="ltr">Pgroenberg</span>
</span>
<span class="video-view-count">
44,342,136 views
</span>
</p>
</div>
I want to get the text "Bob Marley - Buffalo soldier", which is inside <span class="title video-title" dir="ltr">.
I can't seem to find the right pattern:
string expression = @"//span[@class='title video-title' and @dir='ltr']/text()";
HtmlNodeCollection hnc = htmlDoc.DocumentNode.SelectNodes(expression);
hnc will be null because no nodes matched the expression. Why won't my expression work?
I'm trying to build a class that will read, group and sort an html document based on another web site.
I will show what I have so far. Here's a sample of how the web page is constructed (keep in mind that it only shows "how" it is built; I've rewritten the whole thing):
<tr>
<td id="ab100_ab100_ab100_Main_Sub_Sub_objComponent" class="compContainer">
<table class="objDetails" style="position: relative; margin: auto;">
<tr>
<div class="smallSetup" style="margin-top: 10px;">
<b class="ft"><b></b></b>
<div id="ab100_ab100_ab100_Main_Sub_Sub_firstProp" class="row">
<div class="label">
First Name:</div>
<div class="value">
Albert Trebla</div>
</div>
<div id="ab100_ab100_ab100_Main_Sub_Sub_secondProp" class="row">
<div class="label" style="line-height:25px;">
Second Year:</div>
<div class="value">
<img src="/Setup/Images.ashx?size=medium&name=5&type=symbol" alt="5" align="absbottom" /><img src="/Setup/Images.ashx?size=medium&name=W&type=symbol" alt="Second" align="absbottom" />
</div>
<div id="ab100_ab100_ab100_Main_Sub_Sub_thirdProp" class="row" style="height:15px; position:relative;">
<div class="label" style="font-size:.7em;">
Classy Stuff:</div>
<div class="value">
7<br /><br /></div>
</div>
<div id="ab100_ab100_ab100_Main_Sub_Sub_fourthProp" class="row">
<div class="label">
Weather:</div>
<div class="value">
Cloudy — Might Rain</div>
</div>
<div id="ab100_ab100_ab100_Main_Sub_Sub_fifthProp" class="row">
<div class="label">
Front Text:</div>
<div class="value">
<div class="frontTextBox">Opened</div><div class="frontTextBox">The shop is opened when the bridges are lowered.</div></div>
</div>
<div id="ab100_ab100_ab100_Main_Sub_Sub_sixthProp" class="row">
<div class="label">
Flavor:</div>
<div id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_FlavorText" class="value">
<div class="frontTextBox"><i>"This taste good!"</i></div></div>
</div>
And so on.
Now here's how I structured my code in my app:
HtmlWeb loader = new HtmlWeb();
HtmlDocument doc = loader.Load(stringUrl);
HtmlNode parentNode = doc.GetElementbyId("ab100_ab100_ab100_Main_Sub_Sub_objComponent");
HtmlNodeCollection allNodes = parentNode.SelectNodes(".//div[@class='row']");
And I have my collection of divs, but I'm unable to make the next step. The first thing to understand is that the layout of the html code up there will change, so sometimes the firstProp will not show and sometimes it's the sixth prop, and so on.
So I thought I would check whether the node's class attribute is "label":
foreach (HtmlNode htmlNode in allNodes)
{
if (htmlNode.Attributes["class"].Value == "label")
{
}
}
But I don't know how to check the value after that, since the next sibling is an empty div. I also don't know much about how HtmlAgilityPack works, so I wonder if there is an easier way to get this.
Can anyone advise me on how to proceed, or if what I'm doing is wrong and how to correct it?
* EDIT *
I have changed the line:
HtmlNodeCollection allNodes = parentNode.SelectNodes(".//div[@class='row']");
so that now my collection is narrowed only to the div I would get. But I still need to read when I get a div with class "label", read what value it is (ex: Front Text), and if that's Front Text, get the following div with class "value".
I suggest you learn a bit of XPATH which is supported by the Html Agility Pack, and allows for concise queries over the HTML DOM. For example, the following code:
HtmlDocument doc = new HtmlDocument();
doc.Load("test.htm");
HtmlNode node = doc.GetElementbyId("ab100_ab100_ab100_Main_Sub_Sub_objComponent");
foreach (HtmlNode row in node.SelectNodes(".//div[@class='row']"))
{
Console.Write(row.SelectSingleNode("div[@class='label']").InnerText.Trim());
Console.WriteLine(row.SelectSingleNode("div[@class='value']").InnerText.Trim());
}
Will output this:
First Name:Albert Trebla
Second Year:
Classy Stuff:7
Weather:Cloudy - Might Rain
Front Text:OpenedThe shop is opened when the bridges are lowered.
Flavor:"This taste good!"
If you need the HTML inside the value or label div, you can again issue XPath queries from there.
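Since the rows can appear in any order, it may help to collect them into a lookup keyed by label. The following is a rough sketch of that idea, assuming Python's standard-library xml.etree.ElementTree rather than Html Agility Pack, and a well-formed two-row excerpt of the sample page; inner_text and properties are illustrative names only:

```python
import xml.etree.ElementTree as ET

# Well-formed two-row excerpt of the sample page from the question.
ROWS = """
<td id="ab100_ab100_ab100_Main_Sub_Sub_objComponent" class="compContainer">
  <div id="ab100_ab100_ab100_Main_Sub_Sub_firstProp" class="row">
    <div class="label">
      First Name:</div>
    <div class="value">
      Albert Trebla</div>
  </div>
  <div id="ab100_ab100_ab100_Main_Sub_Sub_fifthProp" class="row">
    <div class="label">
      Front Text:</div>
    <div class="value">
      <div class="frontTextBox">Opened</div><div class="frontTextBox">The shop is opened when the bridges are lowered.</div></div>
  </div>
</td>
"""


def inner_text(element):
    """Concatenate all text below the element, roughly like InnerText.Trim()."""
    return "".join(element.itertext()).strip()


container = ET.fromstring(ROWS)

properties = {}
for row in container.findall(".//div[@class='row']"):
    # Paths without a leading "." or "/" look at direct children of the row.
    label = row.find("div[@class='label']")
    value = row.find("div[@class='value']")
    if label is not None and value is not None:
        properties[inner_text(label).rstrip(":")] = inner_text(value)

front_text = properties.get("Front Text")
print(front_text)
```

Looking a value up by its label then no longer depends on which props are present or in what order they appear.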