Before I start: in the source code included in this post, I have replaced any sensitive data with a string of "x".
<div class="pod productPod icon-life clearFix nonwrap"
id="xxxxxxxxxx"
internalid="xxxxxxxxxx"
data-properties-type="xxxxxxxxxx"
data-properties-code="xxxxxxxxxx"
data-properties-loaded="False"
data-properties-url="xxxxxxxxxx"
data-properties-showmaintenanceerrormessage="False"
data-properties-product-type="xxxxxxxxxx">
<span class="podTitle">Your policies</span>
<div id="1" class="productPodInner" data-qa-productpod="">
<span class="notificationBubble fullProdLarge" aria-hidden="true"></span>
<h2><a data-feedback="" class="clearFix roundelLink" href="#productId0" data-roundel-id="0">
<span class="productIconWrapper">
<span class="productIcon">
</span>
</span>
<span class="productData">
<span class="productName" style="min-height: auto;">Protection</span>
<span class="notificationBubble fullProd"></span>
</span>
</a></h2>
<div class="productDetails podContent" id="#productId0" style="width: 1903px; left: -306.703px;">
<div class="productDetailsInner clearFix">
<h3>xxxxxxxxxx</h3>
<!-- TODO:: Display this in the 2nd column-->
<dl class="initialPolicyContainer detailsList" style="display: none;">
<dt>Policy Number</dt>
<dd class="initialPolicyNumber" data-qa-text="xxxxxxxxxx">xxxxxxxxxx</dd>
</dl>
<div class="clearFix variant3Container" style=""><div class="group-1-2 clearFix">
<div class="column">
<dl class="detailsList">
<dt>Policy number</dt>
<dd class="policyNumberVal">xxxxxxxxxx</dd>
<dt>Term</dt>
<dd>xxxxxxxxxx</dd>
</dl>
My goal is to select the top-level parent using an XPath locator and then click on it with Selenium.
On the website there is a pod with multiple roundel sections, each containing its own policy. Normally I can select a roundel using an XPath query that finds it by its policyType (text), but if there are two policies with the same policyType, that locator only finds the first one.
Each roundel has a unique policyNumberVal:
<dd class="policyNumberVal">xxxxxxxxxx</dd>
I can locate that using this XPath:
//dd[@class='policyNumberVal' and text() = '{policyNumber}']
What I would like to know is: once I have located the 'policyNumberVal', how do I then step up and select the parent?
Please let me know if you need any more information than this.
You can use the parent axis,
//dd[@class='policyNumberVal' and text() = '{policyNumber}']/parent::*
its abbreviation,
//dd[@class='policyNumberVal' and text() = '{policyNumber}']/..
or elevate the predicate and select the parent directly:
//dl[dd[@class='policyNumberVal' and text() = '{policyNumber}']]
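A minimal sketch of these parent-selection forms, written in Python with the standard-library xml.etree.ElementTree rather than Selenium. ElementTree implements only a subset of XPath: it has no parent:: axis, no "and" and no text(), so the sketch chains two predicates and uses [.='...'] for the text test. The markup and the policy numbers A111 and B222 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down markup: two policies of the same type.
HTML = """
<div class="pod">
  <div class="productPodInner" id="1">
    <dl class="detailsList">
      <dt>Policy number</dt>
      <dd class="policyNumberVal">A111</dd>
    </dl>
  </div>
  <div class="productPodInner" id="2">
    <dl class="detailsList">
      <dt>Policy number</dt>
      <dd class="policyNumberVal">B222</dd>
    </dl>
  </div>
</div>
"""

root = ET.fromstring(HTML)
policy_number = "B222"
dd_path = f".//dd[@class='policyNumberVal'][.='{policy_number}']"

# Locate the <dd>, then step up one level with '..'.
via_parent = root.find(dd_path + "/..")

# Or put the test in a predicate on the parent and select it directly.
via_predicate = root.find(f".//dl[dd='{policy_number}']")

# Repeating '..' climbs further, here to the enclosing productPodInner div.
pod = root.find(dd_path + "/../..")

print(via_parent is via_predicate, pod.get("id"))  # True 2
```

In Selenium the full XPath 1.0 forms above can be used as they are; repeating /.. in the same way is one option for reaching an ancestor further up, such as the clickable roundel.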
Related
Can anyone help me derive an XPath for the span label in the second div (the one whose text is GP)?
<div class="ohg-patient-banner-suppl-info-section ohg-patient-banner-suppl-info-custom pure-u-1-5">
<div class="ohg-patient-banner-suppl-info-component-container ohg-patient-banner-suppl-info-component-left-bordered ohg-patient-banner-suppl-info-custom" aria-label="Visit Info" role="group"><div class="ohg-patient-banner-suppl-info-section-summary" aria-hidden="false">
</div>
<div class="ohg-patient-banner-suppl-info-section-detail" aria-label="" aria-hidden="true">
<div class="ohg-patient-banner-suppl-info-custom-field">
<span class=" " aria-hidden="true"></span>
<span class=" ohp-metadata-label">Location</span>
<span class="ohg-patient-banner-suppl-info-custom-row-value-icon " aria-hidden="true"></span>
<span class=" ohg-patient-banner-suppl-info-value">Tauranga Hospital - Assmt Plan Unit TAU - </span>
</div>
</div>
</div>
</div>
<div class="ohg-patient-banner-suppl-info-section ohg-patient-banner-suppl-info-custom pure-u-1-5">
<div class="ohg-patient-banner-suppl-info-component-container ohg-patient-banner-suppl-info-component-left-bordered ohg-patient-banner-suppl-info-custom" aria-label="GP Info" role="group"><div class="ohg-patient-banner-suppl-info-section-summary" aria-hidden="false">
</div>
<div class="ohg-patient-banner-suppl-info-section-detail" aria-label="" aria-hidden="true">
<div class="ohg-patient-banner-suppl-info-custom-field">
<span class=" " aria-hidden="true"></span>
<span class=" ohp-metadata-label">GP</span>
<span class="ohg-patient-banner-suppl-info-custom-row-value-icon " aria-hidden="true"></span>
<span class=" ohg-patient-banner-suppl-info-value">-</span>
</div>
</div>
</div>
</div>
The XPath I wrote works for the first div and returns the value Location:
.//*[@class='ohg-patient-banner-suppl-info-section-detail']/div[@class='ohg-patient-banner-suppl-info-custom-field']/span[@class=' ohp-metadata-label']
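XPath:
//*[@class='ohg-patient-banner-suppl-info-custom-field']//span[2]
This matches the second span in every such field, so it returns both labels (Location and GP); take the second match, then use getText() to get GP as text.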
Try this XPath-1.0 expression:
//*[@class='ohg-patient-banner-suppl-info-section-detail']/div[@class='ohg-patient-banner-suppl-info-custom-field' and span[@class=' ohg-patient-banner-suppl-info-value']='-']/span[@class=' ohp-metadata-label']
Its result is:
GP
Programmatic solution:
Your XPath actually does return both elements. In your Selenium library you most likely receive an array (or list) and can select the second element of it.
Select the second element:
If it is always the second element, adding [2] to the XPath helps,
e.g. "(.//*[@class='ohg-patient-banner-suppl-info-section-detail'])[2]/div[@class='ohg-patient-banner-suppl-info-custom-field']/span[@class=' ohp-metadata-label']"
By a fixed text:
If some text on the page is fixed, e.g. Location, you can use it as a reference and navigate from it with the ancestor and following-sibling axes. In the HTML shown, the siblings are the two outer sections (the divs carrying pure-u-1-5), so step up to that level before moving across:
"//span[.='Location']/ancestor::div[contains(@class,'pure-u-1-5')]/following-sibling::div[1]//span[@class=' ohp-metadata-label']"
Since you don't have other unique identifiers and the class name is shared by two spans, the shortest way is to identify that span by its index:
xpath: (//span[@class=' ohp-metadata-label'])[2]
Now you can read the text using Selenium's getText() method.
Note that we used index 2 to identify your locator; if the HTML changes and similar spans are added before this one, you will need to change the index.
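To make the difference between the indexed forms concrete, here is a sketch in Python with the standard-library xml.etree.ElementTree (not Selenium). ElementTree has no parenthesised (...)[2] form, so the document-wide index is taken on the returned list instead, as the programmatic answer suggests. The markup is a trimmed copy of the question's, with class names shortened to section, field, label and value for readability; those short names are an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup with shortened (hypothetical) class names.
HTML = """
<div>
  <div class="section">
    <div class="field">
      <span class="icon"></span>
      <span class="label">Location</span>
      <span class="value">Tauranga Hospital</span>
    </div>
  </div>
  <div class="section">
    <div class="field">
      <span class="icon"></span>
      <span class="label">GP</span>
      <span class="value">-</span>
    </div>
  </div>
</div>
"""
root = ET.fromstring(HTML)

# span[2] is evaluated per parent: the second <span> inside *each* field.
per_parent = [s.text for s in root.findall(".//div[@class='field']/span[2]")]

# Indexing the whole result list picks the second match in the document.
labels = root.findall(".//span[@class='label']")
second_label = labels[1].text

# Anchoring on the sibling value instead of a position.
by_value = root.find(".//div[@class='field'][span='-']/span[@class='label']").text

print(per_parent, second_label, by_value)  # ['Location', 'GP'] GP GP
```

The position-free variant is the least sensitive to new sections being added, at the cost of depending on the value text staying "-".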
I have this HTML:
<div class="form-wrapper">
<div class="clearfix">
<div class="row">
<div class="time-wrapper col-xs-6">
<div class="row">
<div class="text-left col-md-6 cols-sm-12">
<input type="radio" id="flight-return-1" name="flight-return" data-default-meal="X">
<div class="">
<div class="date pad-left-large-md no-padding-left-xs white-space-nowrap">Za. 06 May. 2017</div>
</div>
</div>
<div class="flight date text-right-md text-left-xs col-md-6 cols-sm-12 pad-right-large">
<span>
bet </span>
<span class="time">
12:10 </span>
</div>
</div>
</div>
<div class="time-wrapper col-xs-6">
<div class="row">
<div class="flight date text-md-left text-sm-right no-padding-left col-md-7 cols-sm-12">
<span class="time">
14:25 </span>
<span>
zeb </span>
</div>
<div class="price-wrapper col-md-5 cols-sm-12">
<div class="price text-right white-space-nowrap">
<span class="currency symbol">€</span> <span class="integer-part">69</span><span class="decimal-part">,99</span> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
Please note that I have multiple <div class="row"> elements inside one parent element.
I want to get all the data there.
I'm using this C# code:
var node_1 = Doc.DocumentNode.SelectNodes("//div[@class='form-wrapper']").First();
var ITEM = node_1.SelectNodes("//div[@class='clearfix']");
foreach (var Node in node_1.SelectNodes("//div[@class='clearfix']"))
{
Console.WriteLine(Node.SelectNodes("//span[@class='time']")[1].InnerText.Trim());
}
I'm only trying to get all the times (there are about 4 elements with class clearfix),
so I'm expecting times like:
14:25
18:25
17:50
13:20
but for some reason I only get:
14:25
14:25
14:25
14:25
It keeps repeating this value and I'm stuck.
Thanks in advance.
The double forward slash at the start of the XPath in your Console.WriteLine statement ("//span[....") leaves your current node context: it is evaluated against the whole document, so every iteration indexes into the same document-wide list of matches and prints the same value.
Try making your second XPath relative, for example by starting it with .// (the best way to check is to debug the code and examine what was returned into the Node variable in the loop).
You could also just iterate the spans directly:
foreach (var spanNode in node_1.SelectNodes("//span[@class='time']"))
{
Console.WriteLine(spanNode.InnerText.Trim());
}
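A sketch of the relative lookup, in Python with xml.etree.ElementTree rather than HtmlAgilityPack. ElementTree rejects absolute paths on an element outright, so only the relative .// form is shown. The markup is heavily trimmed and the times are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed markup: two 'clearfix' blocks with two times each.
HTML = """
<div class="form-wrapper">
  <div class="clearfix">
    <span class="time"> 12:10 </span>
    <span class="time"> 14:25 </span>
  </div>
  <div class="clearfix">
    <span class="time"> 16:05 </span>
    <span class="time"> 18:25 </span>
  </div>
</div>
"""
root = ET.fromstring(HTML)

second_times = []
for block in root.findall(".//div[@class='clearfix']"):
    # The leading '.' keeps the search inside the current block.
    times = block.findall(".//span[@class='time']")
    second_times.append(times[1].text.strip())

print(second_times)  # ['14:25', '18:25']
```

Because each lookup starts at the current block, the fixed index now refers to the second time within that block and not to the second time in the whole document.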
You are passing the index statically; this will be the issue:
Node.SelectNodes("//span[@class='time']")[1].InnerText.Trim() // here [1] is hard-coded
I need to get the next (sibling) element of the one with "Yes" as its text. I can use the text "Yes", the CSS class and part of the id, but the number (e.g. 106) is unfortunately excluded, so I can't address that sibling directly. Here is part of the HTML code:
<a style="right: auto;" class="x-btn x-box-item x-toolbar-item" id="button-106">
<span id="button-106-btnWrap" role="presentation" class="x-btn-wrap" unselectable="on">
<span id="button-106-btnEl" class="x-btn-button" role="presentation">
<span id="button-106-btnInnerEl" class="x-btn-inner x-btn-inner-center">Yes
</span>
<span role="presentation" id="button-106-btnIconEl" class="x-btn-icon-el">
</span>
</span>
</span>
</a>
I came up with this query, but it doesn't seem to work:
By.XPath(".//*[text() = 'Yes' and contains(id(), '-btnInnerEl')/following-sibling::*]")
How can I alter this query so I can get the next element?
To select the span with id button-106-btnIconEl:
//span[contains(@id,'-btnInnerEl')][normalize-space(text())='Yes']/following-sibling::span
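For readers who want to see what that expression does step by step, here is a sketch in plain Python. The standard-library xml.etree.ElementTree supports neither contains() nor the following-sibling axis, so the same logic is spelled out by hand: a substring test on the id, a whitespace-trimmed text comparison, and a walk over the later children of the same parent. The function name next_span_after_yes is hypothetical, and the markup is the question's with a few attributes dropped.

```python
import xml.etree.ElementTree as ET

# The question's markup, with presentation attributes removed.
HTML = """
<a class="x-btn" id="button-106">
  <span id="button-106-btnWrap" class="x-btn-wrap">
    <span id="button-106-btnEl" class="x-btn-button">
      <span id="button-106-btnInnerEl" class="x-btn-inner">Yes
      </span>
      <span id="button-106-btnIconEl" class="x-btn-icon-el">
      </span>
    </span>
  </span>
</a>
"""

def next_span_after_yes(root):
    """Return the first <span> that follows the '-btnInnerEl' span saying 'Yes'."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if (child.tag == "span"
                    and "-btnInnerEl" in child.get("id", "")    # contains(@id, ...)
                    and (child.text or "").strip() == "Yes"):   # normalize-space(...)
                # following-sibling::span, first match only
                for sibling in children[i + 1:]:
                    if sibling.tag == "span":
                        return sibling
    return None

sibling = next_span_after_yes(ET.fromstring(HTML))
print(sibling.get("id"))  # button-106-btnIconEl
```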
I'm using HtmlAgilityPack to get the text inside a list of certain nodes.
Basically I'm reading an HTML page with the following HTML:
<div class="video-overview yt-grid-fluid">
<h3 class="video-title-container">
<span class="yt-badge-std">
WATCHED
</span>
<a href="/watch?v=S5FCdx7Dn0o&list=FLArRQZAMoAgECBIOn08gNeA&index=1" title="Bob Marley - Buffalo soldier" class="yt-uix-tile-link yt-uix-sessionlink" data-sessionlink="ei=c9tWUb3eJaOzhgG6kYHYAg&feature=plpp_video">
<span class="title video-title" dir="ltr">Bob Marley - Buffalo soldier</span>
</a>
</h3>
<p class="video-details">
<span class="video-owner">
by <span class="yt-user-name " dir="ltr">Pgroenberg</span>
</span>
<span class="video-view-count">
44,342,136 views
</span>
</p>
</div>
I want to get the text "Bob Marley - Buffalo soldier", which is inside <span class="title video-title" dir="ltr">.
I can't seem to find the right pattern:
string expression = @"//span[@class='title video-title' and @dir='ltr']/text()";
HtmlNodeCollection hnc = htmlDoc.DocumentNode.SelectNodes(expression);
hnc will be null because no nodes match the expression. Why won't my expression work?
I am trying to scrape specific HTML tags, including their data, from a Google products page. I want to get all the <li> tags within this ordered list and put them in a list.
Here is the code:
<td valign="top">
<div id="center_col">
<div id="res">
<div id="ires">
<ol>
<li class="g">
<div class="pslires">
<div class="psliimg">
<a href=
"https://www.google.com">
</a>
</div>
<div class="psliprice">
<div>
<b>$59.99</b> used
</div><cite>google auctions</cite>
</div>
<div class="pslimain">
<h3 class="r"><a href=
"https://www.google.com">
google</a></h3>
<div>
dummy data </div>
</div>
</div>
</li>
<li class="g">
<div class="pslires">
<div class="psliimg">
<a href=
"https://www.google.com">
</a>
</div>
<div class="psliprice">
<div>
<b>$59.99</b> used
</div><cite>google auctions</cite>
</div>
<div class="pslimain">
<h3 class="r"><a href=
"https://www.google.com">
google</a></h3>
<div>
dummy data </div>
</div>
</div>
</li>
<li class="g">
<div class="pslires">
<div class="psliimg">
<a href=
"https://www.google.com">
</a>
</div>
<div class="psliprice">
<div>
<b>$59.99</b> used
</div><cite>google auctions</cite>
</div>
<div class="pslimain">
<h3 class="r"><a href=
"https://www.google.com">
google</a></h3>
<div>
dummy data </div>
</div>
</div>
</li>
<li class="g">
<div class="pslires">
<div class="psliimg">
<a href=
"https://www.google.com">
</a>
</div>
<div class="psliprice">
<div>
<b>$59.99</b> used
</div><cite>google auctions</cite>
</div>
<div class="pslimain">
<h3 class="r"><a href=
"https://www.google.com">
google</a></h3>
<div>
dummy data </div>
</div>
</div>
</li>
</ol>
</div>
</div>
</div>
<div id="foot">
<p class="flc" id="bfl" style="margin:19px 0 0;text-align:center"><a href=
"/support/websearch/bin/answer.py?answer=134479&hl=en">Search Help</a>
<a href=
"/quality_form?q=Pioneer+Automotive+PF-555-2000&hl=en&tbm=shop">Give us
feedback</a></p>
<div class="flc" id="fll" style="margin:19px auto 19px auto;text-align:center">
Google Home <a href=
"/intl/en/ads">Advertising Programs</a> <a href="/services">Business
Solutions</a> Privacy & Terms <a href=
"/intl/en/about.html">About Google</a>
</div>
</div>
</td>
I want to get all the <li class="g"> tags and the data in each of them. Is that possible?
Instead of a regex, something like an XML parser may be more useful in your situation. Load the page into an XML document and then use something like SelectNodes to pull out the data you are looking for.
http://msdn.microsoft.com/en-us/library/4bektfx9.aspx
I wouldn't use regex for this particular problem.
Instead I would attack it thus:
1) Save the page as an HTML string.
2) Use the aforementioned HtmlAgilityPack or HTML Tidy (my preference) to convert it to XML.
3) Use XDocument to navigate the DOM by tag and save the data.
Trying to create a regex to extract data from a possibly fluid HTML page will break your heart.
Instead of using regex you can use HtmlAgilityPack to parse the HTML.
var doc = new HtmlDocument();
doc.LoadHtml(html);
var listItems = doc.DocumentNode.SelectNodes("//li");
The code above will give you all <li> items in the document. To add them to a list you'll just have to iterate the collection and add each item to the list.
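The iterate-and-collect step could look roughly like this. It is a sketch in Python with the standard-library xml.etree.ElementTree, not HtmlAgilityPack, and it assumes the markup has already been made well-formed (real pages rarely are, which is why the answers recommend an HTML-aware parser or a tidy step first). The sample is a trimmed version of the question's list, and the second item's values are invented so the two entries can be told apart.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed sample; the second item's data is made up.
HTML = """
<ol>
  <li class="g">
    <div class="psliprice"><div><b>$59.99</b> used</div><cite>google auctions</cite></div>
    <div class="pslimain"><h3 class="r"><a href="https://www.google.com">google</a></h3></div>
  </li>
  <li class="g">
    <div class="psliprice"><div><b>$42.00</b> new</div><cite>example shop</cite></div>
    <div class="pslimain"><h3 class="r"><a href="https://www.google.com">other</a></h3></div>
  </li>
</ol>
"""
root = ET.fromstring(HTML)

items = []
for li in root.findall(".//li[@class='g']"):
    # Each lookup is relative to the current <li>, so fields never mix.
    items.append({
        "price": li.findtext(".//b"),
        "seller": li.findtext(".//cite"),
        "title": li.findtext(".//h3/a"),
    })

for item in items:
    print(item)
```

Keeping one dictionary per <li> is just one way to hold "the data in each of them"; storing the elements themselves works equally well if the fields are not known up front.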