Given the following (generic) dynamic HTML structure:
<ol id="myOrderedList">
<li id="someGuidICantPredict">
<span data-serial="someData1">someData1</span>
<span data-manufacturer="someDataB1">someDataB1</span>
</li>
//(repeated many times with different data)
</ol>
How do I find the following:
Find the <li> where the spans match by CssSelector for both data-serial and data-manufacturer?
I know how to do this for one or the other span tag thusly:
By.CssSelector($"#olCurrentTanks li span[data-serial='{serial1}']")
or
By.CssSelector($"#olCurrentTanks li span[data-manufacturer='{manufacturer1}']")
But I don't know how to find the parent <li> element where both spans match. That is, I need to get the IWebElement listItem whose two spans' data attributes both match the corresponding values, which I can predict.
Edit: Difficulty: Okay to use x-path to get the li parent but not to find the spans.
With XPath you can select the li element that has specific span children:
//li[./span[@data-serial="someData1"] and ./span[@data-manufacturer="someDataB1"]]
The selector below returns all matching li elements as a list with FindElements, and only the first match with FindElement:
By.XPath("//li[./span[@data-serial='someData1'] and ./span[@data-manufacturer='someDataB1']]")
Code examples:
IList<IWebElement> allMyLi = driver.FindElements(By.XPath($"//li[./span[@data-serial='{serial1}'] and ./span[@data-manufacturer='{manufacturer1}']]"));
foreach (var myLi in allMyLi)
{
    // Look up each span relative to the matched <li>.
    IWebElement serial = myLi.FindElement(By.CssSelector($"span[data-serial='{serial1}']"));
    IWebElement manufacturer = myLi.FindElement(By.CssSelector($"span[data-manufacturer='{manufacturer1}']"));
    Console.WriteLine("serial, manufacturer: {0}, {1}", serial.Text, manufacturer.Text);
}
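The two-predicate XPath above can be tried without a browser. The following is a minimal sketch, not the Selenium code itself: it assumes the markup is well-formed XML and uses the built-in System.Xml.XPath classes in place of a WebDriver. The sample data and the names TwoSpanMatch and FindIds are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class TwoSpanMatch
{
    // Hypothetical sample markup mirroring the structure in the question.
    const string Html = @"<ol id='myOrderedList'>
  <li id='a'><span data-serial='S1'>S1</span><span data-manufacturer='M1'>M1</span></li>
  <li id='b'><span data-serial='S1'>S1</span><span data-manufacturer='M2'>M2</span></li>
  <li id='c'><span data-serial='S2'>S2</span><span data-manufacturer='M1'>M1</span></li>
</ol>";

    // Returns the ids of every <li> that has BOTH a span with the given
    // data-serial and a span with the given data-manufacturer.
    public static string[] FindIds(string serial, string manufacturer)
    {
        XDocument doc = XDocument.Parse(Html);
        string xpath = $"//li[./span[@data-serial='{serial}'] and ./span[@data-manufacturer='{manufacturer}']]";
        return doc.XPathSelectElements(xpath)
                  .Select(li => (string)li.Attribute("id"))
                  .ToArray();
    }

    public static void Main()
    {
        // Only the first <li> satisfies both predicates.
        Console.WriteLine(string.Join(",", FindIds("S1", "M1")));
    }
}
```

Items b and c each match only one of the two predicates, so the `and` excludes them. The same expression string can be passed to By.XPath unchanged.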
Related
I have this html code:
<div class="searchResult webResult">
<div class="resultTitlePane">
Google
</div>
<div class="resultDisplayUrlPane">
www.google.com
</div>
<div class="resultDescription">
Search
</div>
</div>
I want to get the inner text of each inner div into a different variable.
I know that to access a div by its class I should write:
var titles = hd.DocumentNode.SelectNodes("//div[@class='searchResult webResult']");
foreach (HtmlNode node in titles)
{?}
What code should I write to get the inner text of each div into a different variable? Thanks.
I would extend the current XPath expression you have to match the inner div elements:
//div[@class='searchResult webResult']/div[contains(@class, 'result')]
Then, to get the text, use the .InnerText property:
C# - Get the text inside tags using HTML Agility Pack
C#: HtmlAgilityPack extract inner text
Since you don't know how many nodes will be returned, I suggest using a list:
List<string> titlesStringList = new List<string>();
foreach (HtmlNode node in titles)
{
    titlesStringList.Add(node.InnerText);
}
Please see the HTML code below
<div class="dgrid-content ui-widget-content" tabindex="0">
<div id="uniqName-36397" class=" dgrid default Item001">
<div id="uniqName-36780" class=" dgrid default Item003">
Use XPath. You can try:
"//div[starts-with(@id, 'uniqName-')]"
This will fetch all the elements whose id attribute value begins with uniqName-.
You can iterate over each element by using the following code:
IList<IWebElement> elements = driver.FindElements(By.XPath("//div[starts-with(@id, 'uniqName-')]"));
foreach (IWebElement element in elements)
{
    // Do something
}
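To see what starts-with selects, here is a small sketch that evaluates the same expression with the built-in System.Xml.XPath classes instead of Selenium. It assumes well-formed markup. The sample document (including the extra other-1 div) and the names StartsWithDemo and MatchingIds are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class StartsWithDemo
{
    // Hypothetical sample: two ids share the prefix, one does not,
    // and the outer div has no id at all.
    const string Html = @"<div class='dgrid-content'>
  <div id='uniqName-36397' class='Item001'/>
  <div id='other-1' class='Item002'/>
  <div id='uniqName-36780' class='Item003'/>
</div>";

    // Returns the ids of all <div> elements whose id begins with the prefix.
    public static string[] MatchingIds(string prefix)
    {
        XDocument doc = XDocument.Parse(Html);
        return doc.XPathSelectElements($"//div[starts-with(@id, '{prefix}')]")
                  .Select(div => (string)div.Attribute("id"))
                  .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", MatchingIds("uniqName-")));
    }
}
```

Elements without an id attribute are skipped, because starts-with treats a missing attribute as an empty string.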
I am working on a project that should read HTML, find all nodes that match a value, and then find elements and attributes of the located nodes.
I am having difficulty figuring out how to get the href attributes and elements though.
I am using HTMLAgilityPack.
I have numerous nodes of
class="middle"
throughout the html. I need to get all of them, and from them, get the href element and attributes. Below is a sample of the html:
<div class="top">
<div class="left">
<a href="item123">
<img src="url.png" border="0" />
</a>
</div>
</div>
<div class="middle">
<div class="title">Captains Hat</div>
<div class="day">monday</div>
<div class="city">Tuscon, AZ | 100 Days | <script typs="text/javascript">document.write(ts_to_age_min(1445620427));</script></div>
</div>
I have been able to get the other attributes I need, but not for 'href'.
Here is the code I have:
List<string> listResults = new List<string>();
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(url);
//get each listing
foreach (HtmlNode node in doc.DocumentNode.Descendants("div").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("middle")))
{
    string day = node.SelectSingleNode(".//*[contains(@class,'day')]").InnerHtml;
    string city = node.SelectSingleNode(".//*[contains(@class,'city')]").InnerHtml;
    string item = node.SelectSingleNode("//a").Attributes["href"].Value;
    listResults.Add(day + Environment.NewLine
        + city + Environment.NewLine
        + item + Environment.NewLine + Environment.NewLine);
}
However, the code above gives me the first href value of the whole HTML page, and gives it for every node (visible by outputting the list to a message box). I thought that, inside my foreach loop, SelectSingleNode would return the first href attribute for that specific node. Why am I getting the first href attribute of the whole loaded page instead?
I've been going through lots of threads on here about getting href values with HtmlAgilityPack, but I haven't been able to get this to work.
How can I get the href attribute and elements for each node I'm selecting based off the class attribute (class="middle")?
Try replacing
string item = node.SelectSingleNode("//a").Attributes["href"].Value;
with
string item = node.SelectSingleNode(".//a").Attributes["href"].Value;
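Other than that, the code above works for me. A path that starts with // is evaluated from the document root no matter which node you call it on, so it always finds the first a element on the page; the leading . anchors the search to the current node.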
Alternatively:
string item = node.SelectSingleNode(".//*[contains(@class,'title')]")
    .Descendants("a").FirstOrDefault().Attributes["href"].Value;
I am having a lot of trouble with this XPath selection that I use in HtmlAgilityPack.
I want to select all li elements (if they exist) nested in another li which has an a tag with id="menuItem2".
This is the HTML sample:
<div id="menu">
<ul>
<li><a id="menuItem1"></a></li>
<li><a id="menuItem2"></a>
<ul>
<li><a id="menuSubItem1"></a></li>
<li><a id="menuSubItem2"></a></li>
</ul>
</li>
<li><a id="menuItem3"></a></li>
</ul>
</div>
This is the XPath that I have been using. When I drop the /ul/li part, it returns the a tag that I wanted, but I need its descendants. This XPath always returns null.
string xpathExp = "//a[@id='" + parentIdHtml + "']/ul/li";
HtmlNodeCollection liNodes = htmlDoc.DocumentNode.SelectNodes(xpathExp);
The following XPath should work.
string xpathExp = "//li/a[@id='" + parentIdHtml + "']/following-sibling::ul/li";
Try this for your xpath:
string xpathExp = "//li[a/@id='" + parentIdHtml + "']/ul/li";
The problem is that you were selecting the a node itself, which has no ul children. You need to select the li node first, and filter on its a child.
XPath can get messy. Since you're using the HtmlAgilityPack, you might as well leverage LINQ.
// Find the <li> that has a direct <a> child with the wanted id (requires using System.Linq).
HtmlNode li = htmlDoc.DocumentNode.Descendants("li")
    .FirstOrDefault(n => n.ChildNodes.Any(a => a.Name.Equals("a")
        && a.Id.Equals("menuItem2", StringComparison.InvariantCultureIgnoreCase)));
IEnumerable<HtmlNode> liNodes = null;
if (li != null)
{
    // Node found; get all the descendant <li> elements.
    liNodes = li.Descendants("li");
}
From your description I think you want to select the two <li> elements that contain <a> tags with ids menuSubItem1 and menuSubItem2?
If so then this is what you need
//li[a/@id="menuItem2"]//li
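The difference between these expressions is easiest to see side by side. The sketch below runs them against the sample menu using the built-in System.Xml.XPath classes rather than HtmlAgilityPack, on the assumption that the markup is well-formed. The names NestedLiDemo and SubItemIds are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NestedLiDemo
{
    // The menu from the question, with empty <a> elements self-closed.
    const string Html = @"<div id='menu'><ul>
  <li><a id='menuItem1'/></li>
  <li><a id='menuItem2'/>
    <ul>
      <li><a id='menuSubItem1'/></li>
      <li><a id='menuSubItem2'/></li>
    </ul>
  </li>
  <li><a id='menuItem3'/></li>
</ul></div>";

    // Evaluates an XPath that selects <li> elements and returns the id of
    // each selected <li>'s direct <a> child.
    public static string[] SubItemIds(string xpath)
    {
        XDocument doc = XDocument.Parse(Html);
        return doc.XPathSelectElements(xpath)
                  .Select(li => (string)li.Element("a")?.Attribute("id"))
                  .ToArray();
    }

    public static void Main()
    {
        // The <a> has no <ul> child, so the original expression selects nothing.
        Console.WriteLine(SubItemIds("//a[@id='menuItem2']/ul/li").Length);
        // Filtering the <li> on its <a> child reaches the nested items.
        Console.WriteLine(string.Join(",", SubItemIds("//li[a/@id='menuItem2']/ul/li")));
        // Stepping sideways from the <a> to its sibling <ul> selects the same items.
        Console.WriteLine(string.Join(",", SubItemIds("//li/a[@id='menuItem2']/following-sibling::ul/li")));
    }
}
```

One behavioural difference to keep in mind: HtmlAgilityPack's SelectNodes returns null when nothing matches, whereas XPathSelectElements returns an empty sequence.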
I am not able to see where I am going wrong with my XPath logic.
Here is a section of a larger XML document that I am working on traversing. (Note: I'm using the Html Agility Pack.)
<div>
<div></div>
<span class="pp-headline-item pp-headline-phone">
<span class="telephone" dir="ltr">
<nobr>(732) 562-1312</nobr>
<span class="pp-headline-phone-label" style="display:none">()</span>
</span>
</span>
<span> · </span>
<span class="pp-headline-item pp-headline-authority-page">
<span>
<a href="http://maps.google.com/local_url?q=http://www.fed.com/q=07746+pizza">
<span>fed.com</span>
</a>
</span>
</span>
</div>
My goal is to extract various data points from these chunks of XML, which I get out of the master XML file by using
.SelectNodes("//div/span['pp-headline-item pp-headline-phone']/../..")
With this I expect to get all the sections outlined above, so I can iterate over them and extract things like website, phone, address...
The problem is that when I iterate over this node set I can't get to the data points I want, as if the node set were not the one outlined above.
My logic is to select a node set starting from the topmost div and, while iterating over it, to use XPath to reach the data points I want.
I do it like this:
foreach (HtmlNode n in BuizRowsgoogMaps)
{
//get phone number
if (n.SelectSingleNode("span/nobr").InnerHtml != null)
{
strPhone = n.SelectSingleNode("span/nobr").InnerHtml;
//get phone site
strSite = n.SelectSingleNode("//span['pp-headline-item pp-headline-authority-page']/span/a/span").InnerHtml;
}
}
I suspect my XPaths don't mesh together to get what I want, but when I validate my expression I get the desired results. I used this to validate my thinking, and it works, leaving me at my wits' end:
//div/span['pp-headline-item pp-headline-phone']/../../span['pp-headline-item pp-headline-phone']/span/nobr
Your code is almost right, you just need to modify your xpath a bit.
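foreach (HtmlNode n in BuizRowsgoogMaps)
{
    // Get the phone number. SelectSingleNode returns null when nothing matches,
    // so test the node itself rather than its InnerHtml.
    HtmlNode phoneNode = n.SelectSingleNode(".//span/nobr");
    if (phoneNode != null)
    {
        strPhone = phoneNode.InnerHtml;
        // Get the site. A bare string predicate such as span['...'] is always true,
        // so compare against @class explicitly.
        strSite = n.SelectSingleNode(".//span[@class='pp-headline-item pp-headline-authority-page']/span/a/span").InnerHtml;
    }
}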
The leading . in .// tells XPath to match from the current node rather than from the document root.
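The effect of the leading dot can be reproduced in a few lines. This is a sketch under stated assumptions: it uses the built-in System.Xml.XPath classes rather than HtmlAgilityPack, the two-row document is invented, and RelativePathDemo and Phones are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class RelativePathDemo
{
    // Hypothetical document with two rows, each holding its own phone number.
    const string Html = @"<root>
  <div class='row'><span><nobr>111</nobr></span></div>
  <div class='row'><span><nobr>222</nobr></span></div>
</root>";

    // For each row, evaluates innerXPath with that row as the context node
    // and returns the text of the first match.
    public static string[] Phones(string innerXPath)
    {
        XDocument doc = XDocument.Parse(Html);
        return doc.XPathSelectElements("//div[@class='row']")
                  .Select(row => row.XPathSelectElement(innerXPath)?.Value)
                  .ToArray();
    }

    public static void Main()
    {
        // Absolute path: every row yields the first <nobr> in the whole document.
        Console.WriteLine(string.Join(",", Phones("//span/nobr")));
        // Relative path: each row yields its own <nobr>.
        Console.WriteLine(string.Join(",", Phones(".//span/nobr")));
    }
}
```

With the absolute path, both rows report the first number in the document; with the relative path, each row reports its own.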