How to access nested div based on class name - c#

I have this html code:
<div class="searchResult webResult">
<div class="resultTitlePane">
Google
</div>
<div class="resultDisplayUrlPane">
www.google.com
</div>
<div class="resultDescription">
Search
</div>
</div>
I want to read the inner text of each nested div into a separate variable.
I know that to select a div by its class I should write:
var titles = hd.DocumentNode.SelectNodes("//div[@class='searchResult webResult']");
foreach (HtmlNode node in titles)
{?}
What code should I write to get the inner text of each div into different variables? Thanks.

I would extend the current XPath expression you have to match the inner div elements:
//div[@class='searchResult webResult']/div[contains(@class, 'result')]
Then, to get the text, use the .InnerText property:
C# - Get the text inside tags using HTML Agility Pack
C#: HtmlAgilityPack extract inner text

Since you don't know how many nodes will be returned, I suggest using a list:
List<string> titlesStringList = new List<string>();
foreach (HtmlNode node in titles)
{
titlesStringList.Add(node.InnerText);
}
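If the three inner divs are always present, each one can also be read into its own variable by querying relative to the outer node. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and that the markup matches the sample in the question; the variable names title, url and description are only illustrative.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup copied from the question (single quotes used for brevity).
        string html = @"<div class='searchResult webResult'>
<div class='resultTitlePane'>Google</div>
<div class='resultDisplayUrlPane'>www.google.com</div>
<div class='resultDescription'>Search</div>
</div>";

        var hd = new HtmlDocument();
        hd.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        var results = hd.DocumentNode.SelectNodes("//div[@class='searchResult webResult']");
        if (results == null) return;

        foreach (HtmlNode node in results)
        {
            // "./div[...]" is evaluated relative to the current result node.
            string title = node.SelectSingleNode("./div[@class='resultTitlePane']")?.InnerText.Trim();
            string url = node.SelectSingleNode("./div[@class='resultDisplayUrlPane']")?.InnerText.Trim();
            string description = node.SelectSingleNode("./div[@class='resultDescription']")?.InnerText.Trim();

            Console.WriteLine(title + " | " + url + " | " + description);
        }
    }
}
```

The null-conditional operator leaves a variable null instead of throwing when one of the inner divs is missing.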

Related

Regular Expression to match only first Paragraph tag in Div

I want to match the first empty P tag in each DIV and insert some text. I am using the regular expression (<p[^>]*>)(</p>), but it matches every P tag inside the DIV.
var yourDivString = "<DIV WITH Paragraph Tag(s) and many other tags>";
yourDivString = Regex.Replace(yourDivString, "(<p[^>]*>)(</p>)", "THIS IS FIRST EMPTY P TAG in EACH DIV");
Example:
<div>
<p></p>
<p></p>
</div>
Expected output:
<div>
<p>THIS IS FIRST EMPTY P TAG in EACH DIV</p>
<p></p>
</div>
Note: we are not parsing any HTML files, only a few strings.
We can achieve this using HtmlAgilityPack.
Code explanation: create an instance of HtmlDocument and load the HTML string. Select the first p node in the string and set its content through InnerHtml. If there is no need to create or save a document, OuterHtml can be read directly to see the output.
using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(yourString);
var p = doc.DocumentNode.SelectSingleNode("//p");
p.InnerHtml = "THIS IS FIRST EMPTY P TAG in EACH DIV";
yourString = doc.DocumentNode.OuterHtml;
Console.WriteLine(yourString);
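Note that SelectSingleNode("//p") picks only the first p in the whole string. When the string holds several divs, one possible extension is to loop over the divs and select the first empty p among each div's direct children. This is a sketch under assumptions: the HtmlAgilityPack NuGet package is referenced, "empty" means a p with no child nodes at all, and the input string is made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical input with two divs; the second one starts with a non-empty p.
        string yourString = "<div><p></p><p></p></div><div><p>kept</p><p></p><p></p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(yourString);

        // SelectNodes returns null when there is no match.
        var divs = doc.DocumentNode.SelectNodes("//div");
        if (divs != null)
        {
            foreach (HtmlNode div in divs)
            {
                // "./p[not(node())]" matches direct p children that have no child nodes;
                // SelectSingleNode returns the first of them in document order.
                HtmlNode firstEmpty = div.SelectSingleNode("./p[not(node())]");
                if (firstEmpty != null)
                {
                    firstEmpty.InnerHtml = "THIS IS FIRST EMPTY P TAG in EACH DIV";
                }
            }
        }

        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

A p that contains only whitespace would not count as empty under this predicate; that case would need a different test, such as normalize-space(.)=''.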

How to get multiple a tags within multiple divs that share the same class names, using Html Agility Pack

<div class="itemMenu level1">
<a class="itemMenuName level1" href="http://www.shophive.com/apple/mac">
<span>MacBook</span>
</a>
// there are more divs inside this div (the submenu items); the level1 div closes after them
</div>
I want to get the href link of the a tag within that div.
My code is below:
foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='megnor-advanced-menu-popup_inner']"))
{
foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("./a[@class='itemMenuName level1']"))
{
Console.WriteLine(node2.InnerText + " ");
Console.WriteLine(node2.GetAttributeValue("href", ""));
}
}
The first loop returns the whole block of HTML, but the second loop throws a NullReferenceException.
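One likely cause, offered as an assumption because the full page is not shown: SelectNodes returns null when nothing matches, and "./a" only looks at direct children of the outer div, while in the sample the a tag sits one level deeper inside the itemMenu div. A sketch with ".//a" and a null check follows; it assumes the HtmlAgilityPack NuGet package is referenced, and the nesting of the itemMenu div inside the popup div is a guess.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup: the itemMenu div is assumed to be nested in the popup div.
        string html = @"<div class='megnor-advanced-menu-popup_inner'>
  <div class='itemMenu level1'>
    <a class='itemMenuName level1' href='http://www.shophive.com/apple/mac'><span>MacBook</span></a>
  </div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var outer = doc.DocumentNode.SelectNodes("//div[@class='megnor-advanced-menu-popup_inner']");
        if (outer == null) return; // nothing matched

        foreach (HtmlNode node in outer)
        {
            // ".//a" searches all descendants of the current node, not only direct children.
            var links = node.SelectNodes(".//a[@class='itemMenuName level1']");
            if (links == null) continue; // no links under this div

            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.InnerText.Trim());
                Console.WriteLine(link.GetAttributeValue("href", ""));
            }
        }
    }
}
```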

How to identify an object in selenium where class name contains a space (like: 'uniqclass xyz') or multiple similar class names exist

Please see the HTML code below
<div class="dgrid-content ui-widget-content" tabindex="0">
<div id="uniqName-36397" class=" dgrid default Item001">
<div id="uniqName-36780" class=" dgrid default Item003">
Use XPath. You can try:
"//div[starts-with(@id, 'uniqName-')]"
This will fetch all the elements whose id attribute value begins with uniqName-.
You can iterate over each element by using the following code:
IList<IWebElement> elements = driver.FindElements(By.XPath("//div[starts-with(@id, 'uniqName-')]"));
foreach(IWebElement element in elements) {
//Do Something
}

How to get href elements and attributes for each node?

I am working on a project that should read HTML, find all nodes that match a value, and then find the elements and attributes of the located nodes.
I am having difficulty figuring out how to get the href attributes and elements, though.
I am using HtmlAgilityPack.
I have numerous nodes of
class="middle"
throughout the html. I need to get all of them, and from them, get the href element and attributes. Below is a sample of the html:
<div class="top">
<div class="left">
<a href="item123">
<img src="url.png" border="0" />
</a>
</div>
</div>
<div class="middle">
<div class="title">Captains Hat</div>
<div class="day">monday</div>
<div class="city">Tuscon, AZ | 100 Days | <script typs="text/javascript">document.write(ts_to_age_min(1445620427));</script></div>
</div>
I have been able to get the other attributes I need, but not for 'href'.
Here is the code I have:
List<string> listResults = new List<string>();
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(url);
//get each listing
foreach (HtmlNode node in doc.DocumentNode.Descendants("div").Where(d =>
d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("middle")))
{
string day = node.SelectSingleNode(".//*[contains(@class,'day')]").InnerHtml;
string city = node.SelectSingleNode(".//*[contains(@class,'city')]").InnerHtml;
string item = node.SelectSingleNode("//a").Attributes["href"].Value;
listResults.Add(day + Environment.NewLine
+ city + Environment.NewLine
+ item + Environment.NewLine + Environment.NewLine);
}
However, the code above gives me the first href value of the whole HTML page, and it gives that same value for every node (visible by outputting the list to a message box). I thought that, inside my foreach loop, SelectSingleNode would get the first href attribute for that specific node. Why am I getting the first href attribute of the whole page instead?
I've been through lots of threads here about getting href values with HtmlAgilityPack, but I haven't been able to get this to work.
How can I get the href attribute and elements for each node I select based on the class attribute (class="middle")?
Try replacing
string item = node.SelectSingleNode("//a").Attributes["href"].Value;
with
string item = node.SelectSingleNode(".//a").Attributes["href"].Value;
Other than that, code above works for me.
Alternatively:
string item = node.SelectSingleNode(".//*[contains(@class,'title')]")
.Descendants("a").FirstOrDefault().Attributes["href"].Value;
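The difference between the two expressions can be seen side by side in a small sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the markup is made up so that each "middle" div contains its own link, which is not the case in the sample from the question, where the a tag sits in the "top" div and a relative query from "middle" would find nothing.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup: two listings, each with its own link.
        string html = @"<div class='middle'><a href='item1'>one</a></div>
<div class='middle'><a href='item2'>two</a></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='middle']"))
        {
            // "//a" is evaluated from the document root, even when called on a child node,
            // so it yields the first link of the page on every iteration.
            string fromRoot = node.SelectSingleNode("//a")?.GetAttributeValue("href", "");

            // ".//a" is evaluated from the current node, so it yields this div's own link.
            string fromNode = node.SelectSingleNode(".//a")?.GetAttributeValue("href", "");

            Console.WriteLine(fromRoot + " / " + fromNode);
        }
    }
}
```

The null-conditional operator avoids a NullReferenceException when a div has no link underneath it.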

navigate to section of XML with xpath

I am not able to see where I am going wrong with my XPath logic.
Here is a section of a larger XML document that I am traversing (note that I'm using the Html Agility Pack).
<div>
<div></div>
<span class="pp-headline-item pp-headline-phone">
<span class="telephone" dir="ltr">
<nobr>(732) 562-1312</nobr>
<span class="pp-headline-phone-label" style="display:none">()</span>
</span>‎
</span>
<span> · </span>
<span class="pp-headline-item pp-headline-authority-page">
<span>
<a href="http://maps.google.com/local_url?q=http://www.fed.com/q=07746+pizza">
<span>fed.com</span>
</a>
</span>
</span>
</div>
My goal is to extract various data points from these chunks of XML, which I get out of the master XML file by using
.SelectNodes("//div/span['pp-headline-item pp-headline-phone']/../..")
With this I expect to get all the sections outlined above, so I can iterate over them and extract things like website, phone, and address.
The problem is that when I iterate over this node set, I can't get to the data points I want, as if the node set were not the one outlined above.
My logic is to select the topmost div of each section into the node set and then, while iterating, use XPath to reach the data points I want.
I do it like this:
foreach (HtmlNode n in BuizRowsgoogMaps)
{
//get phone number
if (n.SelectSingleNode("span/nobr").InnerHtml != null)
{
strPhone = n.SelectSingleNode("span/nobr").InnerHtml;
//get phone site
strSite = n.SelectSingleNode("//span['pp-headline-item pp-headline-authority-page']/span/a/span").InnerHtml;
}
}
I suspect my XPath expressions don't mesh together to get what I want, but when I validate the expression I get the desired results. I used this to validate my thinking, and it works, leaving me at my wits' end:
//div/span['pp-headline-item pp-headline-phone']/../../span['pp-headline-item pp-headline-phone']/span/nobr
Your code is almost right, you just need to modify your xpath a bit.
foreach (HtmlNode n in BuizRowsgoogMaps)
{
//get phone number
if (n.SelectSingleNode(".//span/nobr").InnerHtml != null)
{
strPhone = n.SelectSingleNode(".//span/nobr").InnerHtml;
//get phone site
strSite = n.SelectSingleNode(".//span['pp-headline-item pp-headline-authority-page']/span/a/span").InnerHtml;
}
}
The .// tells xpath to match from the current node and not from the root.
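A separate point worth checking in the expressions above: in XPath 1.0 a predicate that is only a string literal, such as span['pp-headline-item pp-headline-phone'], is converted to a boolean, and a non-empty string is always true, so it does not filter by class. The sketch below contrasts it with an @class test. It assumes the HtmlAgilityPack NuGet package is referenced and uses a trimmed-down copy of the markup from the question.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Trimmed-down version of the markup from the question.
        string html = @"<div>
  <span class='pp-headline-item pp-headline-phone'><span class='telephone'><nobr>(732) 562-1312</nobr></span></span>
  <span>-</span>
  <span class='pp-headline-item pp-headline-authority-page'><span><a href='#'><span>fed.com</span></a></span></span>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode row = doc.DocumentNode.SelectSingleNode("//div");

        // A bare string predicate is always true, so every span child of the div matches.
        var unfiltered = row.SelectNodes("./span['pp-headline-item pp-headline-phone']");
        Console.WriteLine(unfiltered.Count);

        // Comparing @class restricts the match to the intended span.
        HtmlNode phone = row.SelectSingleNode("./span[@class='pp-headline-item pp-headline-phone']//nobr");
        HtmlNode site = row.SelectSingleNode("./span[@class='pp-headline-item pp-headline-authority-page']//a/span");

        Console.WriteLine(phone?.InnerText);
        Console.WriteLine(site?.InnerText);
    }
}
```

Both class-based queries start with "./", so they stay inside the current row, in line with the relative-path advice above.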
