HtmlAgilityPack adding nodes to other nodes - c#

I am using HtmlAgilityPack to parse my HTML. My main goal is to wrap certain table row elements in div elements. The rows also contain other elements, and I only want to wrap a row when its font element's style sets a background color of lightgreen.
Here's an example of what my HTML looks like:
<tr>
<td style="padding-left: 40pt;"><font style="background-color: lightgreen" color="black">Tove</font></td>
<td style="padding-left: 40pt;"><font style="background-color: lightgreen" color="black">To</font></td>
</tr>
However, my current code:
HtmlNode[] nodes = document.DocumentNode.SelectNodes("//tr[//td[//font[@style='background-color: lightgreen']]]").ToArray();
foreach (HtmlNode node in nodes)
{
node.InnerHtml = node.InnerHtml.Replace( node.InnerHtml,"<div class=\"select-me\">" +node.InnerHtml + "</div>");
}
produces this html:
<tr>
<div class="select-me">
<td style="padding-left: 25pt;"><font style="background-color: white" color="black"><to</font><font style="background-color: white" color="black">></font></td>
<td style="padding-left: 25pt;"><font style="background-color: white" color="black"><to</font><font style="background-color: white" color="black">></font></td>
</div>
</tr>
<tr>
<div class="select-me">
<td style="padding-left: 40pt;"><font style="background-color: lightgreen" color="black">Tove</font></td>
<td style="padding-left: 40pt;"><font style="background-color: lightgreen" color="black">To</font></td>
</div>
</tr>
<tr>
<div class="select-me">
<td style="padding-left: 25pt;"><font style="background-color: white" color="black"></to></font></td>
<td style="padding-left: 25pt;"><font style="background-color: white" color="black"></to></font></td>
</div>
</tr>
The div elements don't surround the tr elements, and every tr gets a div, when only a tr whose font element has a background color of lightgreen should get one. Also, the first tr ends up with four font elements instead of two. Ideally, I want to insert a div around a tr element only when its font element's background is lightgreen. I have looked at other posts, but I am still having trouble.
The correct HTML should look like:
<div class="select-me">
<tr>
<td style="padding-left: 25pt;"><font style="background-color: lightgreen" color="black">hello</font></td>
<td style="padding-left: 25pt;"><font style="background-color: lightgreen" color="black">goodbye</font></td>
</tr>
</div>

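for (var i = 0; i < nodes.Length; i++)
{
    var node = nodes[i];
    // Build the wrapper from the row's markup, then swap it into the tree in place of the row.
    // Assigning the new node to the local variable alone would leave the document unchanged.
    var wrapper = HtmlNode.CreateNode("<div class=\"select-me\">" + node.OuterHtml + "</div>");
    node.ParentNode.ReplaceChild(wrapper, node);
}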
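The reason every row was matched in the question is that `//` inside a predicate starts again from the document root, so the test is true for every tr as soon as one lightgreen font exists anywhere. A minimal sketch of the difference follows. It uses System.Xml.Linq and System.Xml.XPath from the .NET base library rather than HtmlAgilityPack so that it runs without extra packages; the three-row table is made-up sample data, and the XPath 1.0 rules it illustrates are the same in both libraries.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical sample: only the middle row has a lightgreen font.
var doc = XDocument.Parse(@"<table>
  <tr><td><font style='background-color: white'>a</font></td></tr>
  <tr><td><font style='background-color: lightgreen'>Tove</font></td></tr>
  <tr><td><font style='background-color: white'>b</font></td></tr>
</table>");

// '//' inside the predicate searches the whole document, so every row passes.
var absolute = doc
    .XPathSelectElements("//tr[//td[//font[@style='background-color: lightgreen']]]")
    .ToList();

// Relative steps test only the row's own cells.
var relative = doc
    .XPathSelectElements("//tr[td/font[@style='background-color: lightgreen']]")
    .ToList();

Console.WriteLine(absolute.Count); // 3
Console.WriteLine(relative.Count); // 1

// Wrap each matching row in a div, in place.
foreach (var row in relative)
{
    var wrapper = new XElement("div", new XAttribute("class", "select-me"));
    row.ReplaceWith(wrapper); // detaches the row and puts the wrapper in its position
    wrapper.Add(row);         // re-attaches the same row inside the wrapper
}

Console.WriteLine(doc);
```

With HtmlAgilityPack, the same relative predicate can be passed to SelectNodes before wrapping the nodes.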

Related

C# selenium find an element by text then click on checkbox

I want to search for a given text (for example, UK) and then click the checkbox that is in the same row.
HTML (the number of tr elements can vary):
<tr class="ng-scope table-row-style">
<td class="ng-binding">US</td>
<td class="ng-binding">United States</td>
<td class="btn-td" style="padding: 0;">
<input type="checkbox" class="ng-pristine ng-untouched ng-valid ng-empty">
</td>
</tr>
<tr class="ng-scope table-row-style">
<td class="ng-binding">UK</td>
<td class="ng-binding">United Kingdom</td>
<td class="btn-td" style="padding: 0;">
<input type="checkbox" class="ng-pristine ng-untouched ng-valid ng-empty">
</td>
</tr>
<tr class="ng-scope table-row-style">
<td class="ng-binding">IN</td>
<td class="ng-binding">India</td>
<td class="btn-td" style="padding: 0;">
<input type="checkbox" class="ng-pristine ng-untouched ng-valid ng-empty">
</td>
</tr>
Solved with:
IList<IWebElement> tableRow = tableElement.FindElements(By.TagName("tr"));
IList<IWebElement> rowTD;
if (tableRow.Count > 0)
{
foreach (IWebElement row in tableRow)
{
rowTD = row.FindElements(By.TagName("td"));
if (rowTD[0].Text.Equals("UK"))
rowTD[2].Click();
}
}
You can find the parent tr element based on a child td containing the desired text, and then find the input element inside that row, as follows:
//tr[.//td[contains(.,'UK')]]//input
The entire Selenium command finding and clicking this element could be
driver.FindElement(By.XPath("//tr[.//td[contains(.,'UK')]]//input")).Click();

No text returned for IWebElement.Text

I have an XPath that correctly returns the nodes I am after, but I cannot retrieve the text between the tags in Selenium.
The line below correctly returns the 4 nodes I am after:
var subMenuItems = driver.FindElements(By.XPath("//div[@id='OpenandApply123']//a"));
However, when I do the following, I get an empty string back when I'm expecting "abc123":
string item1 = subMenuItems[0].Text;
How do I get the text abc123, def123, etc. returned?
Full HTML below, apologies for the formatting:
<div class="subMenu" id="OpenandApply123" role="menu" aria-hidden="false" style="" xpath="1">
<table cellpadding="0" cellspacing="0" width="150" role="presentation">
<tbody><tr role="presentation">
<td role="presentation" class="">
<a id="abc.feature_link" href="abc/abc" class="menuItem" target="_top" style="background-position: 4px 2px;">
abc123
</a>
</td>
</tr>
<tr role="presentation">
<td role="presentation" class="">
<a id="def.feature_link" href="abc/Feature/def" class="menuItem" target="_top" style="background-position: 4px 2px;">
def123
</a>
</td>
</tr>
<tr role="presentation">
<td role="presentation">
<a id="ghi.feature_link" href="abc/Feature/ghi" class="menuItem" target="_top" style="">
ghi123
</a>
</td>
</tr>
<tr role="presentation">
<td role="presentation">
<a id="klm.feature_link" href="abc/klm" class="menuItem" target="_top" style="">
klm123
</a>
</td>
</tr>
</tbody></table>
</div>
I would suggest using WebDriverWait to wait for VisibilityOfAllElementsLocatedBy() with the following XPath:
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until(SeleniumExtras.WaitHelpers.ExpectedConditions.VisibilityOfAllElementsLocatedBy(By.XPath("//td[@role='presentation']//a")));
var subMenuItems = driver.FindElements(By.XPath("//td[@role='presentation']//a"));
string item1 = subMenuItems[0].Text;
Or, if the element is still not reported as displayed, read its text content directly:
string item1 = subMenuItems[0].GetAttribute("textContent");

Add tbody XML Element to table Element in XDocument

I want to add a <tbody> element to <table> elements in an XDocument if it is missing.
<table class="newtable" id="item_559_Table1" cellpadding="0" cellspacing="0" data-its-style="width:11.4624em; border-spacing:0;">
<colgroup data-its-style="width:11.4624em; " />
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="">My dad cooks up a pot of chicken soup, and</p>
</td>
</tr>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="font-weight:normal; ">This cold means I can’t taste a thing today!</p>
</td>
</tr>
</table>
Output should look like
<table class="newtable" id="item_559_Table1" cellpadding="0" cellspacing="0" data-its-style="width:11.4624em; border-spacing:0;">
<colgroup data-its-style="width:11.4624em; " />
<tbody>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="">My dad cooks up a pot of chicken soup, and</p>
</td>
</tr>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="font-weight:normal; ">This cold means I can’t taste a thing today!</p>
</td>
</tr>
</tbody>
</table>
Note: I am not looking for an XSLT solution.
One way to do it would be to grab the children of <table>, then add them back the way you want them. This assumes the <table> is the root element of the document.
var doc = XDocument.Load("file.xml");
var colgroup = doc.Root.Elements("colgroup");
var tr = doc.Root.Elements("tr");
// Add tr to tbody
var tbody = new XElement("tbody", tr);
// Replace the children of table with colgroup and tbody
doc.Root.ReplaceNodes(colgroup, tbody);
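If the table is not the document root, or there are several tables, the same idea can be applied per table. The following is a sketch under assumptions: the wrapping div and the cell contents are made-up sample data, and tables that have no direct tr children are simply skipped.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical input: a table nested inside another element, without a tbody.
var doc = XDocument.Parse(
    "<div><table><colgroup /><tr><td>a</td></tr><tr><td>b</td></tr></table></div>");

// ToList() so the tree can be modified while iterating.
foreach (var table in doc.Descendants("table").ToList())
{
    if (table.Element("tbody") != null) continue; // already has a tbody

    var rows = table.Elements("tr").ToList();
    if (rows.Count == 0) continue;                // nothing to wrap

    var tbody = new XElement("tbody");
    rows[0].AddBeforeSelf(tbody);                 // keeps colgroup ahead of tbody

    foreach (var row in rows)
    {
        row.Remove();   // detach from the table
        tbody.Add(row); // re-attach the same element under tbody
    }
}

Console.WriteLine(doc);
```

Inserting the tbody before the first row, instead of replacing all children, leaves any other children of the table (such as colgroup) where they were.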

Failing to retrieve the third td nodes in an html list

I am trying to get the text "Very Good Country views" and "Good" using HTMLAgilityPack.
<div class="property-details-section">
<h5><span id="content_lblFurtherDetails">Further Details</span></h5>
<ul id="features">
<li style="display:block;">
<table border="0" cellpadding="0" cellspacing="0" width="500">
<tr>
<td style="width: 15px;">
<img src="../images/bullet.png" alt="bullet" />
</td>
<td style="width: 185px;">Views</td>
<td style="width: 300px;">Very Good Country views</td>
</tr>
</table>
</li>
</ul>
<li style="display:block;">
<table border="0" cellpadding="0" cellspacing="0" width="500">
<tr>
<td style="width: 15px;">
<img src="../images/bullet.png" alt="bullet" />
</td>
<td style="width: 185px;">Finish</td>
<td style="width: 300px;">Good</td>
<tr>
</table>
</li>
</div>
I have tried the following for "Very Good Country views" with no success:
HtmlNode text =
doc.DocumentNode.SelectSingleNode("//ul[@id='features']/li/table/tr/td[3]");
I am trying to get the text "Very Good Country views" and "Good"
You have to select 2 elements, so use SelectNodes instead of SelectSingleNode if you want to get both results at once.
var result = doc.DocumentNode.SelectNodes("//ul[@id='features']/li/*//td[last()]")
.Select(td => td.InnerText)
.ToList();
I think the problem with your XPath is that you need parentheses around the expression, so that [3] indexes the combined result rather than each row's cells:
var text = doc.DocumentNode
.SelectSingleNode("(//ul[@id='features']/li/table/tr/td)[3]");
You can also try using LINQ:
var td = doc.Descendants("ul")
.First(x => x.GetAttributeValue("id","") == "features")
.Descendants("td")
.Skip(2)
.First();
var text = td.InnerText;
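The two forms of the index behave differently: `td[3]` picks the third td within each parent row, while `(...)[3]` picks the third node of the whole result set. Here is a small sketch of that difference, using System.Xml.XPath from the .NET base library instead of HtmlAgilityPack so it runs on its own. The markup is a simplified assumption in which both li elements sit inside the ul, which is not the case in the question's HTML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Simplified, hypothetical version of the page: two rows, three cells each.
var doc = XDocument.Parse(@"<ul id='features'>
  <li><table><tr><td>img</td><td>Views</td><td>Very Good Country views</td></tr></table></li>
  <li><table><tr><td>img</td><td>Finish</td><td>Good</td></tr></table></li>
</ul>");

// td[3]: the third cell of every row, so one hit per row.
var perRow = doc
    .XPathSelectElements("//ul[@id='features']/li/table/tr/td[3]")
    .Select(td => td.Value)
    .ToList();

// (...)[3]: the third cell across all rows combined, so a single hit.
var overall = doc
    .XPathSelectElements("(//ul[@id='features']/li/table/tr/td)[3]")
    .Select(td => td.Value)
    .ToList();

Console.WriteLine(string.Join(" | ", perRow));  // Very Good Country views | Good
Console.WriteLine(string.Join(" | ", overall)); // Very Good Country views
```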

Why is my code selecting all text() nodes in the HtmlDocument

HtmlNode node = doc.DocumentNode.SelectNodes("//tr")[0];
foreach(HtmlTextNode n in node.SelectNodes("//text()"))
Console.WriteLine(n.Text);
HTML:
<table class="infobox" style="width: 17em; font-size: 100%;float: left;">
<tr>
<th style="text-align: center; background: #f08080;" colspan="3">خدیجہ مستور</th>
</tr>
<tr style="text-align: center;">
<td colspan="3"><img alt="خدیجہ مستور" src="//upload.wikimedia.org/wikipedia/ur/thumb/7/7b/Khatijamastoor.JPG/150px-Khatijamastoor.JPG" width="150" height="203" srcset="//upload.wikimedia.org/wikipedia/ur/thumb/7/7b/Khatijamastoor.JPG/225px-Khatijamastoor.JPG 1.5x, //upload.wikimedia.org/wikipedia/ur/thumb/7/7b/Khatijamastoor.JPG/300px-Khatijamastoor.JPG 2x"><br>
<div style="font-size: 90%">خدیجہ مستور</div>
</td>
</tr>
<tr>
<th style="background: #f08080;" colspan="3">ادیب</th>
</tr>
<tr>
<td><b>ولادت</b></td>
<td colspan="2">1930ء، لکھنؤ، برطانوی ہندوستان</td>
</tr>
<tr>
<td><b>اصناف ادب</b></td>
<td colspan="2">ناول</td>
</tr>
<tr>
<td><b>معروف تصانیف</b></td>
<td colspan="2">آنگن</td>
</tr>
</table>
The output should be:
خدیجہ مستور
but I got:
خدیجہ مستور
خدیجہ مستور
ادیب
ولادت
1930ء
،
لکھنؤ
،
برطانوی ہندوستان
اصناف ادب
ناول
معروف تصانیف
آنگن
Why is node.SelectNodes("//text()") selecting all text() nodes in the document rather than only the text() nodes from the first tr tag?
Because the XPath you pass to node.SelectNodes starts with two forward slashes (//text()), it is evaluated from the document root and selects every text node in the document, not just the descendants of the selected node.
Prefix the expression with a dot to make it relative to the current node:
foreach (HtmlTextNode n in node.SelectNodes(".//text()"))
Or fold both steps into a single XPath that takes the first non-whitespace text node of the first row:
var node = doc.DocumentNode.SelectSingleNode("(//tr)[1]//text()[normalize-space()]");
Console.WriteLine(node.InnerText);
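The effect of the leading dot can be checked in isolation. This sketch selects elements rather than text() nodes to keep it short, and it uses System.Xml.XPath from the .NET base library rather than HtmlAgilityPack; the three-row table is made-up sample data. The rule that `//` ignores the context node and `.//` starts from it is part of XPath 1.0 itself.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical sample table with three rows.
var doc = XDocument.Parse(@"<table>
  <tr><th>first</th></tr>
  <tr><td>second</td></tr>
  <tr><td>third</td></tr>
</table>");

// Context node: the first row in document order.
var firstRow = doc.XPathSelectElement("//tr");

// '//*' restarts at the document root even though it is called on firstRow.
var fromRoot = firstRow.XPathSelectElements("//*").ToList();

// './/*' stays below the context node.
var fromRow = firstRow.XPathSelectElements(".//*").ToList();

Console.WriteLine(fromRoot.Count);  // 7 (table, 3 tr, th, 2 td)
Console.WriteLine(fromRow.Count);   // 1 (the th)
Console.WriteLine(fromRow[0].Value); // first
```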
