I have the following HTML:
<tbody>
<tr>
<td class="metadata_name">Headquarters</td>
<td class="metadata_content">Princeton New Jersey, United States</td>
</tr>
<tr>
<td class="metadata_name">Industry</td>
<td class="metadata_content"><ul><li>Engineering Software</li><li>Software Development & Design</li><li>Software</li><li>Custom Software & Technical Consulting</li></ul></td>
</tr>
<tr>
<td class="metadata_name">Revenue</td>
<td class="metadata_content">$17.5 Million</td>
</tr>
<tr>
<td class="metadata_name">Employees</td>
<td class="metadata_content">201 to 500</td>
</tr>
<tr>
<td class="metadata_name">Links</td>
<td class="metadata_content"><ul><li>Company website</li></ul></td>
</tr>
</tbody>
I want to load the metadata_content value (e.g. "$17.5 Million") into a variable for the row whose metadata_name equals a given value (e.g. "Revenue").
I have tried combinations of code like this for a few hours:
orgHtml.DocumentNode.SelectNodes("//td[@class='metadata_name']")[0].InnerHtml;
But I can't get the right combination. If you know a SelectNodes syntax that gets me to the solution, I would appreciate it.
It seems what you're looking for is this:
var found = orgHtml.DocumentNode.SelectSingleNode(
"//tr[td[@class = 'metadata_name'] = 'Revenue']/td[@class = 'metadata_content']");
if (found != null)
{
string html = found.InnerHtml;
// use html
}
Note that to get the text of an element, you should use found.InnerText, not found.InnerHtml, unless you specifically need its HTML content.
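A minimal sketch of the same lookup wrapped in a reusable helper, assuming the HtmlAgilityPack package is referenced. The names MetadataLookup and GetMetadata and the inline sample HTML are hypothetical and only for illustration. It reads InnerText, as suggested above, and decodes entities with HtmlEntity.DeEntitize.

```csharp
using System;
using HtmlAgilityPack;

static class MetadataLookup
{
    // Hypothetical helper: returns the decoded text of the metadata_content cell
    // whose sibling metadata_name cell equals `name`, or null when no row matches.
    public static string GetMetadata(HtmlDocument doc, string name)
    {
        // Assumption: `name` contains no single quote (XPath 1.0 cannot escape one).
        var xpath = "//tr[td[@class='metadata_name'] = '" + name + "']" +
                    "/td[@class='metadata_content']";
        var cell = doc.DocumentNode.SelectSingleNode(xpath);
        return cell == null ? null : HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table><tbody>" +
            "<tr><td class=\"metadata_name\">Revenue</td>" +
            "<td class=\"metadata_content\">$17.5 Million</td></tr>" +
            "<tr><td class=\"metadata_name\">Employees</td>" +
            "<td class=\"metadata_content\">201 to 500</td></tr>" +
            "</tbody></table>");

        Console.WriteLine(GetMetadata(doc, "Revenue"));
        // No row is named "Founded", so the helper returns null here.
        Console.WriteLine(GetMetadata(doc, "Founded") ?? "(not found)");
    }
}
```

Returning null for a missing row mirrors what SelectSingleNode itself does, so callers can keep the same null check as in the answer.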
Related
I'm trying to get the following data.
<html>
<body>
<tr class="udline">
<th rowspan="2" class="noln">시간</th>
<th rowspan="2">개인</th>
<th rowspan="2">외국인</th>
<th rowspan="2">기관계</th>
<th colspan="6" class="eb">기관</th>
<th rowspan="2">기타법인</th>
</tr>
<tr class="udline">
<th class="sub">금융투자</th>
<th class="sub">보험</th>
<th class="sub">투신<br>(사모)</th>
<th class="sub">은행</th>
<th class="sub">기타금융기관</th>
<th class="sub">연기금등</th>
</tr>
<tr>
<td colspan="11" class="blank_07"></td>
</tr>
<!-- following are data -->
<tr>
<td class="date2">18:01</td>
<td class="rate_up3">2,024</td>
<td class="rate_down3">-3,307</td>
<td class="rate_up3">1,116</td>
<td class="rate_up3">824</td>
<td class="rate_down3">-16</td>
<td class="rate_up3">764</td>
<td class="rate_down3">-43</td>
<td class="rate_down3">-5</td>
<td class="rate_down3">-408</td>
<td class="rate_up3">166</td>
</tr>
<tr>
<td class="date2">18:00</td>
<td class="rate_up3">2,022</td>
<td class="rate_down3">-3,305</td>
<td class="rate_up3">1,116</td>
<td class="rate_up3">824</td>
<td class="rate_down3">-16</td>
<td class="rate_up3">764</td>
<td class="rate_down3">-43</td>
<td class="rate_down3">-5</td>
<td class="rate_down3">-408</td>
<td class="rate_up3">166</td>
</tr>
...
</body></html>
I want to get the list of "tr" nodes that contain data, but I have a problem selecting them.
I think it is enough to get the "tr" elements that have 11 td tags,
so I wrote the following code.
result = await httpClient.GetStringAsync(new Uri(timeUrlAddress));
htmlDoc.LoadHtml(result);
var nodes =
htmlDoc.DocumentNode.SelectNodes("//tr")
.Where(i => i.ChildNodes.Any(j => j.Name.Equals("td")).Count>10); // <--- I have Problem.
foreach(var i in nodes) { ... } // <-- iterating list of <tr> tags.
and it doesn't work.
I could get the list of tr tags with DocumentNode.SelectNodes("//tr"), and I appended .Where(i => i.ChildNodes.Count > 10) to get what I want.
But each tr has several "text" child nodes, so I get unwanted nodes. The following picture shows what I got with .Where(i => i.ChildNodes.Count > 10).
I want the tr nodes that have td tags as child nodes, with exactly 11 td tags.
How can I get those tr nodes with LINQ syntax?
If you want the tr nodes with exactly 11 td children, you can use the XPath below:
//tr[count(td) = 11]
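As a rough sketch, assuming HtmlAgilityPack is referenced, the XPath can be passed straight to SelectNodes. Since the question asked for LINQ, a second variant filters with Elements("td"), which returns only element children and so ignores the whitespace text nodes. The class name RowFilter and the sample markup are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class RowFilter
{
    static void Main()
    {
        // Illustrative input: a header row, a spacer row, and one 11-cell data row.
        var cells = string.Concat(Enumerable.Range(1, 11).Select(n => "<td>" + n + "</td>"));
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tr><th>h</th></tr>\n" +
                     "<tr><td colspan=\"11\"></td></tr>\n" +
                     "<tr>\n" + cells + "\n</tr></table>");

        // XPath variant: count(td) counts only <td> element children.
        var byXPath = doc.DocumentNode.SelectNodes("//tr[count(td) = 11]");

        // LINQ variant: Elements("td") skips the whitespace #text child nodes.
        var byLinq = doc.DocumentNode.Descendants("tr")
                        .Where(tr => tr.Elements("td").Count() == 11)
                        .ToList();

        // SelectNodes returns null rather than an empty collection when nothing matches.
        Console.WriteLine(byXPath == null ? 0 : byXPath.Count);
        Console.WriteLine(byLinq.Count);
    }
}
```

Both variants should select the same rows. The null check matters only for the XPath one, because Descendants always returns a sequence.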
I have some html code
<tfoot>
<tr>
<th class="first"> </th>
<td class="first-col">14</td>
<td class="">15</td>
<td class="">16</td>
<td class="">17</td>
<td class="">18</td>
<td class="">19</td>
<td class="">20</td>
<td class="last-col">21</td>
</tr>
</tfoot>
and I need to select the text from the first <td> (14). I use
HtmlAgilityPack and code like this:
_footer = _htmlDocument.DocumentNode.SelectSingleNode("//tfoot/tr/th[@class='first']/td[1]");
return _footer.First().InnerText;
It returns nothing. What am I doing wrong?
td is not a child element of th; they are at the same level. You should select td as a direct child of tr. You can also match on its class, first-col, instead of using an index:
//tfoot/tr/td[@class='first-col']
Also, don't call First(), because SelectSingleNode already returns a single node:
return _footer.InnerText;
NOTE: As Jon pointed out, you can still select the cell by index instead of by its class. XPath positions start at 1, so the first cell is:
//tfoot/tr/td[1]
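A small sketch comparing selection by class and by position, assuming HtmlAgilityPack is referenced. XPath positions are 1-based, so the first cell is td[1] and a predicate of [0] matches nothing. The class name FooterCell and the shortened markup are illustrative only.

```csharp
using System;
using HtmlAgilityPack;

static class FooterCell
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tfoot><tr><th class=\"first\"> </th>" +
                     "<td class=\"first-col\">14</td><td class=\"\">15</td>" +
                     "<td class=\"last-col\">21</td></tr></tfoot></table>");

        // Select the first data cell by its class.
        var byClass = doc.DocumentNode.SelectSingleNode("//tfoot/tr/td[@class='first-col']");
        // Select the same cell by position; td[1] is the first <td> child of the row.
        var byIndex = doc.DocumentNode.SelectSingleNode("//tfoot/tr/td[1]");
        // No node has position 0, so this call returns null.
        var zero = doc.DocumentNode.SelectSingleNode("//tfoot/tr/td[0]");

        Console.WriteLine(byClass.InnerText);
        Console.WriteLine(byIndex.InnerText);
        Console.WriteLine(zero == null);
    }
}
```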
I have the following code/markup
@for (int m = 0; m < Model.Parts.Count; m++ ) {
var item2 = Model.Parts[m];
<tr id='@item2.WorkOrderPartId'>
<td></td>
<td style='text-align:center'> @item2.LineNo </td>
<td> @item2.SalesOrderLineNo</td>
<td style='text-align:center'>
@item2.Length
</td>
<td style='width:100px;text-align:right'>
</td>
<td style='width:100px;text-align:left'>
</td>
</td>
<td style='padding-right:10px'>0</td>
<td style='padding-right:10px'>0</td>
<td class='RemainingWeight'></td>
<td> </td>
</tr>
}
When this view is executed, the following error is raised in OnException:
Encountered end tag "tr" with no matching start tag. Are your
start/end tags properly balanced?
Please help me. :(
Good indentation is the key to success.
@for (int m = 0; m < Model.Parts.Count; m++ ) {
var item2 = Model.Parts[m];
<tr id='@item2.WorkOrderPartId'>
<td></td>
<td style='text-align:center'> @item2.LineNo </td>
<td> @item2.SalesOrderLineNo</td>
<td style='text-align:center'>
@item2.Length
</td>
<td style='width:100px;text-align:right'></td>
<td style='width:100px;text-align:left'></td>
<td style='padding-right:10px'>0</td>
<td style='padding-right:10px'>0</td>
<td class='RemainingWeight'></td>
<td></td>
</tr>
}
You had
<td style='width:100px;text-align:left'>
</td>
</td>
Which should just be
<td style='width:100px;text-align:left'></td>
There is an extra end tag with no matching start tag in the code you shared. It is the one marked with ** below, from your snippet.
<td style='width:100px;text-align:left'>
</td>
**</td>**
In this code block you have just one extra td end tag, which is:
<td style='width:100px;text-align:left'>
</td>
</td>
To detect this in future: if you are using Visual Studio, click on a start tag and it will highlight the corresponding end tag. If no end tag is highlighted for a start tag, that tag is missing its end tag.
I am not sure whether this will solve your problem.
Anyway, you can try this code.
@for (int m = 0; m < Model.Parts.Count; m++)
{
var item2 = Model.Parts[m];
<tr id='@item2.WorkOrderPartId'>
<td></td>
<td style='text-align:center'> @item2.LineNo </td>
<td> @item2.SalesOrderLineNo</td>
<td style='text-align:center'>
@item2.Length
</td>
<td style='width:100px;text-align:right'></td>
<td style='width:100px;text-align:left'></td>
<td style='padding-right:10px'>0</td>
<td style='padding-right:10px'>0</td>
<td class='RemainingWeight'></td>
<td> </td>
</tr>
}
There was an extra closing td tag in your code.
How to read a <table> inside an onmouseover attribute with C# and HtmlAgilityPack?
Markup:
<a href="#" class="chan_live_not_free" onclick="return false;" onmouseover="return overlib('
<table>
<tr class=fieldRow>
<td class=posH_col width=40>
<strong>pos</strong>
</td>
<td class=rest_col width=90>
<strong>satellite</strong>
</td>
<td class=freqH_col width=50>
<strong>freq</strong>
</td>
<td class=rest_col width=90>
<strong>symbol</strong>
</td>
<td class=rest_col width=90>
<strong>encryption</strong>
</td>
</tr>
<tr>
<td class="pos_col">39.0°e</td>
<td class=rest_col>Hellas Sat 2</td>
<td class="freq_col">12.606 H</td>
<td class=rest_col>30000 - 2/3</td>
<td class=enc_not_live>MPEG-4 BulCrypt</td>
</tr>
</table>',CAPTION, 'Arena Sport 4 (serbia) – 19/10/14 - 11:30');" onmouseout="return nd();">
Arena Sport 4 (serbia)
</a>
I need to read the table embedded in the onmouseover attribute. How can I read it?
You could get the onmouseover attribute of the <a> tag with HTML Agility Pack and then use a regular expression to extract the <table> from the string, something like the following code:
var html = @"<a href='#' class='chan_live_not_free' onclick='return false;' onmouseover='return overlib(
<table>
<tr class=fieldRow>
<td class=posH_col width=40>
<strong>pos</strong>
</td>
<td class=rest_col width=90>
<strong>satellite</strong>
.
.
.
<tr>
<td class=""pos_col"">39.0°e</td>
<td class=rest_col>Hellas Sat 2</td>
<td class=""freq_col"">12.606 H</td>
<td class=rest_col>30000 - 2/3</td>
<td class=enc_not_live>MPEG-4 BulCrypt</td>
</tr>
</table>,CAPTION, 'Arena Sport 4 (serbia) – 19/10/14 - 11:30');' onmouseout='return nd();'>
Arena Sport 4 (serbia)
</a>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var value = doc.DocumentNode.SelectSingleNode("//a[@class='chan_live_not_free']").Attributes["onmouseover"].Value;
var text = Regex.Matches(value, @"<table>([^)]*)</table>")[0].Value;
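Once the table markup has been extracted into a string, one possible follow-up is to load it into a second HtmlDocument and walk its rows. This is a sketch under assumptions: HtmlAgilityPack is referenced, and the text variable below is a shortened stand-in for the regex result above. The class name TooltipTable is hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class TooltipTable
{
    static void Main()
    {
        // Shortened stand-in for the `text` value produced by the regex.
        var text = "<table><tr class=fieldRow><td><strong>satellite</strong></td>" +
                   "<td><strong>freq</strong></td></tr>" +
                   "<tr><td class=rest_col>Hellas Sat 2</td>" +
                   "<td class=freq_col>12.606 H</td></tr></table>";

        // Parse the extracted fragment as its own document.
        var fragment = new HtmlDocument();
        fragment.LoadHtml(text);

        // Print each row as its cell texts joined by " | ".
        foreach (var row in fragment.DocumentNode.Descendants("tr"))
        {
            var values = row.Elements("td").Select(td => td.InnerText.Trim());
            Console.WriteLine(string.Join(" | ", values));
        }
    }
}
```

Parsing the fragment separately keeps the cell extraction in XPath or LINQ rather than in further regular expressions.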
I have this piece of html code. I want to get the text inside the <div> tag using WatiN. The C# code is below, but I'm pretty sure it could be done way better than my solution. Any suggestions?
HTML:
<table id="someId" cellspacing="0" border="1" style="border-collapse:collapse;" rules="all">
<tbody>
<tr>
<th scope="col"> </th>
</tr>
<tr>
<td>
<div>Some text</div>
</td>
</tr>
</tbody>
</table>
C#
// Get the table ElementContainer
IElementContainer diagnosisElementContainer = (IElementContainer)_control.GetElementById("someId");
// Get the tbody element
IElementContainer tbodyElementContainer = (IElementContainer)diagnosisElementContainer.ChildrenWithTag("tbody");
// Get the <tr> children
ElementCollection trElementContainer = tbodyElementContainer.ChildrenWithTag("tr");
// Get the <td> child of the last <tr>
IElementContainer tdElementContainer = (IElementContainer)trElementContainer.ElementAt<Element>(trElementContainer.Count - 1);
// Get the <div> element inside the <td>
Element divElement = tdElementContainer.Divs[0];
Based on the given HTML, something like this is how I'd do it for IE:
IE myIE = new IE();
myIE.GoTo("[theurl]");
string theText = myIE.Table("someId").Divs[0].Text;
The above works on WatiN 2.1, Win7, IE9.