I am trying to extract information from an HTML table that has many rows like the one below. The markup shown is the content of a single table cell. I need to get the link, the artist name, and the artist type from it.
<a href="http://somesite/music/view_album.php?albumid=6468" style="color:#000;" sl-processed="1">
<table width="100%" border="0" bgcolor="#FFFFFF">
<tbody><tr>
<td colspan="2" align="left" valign="top" style="color:#900;">album title</td>
</tr>
<tr> <td width="31%" align="left" valign="top"> <img src="./albums_files/No_cover.png" width="90" height="80" border="0">
</td>
<td width="69%" align="left" valign="top">
<a class="leftcat" href="http://somelink/toartiset" sl-processed="1"> <strong>Rizwan-Muazzam</strong>
</a>
<br>
(<a class="leftcat" href="http://linktoartisttype/" sl-processed="1">
Some Artist Type </a>) <br>
<span class="leftcat">
Rated +: 0<br>
Rated -: 0 </span>
</td>
</tr>
<tr> <td valign="top" align="center" colspan="2">
</td> </tr>
</tbody></table>
</a>
This is what I have so far:
HtmlDocument doc = new HtmlWeb().Load(albumUrl);
var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
This gives me all the links I need. Now I want to get all the child information under each hyperlink.
Help will be appreciated.
Regards
Parminder
I would suggest using a loop to go through all the rows and then select the links and extract the info from them:
var rows = doc.DocumentNode.SelectNodes("//tr");
foreach (var row in rows)
{
    // SelectNodes returns null when nothing matches; skip rows that lack both links.
    var links = row.SelectNodes(".//a");
    if (links == null || links.Count < 2) continue;
    var artistLink = links[0].Attributes["href"].Value;
    var artistName = links[0].SelectSingleNode(".//strong/text()").InnerText;
    var artistTypeLink = links[1].Attributes["href"].Value;
    var artistTypeName = links[1].SelectSingleNode(".//text()").InnerText.Trim();
    // Store the results...
}
I want to add a <tbody> element to <table> elements where it is missing, using XDocument.
<table class="newtable" id="item_559_Table1" cellpadding="0" cellspacing="0" data-its-style="width:11.4624em; border-spacing:0;">
<colgroup data-its-style="width:11.4624em; " />
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="">My dad cooks up a pot of chicken soup, and</p>
</td>
</tr>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="font-weight:normal; ">This cold means I can’t taste a thing today!</p>
</td>
</tr>
</table>
Output should look like
<table class="newtable" id="item_559_Table1" cellpadding="0" cellspacing="0" data-its-style="width:11.4624em; border-spacing:0;">
<colgroup data-its-style="width:11.4624em; " />
<tbody>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="">My dad cooks up a pot of chicken soup, and</p>
</td>
</tr>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="font-weight:normal; ">This cold means I can’t taste a thing today!</p>
</td>
</tr>
</tbody>
</table>
Note: I am not looking for an XSLT solution.
One way to do it would be to grab the children of <table>, then add them back the way you want them.
var doc = XDocument.Load("file.xml");
var colgroup = doc.Root.Elements("colgroup");
var tr = doc.Root.Elements("tr");
// Add tr to tbody
var tbody = new XElement("tbody", tr);
// Replace the children of table with colgroup and tbody
doc.Root.ReplaceNodes(colgroup, tbody);
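The snippet above treats the document root as the table. As a minimal sketch for the "if missing" part of the question, the same idea can be applied to every <table> in the document that has no <tbody>. The class and method names (`TbodyWrapper`, `AddMissingTbody`) are hypothetical, and the sketch assumes the elements are not in an XML namespace.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TbodyWrapper
{
    // Wraps the direct <tr> children of every <table> that has no <tbody>.
    // Assumption: elements carry no XML namespace (plain "table", "tr", "tbody").
    public static void AddMissingTbody(XDocument doc)
    {
        // ToList() snapshots the matches so the tree can be changed while looping.
        var tables = doc.Descendants("table")
                        .Where(t => t.Element("tbody") == null)
                        .ToList();

        foreach (var table in tables)
        {
            var rows = table.Elements("tr").ToList();
            if (rows.Count == 0) continue;

            // Put the new <tbody> where the first row was, so <colgroup> stays in front.
            var tbody = new XElement("tbody");
            rows[0].AddBeforeSelf(tbody);

            foreach (var row in rows)
            {
                row.Remove();   // detach from <table>
                tbody.Add(row); // re-attach under <tbody>
            }
        }
    }
}
```

Usage would be along the lines of `var doc = XDocument.Load("file.xml"); TbodyWrapper.AddMissingTbody(doc);`. Tables that already contain a <tbody> are left untouched.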
I am trying to get the text "Very Good Country views" and "Good" using HTMLAgilityPack.
<div class="property-details-section">
<h5><span id="content_lblFurtherDetails">Further Details</span></h5>
<ul id="features">
<li style="display:block;">
<table border="0" cellpadding="0" cellspacing="0" width="500">
<tr>
<td style="width: 15px;">
<img src="../images/bullet.png" alt="bullet" />
</td>
<td style="width: 185px;">Views</td>
<td style="width: 300px;">Very Good Country views</td>
</tr>
</table>
</li>
</ul>
<li style="display:block;">
<table border="0" cellpadding="0" cellspacing="0" width="500">
<tr>
<td style="width: 15px;">
<img src="../images/bullet.png" alt="bullet" />
</td>
<td style="width: 185px;">Finish</td>
<td style="width: 300px;">Good</td>
<tr>
</table>
</li>
</div>
I have tried the following for "Very Good Country views" with no success:
HtmlNode text =
    doc.DocumentNode.SelectSingleNode("//ul[@id='features']/li/table/tr/td[3]");
I am trying to get the text "Very Good Country views" and "Good"
You have to select two elements, so you should use SelectNodes instead of SelectSingleNode if you want to get both results at once.
var result = doc.DocumentNode.SelectNodes("//ul[@id='features']/li/*//td[last()]")
    .Select(td => td.InnerText)
    .ToList();
I think the problem with your XPath is that the expression needs parentheses around it:
var text = doc.DocumentNode
    .SelectSingleNode("(//ul[@id='features']/li/table/tr/td)[3]");
You can also try using LINQ:
// Descendants is defined on HtmlNode, so start from DocumentNode.
var td = doc.DocumentNode.Descendants("ul")
    .First(x => x.GetAttributeValue("id", "") == "features")
    .Descendants("td")
    .Skip(2)
    .First();
var text = td.InnerText;
How can I read the <table> inside an onmouseover attribute with C# and HtmlAgilityPack?
Markup:
<a href="#" class="chan_live_not_free" onclick="return false;" onmouseover="return overlib('
<table>
<tr class=fieldRow>
<td class=posH_col width=40>
<strong>pos</strong>
</td>
<td class=rest_col width=90>
<strong>satellite</strong>
</td>
<td class=freqH_col width=50>
<strong>freq</strong>
</td>
<td class=rest_col width=90>
<strong>symbol</strong>
</td>
<td class=rest_col width=90>
<strong>encryption</strong>
</td>
</tr>
<tr>
<td class="pos_col">39.0°e</td>
<td class=rest_col>Hellas Sat 2</td>
<td class="freq_col">12.606 H</td>
<td class=rest_col>30000 - 2/3</td>
<td class=enc_not_live>MPEG-4 BulCrypt</td>
</tr>
</table>',CAPTION, 'Arena Sport 4 (serbia) – 19/10/14 - 11:30');" onmouseout="return nd();">
Arena Sport 4 (serbia)
</a>
I need to read the table inside the onmouseover attribute. How can I do that?
You could get the onmouseover attribute of the <a> tag with HTML Agility Pack and then use a regular expression to pull the <table> out of that string, something like the following:
// Requires: using System.Text.RegularExpressions;
// Inside a verbatim string (@"..."), double quotes must be doubled.
var html = @"<a href='#' class='chan_live_not_free' onclick='return false;' onmouseover='return overlib(
<table>
<tr class=fieldRow>
<td class=posH_col width=40>
<strong>pos</strong>
</td>
<td class=rest_col width=90>
<strong>satellite</strong>
.
.
.
<tr>
<td class=""pos_col"">39.0°e</td>
<td class=rest_col>Hellas Sat 2</td>
<td class=""freq_col"">12.606 H</td>
<td class=rest_col>30000 - 2/3</td>
<td class=enc_not_live>MPEG-4 BulCrypt</td>
</tr>
</table>,CAPTION, 'Arena Sport 4 (serbia) – 19/10/14 - 11:30');' onmouseout='return nd();'>
Arena Sport 4 (serbia)
</a>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// Read the raw onmouseover attribute, then cut the <table>...</table> fragment out of it.
var value = doc.DocumentNode.SelectSingleNode("//a[@class='chan_live_not_free']").Attributes["onmouseover"].Value;
var text = Regex.Matches(value, @"<table>([^)]*)</table>")[0].Value;
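Once the <table> fragment has been extracted as a string, its cells still have to be read. The following is a minimal sketch that uses only regular expressions; the names `OverlibTable` and `ReadRows` are hypothetical, and it assumes a flat table like the one above (no nested tables, every <tr> and <td> explicitly closed). Loading the fragment into a second HtmlDocument would be the more robust route for anything messier.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlibTable
{
    private const RegexOptions Options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // Returns one list of cell texts per <tr>; inner tags such as <strong> are stripped.
    // Assumption: no nested tables, and every <tr>/<td> has a closing tag.
    public static List<List<string>> ReadRows(string tableHtml)
    {
        var rows = new List<List<string>>();
        foreach (Match tr in Regex.Matches(tableHtml, @"<tr\b[^>]*>(.*?)</tr>", Options))
        {
            var cells = Regex.Matches(tr.Groups[1].Value, @"<td\b[^>]*>(.*?)</td>", Options)
                .Cast<Match>()
                .Select(td => Regex.Replace(td.Groups[1].Value, "<[^>]+>", "").Trim())
                .ToList();
            rows.Add(cells);
        }
        return rows;
    }
}
```

With the `text` variable from the snippet above, `OverlibTable.ReadRows(text)` would give the header row first, followed by one list per data row.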
I have an HTML table that I request from a webserver in C#. I am then displaying the page in my aspx webform. How can I add a prerequisite based on the course ID to the last column in the table without hard-coding the prerequisite? Example of the table design is below.
<tr bgcolor="#E1E1CC">
<td width="7%">003597</td>
<td width="5%">01</td>
<td width="1%">OPT</td>
<td width="8%">MT H </td>
<td width="16%">2:00 pm - 2:50 pm </td>
<td width="17%">08/26/13 - 12/12/13</td>
<td width="8%">
<a href="http://www.mnsu.edu/registrar/building.html"target = _blank>
<b>TR C124</b>
</a>
</td>
<td width="19%">Staff</td>
<td width="4%">22</td>
<td width="4%">6</td>
<td width="4%"><font color="#000000">Open</font></td>
<td width="7%">
<a href=Notes.asp?SpclNote=20143+003597+IT+100 target = _blank>
<b>Notes</b>
</a>
</td>
</tr>
<tr bgcolor="#E1E1CC">
<td colspan="3"> </td>
<td width="8%">M </td>
<td width="16%">10:00 am - 11:50 am</td>
<td width="17%">08/26/13 - 12/09/13</td>
<td width="8%">
<a href="http://www.mnsu.edu/registrar/building.html"target = _blank>
<b>WH 0119 </b>
</a>
</td>
<td width="19%">Staff</td>
<td colspan="4"> </td>
</tr>
If you are getting the page html as a string you could just insert your html into it. Something like:
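// Inserts TextToInsert before the first "{tag}" that follows ID in the page HTML.
// "{tag}" is a placeholder for whatever markup marks the insertion point.
private string SetValue(string PageHtml, string ID, string TextToInsert)
{
    int iSplitIndex = PageHtml.IndexOf(ID);
    if (iSplitIndex < 0) return PageHtml; // ID not found
    iSplitIndex = PageHtml.IndexOf("{tag}", iSplitIndex);
    if (iSplitIndex < 0) return PageHtml; // no insertion point after the ID
    string sHtml1 = PageHtml.Substring(0, iSplitIndex);
    string sHtml2 = PageHtml.Substring(iSplitIndex);
    return sHtml1 + TextToInsert + sHtml2;
}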
Here's my sample HTML:
<html>
<table class="test" border="0" >
<tr bgColor="#e8f4ff">
<td width="50%" align="right">
<b>Invoice ID:</b>
</td>
<td width="50%">
<b>
1622579
</b>
</td>
</tr>
<tr bgColor="#e8f4ff">
<td align="right">
<b>Code:</b>
</td>
<td>
<b>
20475
</b>
</td>
</tr>
</html>
There's no ID, so I can't use SelectNodes().
How can I get the Code: 20475 value using HtmlAgilityPack or regex?
Using the latest HtmlAgilityPack, you can rely on the document structure alone. This will not be very resilient to changes in the HTML, so you should strongly consider adding appropriate ids (if this is your HTML anyway):
HtmlDocument doc = new HtmlDocument();
doc.Load(@"test.html");
var tds = doc.DocumentNode.Descendants("td").ToArray();
string codeValue = "";
for (int i = 1; i < tds.Length; i++)
{
    // The value cell directly follows the cell whose <b> holds the "Code:" label.
    if (tds[i - 1].Element("b")?.InnerText.Trim() == "Code:")
        codeValue = tds[i].Element("b").InnerText.Trim();
}
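Since the question also asks about regex, here is a minimal sketch of that route. It assumes the label and the value are each wrapped in a <b> element in adjacent <td> cells, exactly as in the sample; `LabelLookup` and `FindValue` are hypothetical names. Like the structural approach above, it will break if the markup changes.

```csharp
using System.Text.RegularExpressions;

public static class LabelLookup
{
    // Finds "<b>label</b></td><td ...><b>value</b>" and returns the trimmed value,
    // or null when the label is not present in that shape.
    public static string FindValue(string html, string label)
    {
        var pattern = @"<b>\s*" + Regex.Escape(label)
                    + @"\s*</b>\s*</td>\s*<td[^>]*>\s*<b>\s*(.*?)\s*</b>";
        var match = Regex.Match(html, pattern,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For the sample page, `LabelLookup.FindValue(html, "Code:")` would be expected to return 20475, with `html` holding the page source as a string.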