Parsing HTML with XPath following Categories - c#

I have the following HTML structure, in which the category rows and the sub-item rows are separate sibling tr elements rather than nested ones. When I parse it with XPath, each category should get only its own 2 sub-items, but the code below assigns all 4 sub-items to every category.
<table class="available">
<tbody>
<tr>
<td class="catname" colspan="2">
<span>Category 1</span>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem1-1
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="preprice">
<strong class="price">39.99 €</strong>
</div>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem1-2
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="preprice">
<strong class="price">49.99 €</strong>
</div>
</td>
</tr>
<tr>
<td class="catname" colspan="2">
<span>Category 2</span>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem2-1
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="preprice">
<strong class="price">59.99 €</strong>
</div>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem2-2
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="tooltip3">
<strong class="price">69.99 €</strong>
</div>
</td>
</tr>
</tbody>
</table>
var doc = new HtmlDocument(); // with HTML Agility Pack
doc.LoadHtml(uricontent);
var rooms = doc.DocumentNode
    .SelectNodes("//table[@class='available']//td[@class='catname']")
    .Select(r => new
    {
        Type = r.InnerText.CleanInnerText(),
        SubTypes = r.SelectNodes("../..//tr//td[@class='itemdetail']//div[@class='subname']")
            .Select(s => new
            {
                SubType = s.InnerText.CleanInnerText(),
                Price =
                    s.SelectSingleNode(".//parent::td/following-sibling::td[@class='allprice']//div[@class='preprice']//strong[@class='price']")
                        .InnerText.CleanInnerText()
            }).ToArray()
    }).ToArray();

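If I understand your question correctly, to select all the category rows you want //tr[td[@class='catname']], and to select their sub-items you want following-sibling::tr/td[div[@class='subname']], evaluated relative to each category row. Note that following-sibling::tr on its own matches every later row, including the sub-items of subsequent categories, so the result still has to be cut off at the next category row.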
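One way to keep each category limited to its own rows is to walk the table rows once, in document order, and start a new group whenever a category row appears. The following is only a sketch under assumptions: the class name `CategoryParser` and the tuple shape are made up for illustration, and the price is read from `strong[@class='price']` directly, because the `allprice` class used in the question does not appear in the sample HTML.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, as used in the question

public static class CategoryParser // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns each category with only the sub-items that follow it
    // and precede the next category row.
    public static List<(string Category, List<(string Name, string Price)> SubItems)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Category, List<(string Name, string Price)> SubItems)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='available']//tr");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            // A category row opens a new group.
            var category = row.SelectSingleNode("./td[@class='catname']");
            if (category != null)
            {
                result.Add((category.InnerText.Trim(), new List<(string Name, string Price)>()));
                continue;
            }

            // Any other row with a subname div belongs to the most recent category.
            var sub = row.SelectSingleNode(".//div[@class='subname']");
            if (sub == null || result.Count == 0) continue;

            var price = row.SelectSingleNode(".//strong[@class='price']");
            result[result.Count - 1].SubItems.Add(
                (sub.InnerText.Trim(), price == null ? "" : price.InnerText.Trim()));
        }
        return result;
    }
}
```

Called with the HTML from the question, `CategoryParser.Parse(uricontent)` should yield two entries with two sub-items each. A single pass over the rows avoids the sibling-axis bookkeeping that a pure XPath solution would need.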

Related

Print html table in 2 columns per page with fixed header

I need to print some content so that it makes the best use of the space and fills the whole page. To split the content into 2 columns, I take a list on the server side and divide it in half. This worked when I applied it directly to the rows coming from the database, but it no longer works now that I group the content by class to avoid repeating information: the groups have different heights, so the two columns look uneven on the HTML page.
AbastecimentosColuna1 = referenciasList.Take(referenciasList.Count() / 2).ToList();
AbastecimentosColuna2 = referenciasList.Skip(referenciasList.Count() / 2).ToList();
In other words, how can I keep the content fitted to the whole page? Or is there another way to split the content into 2 columns without splitting the list on the server side?
Content:
<div class="row">
<div class="col-6 table-responsive">
<table class="table table-sm table-bordered border-dark text-center">
<thead>
<tr>
<th>Referência</th>
<th>Qtd. Abastecimento</th>
<th>Peças Por Caixa</th>
<th>Nº Caixas</th>
<th>Localização - Etiqueta FIFO</th>
</tr>
</thead>
<tbody>
@foreach (var item in Model.AbastecimentosColuna1)
{
<tr>
<td>
@Html.DisplayFor(modelItem => item.Referencia)
</td>
<td>
@Html.DisplayFor(modelItem => item.QtdAbastecimento)
</td>
<td>
@Html.DisplayFor(modelItem => item.QtdPecasPorCaixa)
</td>
<td>
@Html.DisplayFor(modelItem => item.QtdCaixas)
</td>
<td>
@foreach (var localizacao in item.Localizacoes)
{
<div class="row py-2">
<div class="col-6">
@localizacao.Localizacao
</div>
<div class="col-6">
@foreach (var etiqueta in localizacao.Etiquetas)
{
@etiqueta.Etiqueta
<br />
}
</div>
</div>
}
</td>
</tr>
}
</tbody>
</table>
</div>
<div class="col-6 table-responsive">
<table class="table table-sm table-bordered border-dark text-center">
<thead>
<tr>
<th>Referência</th>
<th>Qtd. Abastecimento</th>
<th>Peças Por Caixa</th>
<th>Nº Caixas</th>
<th>Localização - Etiqueta FIFO</th>
</tr>
</thead>
<tbody>
@foreach (var item in Model.AbastecimentosColuna2)
{
<tr>
<td>
@Html.DisplayFor(modelItem => item.Referencia)
</td>
<td>
@Html.DisplayFor(modelItem => item.QtdAbastecimento)
</td>
<td>
@Html.DisplayFor(modelItem => item.QtdPecasPorCaixa)
</td>
<td>
@Html.DisplayFor(modelItem => item.QtdCaixas)
</td>
<td>
@foreach (var localizacao in item.Localizacoes)
{
<div class="row py-2">
<div class="col-6">
@localizacao.Localizacao
</div>
<div class="col-6">
@foreach (var etiqueta in localizacao.Etiquetas)
{
@etiqueta.Etiqueta
<br />
}
</div>
</div>
}
</td>
</tr>
}
</tbody>
</table>
</div>
</div>

C# Get all the id of the html tag and set inner text for <td></td> tag

I have an HTML string, and I want to get the id of every tag in it that has one.
The HTML string is read from a text file:
<tr>
<td class="X8">
</td>
<td colspan="6" class="X9"></td>
<td colspan="4" class="X12" id="closedate">
</td>
<td colspan="6" class="X9"></td>
<td colspan="4" class="X12" id="startdate">
</td>
<td class="X8">
</td>
<td class="X8" colspan="3">
</td>
<td class="X8">
</td>
<td colspan="9" class="X9"></td>
<td colspan="6" class="X15" id="totalpayment"></td>
<td class="X8">
</td>
<td class="X8">
</td>
</tr>
<tr>
<td class="X17">
</td>
<td class="X17" colspan="8">
</td>
<td class="X17" colspan="33">
</td>
<td class="X17">
</td>
</tr>
<tr>
<td class="X17">
</td>
<td class="X17" colspan="8">
<td class="X17" colspan="16">
</td>
<td class="X17">
</td>
<td colspan="9" class="X20"></td>
<td colspan="6" class="X23" id="approvaldate"></td>
<td class="X17">
</td>
<td class="X17">
</td>
</tr>
Expected result:
closedate, startdate, totalpayment, approvaldate.
Then I want to set the inner text of each tag that has an id
(e.g. <td colspan="6" class="X23" id="approvaldate">2018/07/18</td>)
using C#. Please help. Thanks a lot.
As I understand your question, you need the ids of all the elements in the string. Here is a simple example:
<form id="form1" runat="server">
<input id="Name" type="text" name="Full Name" runat="server" />
<input id="Email" type="text" name="Email Address" runat="server" />
<input id="Phone" type="text" name="Phone Number" runat="server" />
</form>
foreach (var control in Page.Form.Controls)
{
if (control is HtmlInputControl)
{
var htmlInputControl = control as HtmlInputControl;
string controlName = htmlInputControl.Name;
string controlId = htmlInputControl.ID;
}
}
Another approach:
HtmlElement table = testWebBrowser.Document.GetElementById("TableID");
if (table != null)
{
foreach (HtmlElement row in table.GetElementsByTagName("TR"))
{
// ...
}
}
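If the HTML is only available as a plain string, as in the question, rather than as server controls or a WebBrowser document, the same two steps can be sketched with HtmlAgilityPack. This is a sketch under assumptions: the class name `IdFiller` and the dictionary of values are hypothetical, and HtmlAgilityPack is assumed to be installed from NuGet.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // NuGet package (assumed available)

public static class IdFiller // hypothetical helper name
{
    // Collects the id of every element that has an id attribute, in document order.
    public static List<string> GetIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
        return nodes == null
            ? new List<string>() // SelectNodes returns null when nothing matches
            : nodes.Select(n => n.GetAttributeValue("id", "")).ToList();
    }

    // Sets the inner text of each element whose id is a key in 'values'
    // and returns the modified HTML.
    public static string Fill(string html, IDictionary<string, string> values)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                string text;
                if (values.TryGetValue(node.GetAttributeValue("id", ""), out text))
                    node.InnerHtml = WebUtility.HtmlEncode(text); // encode so the text cannot inject markup
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

For the HTML in the question, `IdFiller.GetIds(html)` should return closedate, startdate, totalpayment and approvaldate, and `IdFiller.Fill(html, new Dictionary<string, string> { { "approvaldate", "2018/07/18" } })` should return the markup with that cell filled in.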

Convert a HTML Table with rowspans to DataTable C#

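I need to convert an HTML table to a DataTable in C#. I used HtmlAgilityPack, but the result is wrong because of the rowspans: a cell that spans two rows is present only in the first tr, so the cells of the following row end up in the wrong columns.
The code I am currently using is: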
private static DataTable convertHtmlTableToDataTable()
{
WebClient webClient = new WebClient();
string urlContent = webClient.DownloadString("http://example.com");
string tableCode = getTableCode(urlContent);
string htmlCode = tableCode.Replace(" ", " ");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlCode);
var headers = doc.DocumentNode.SelectNodes("//tr/th");
DataTable table = new DataTable();
foreach (HtmlNode header in headers)
{
table.Columns.Add(header.InnerText);
}
foreach (var row in doc.DocumentNode.SelectNodes("//tr[td]"))
{
table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());
}
return table;
}
And this is a part of Html Table:
<table class="tabel" cellspacing="0" border="0">
<caption style="font-family:Verdana; font-size:20px;">SEMGRP</caption>
<tr>
<th class="celula" >Ora</th>
<th class="latime_celula celula">Luni</th>
<th class="latime_celula celula">Marti</th>
<th class="latime_celula celula">Miercuri</th>
<th class="latime_celula celula">Joi</th>
<th class="latime_celula celula">Vineri</th>
</tr>
<tr>
<td class="celula" nowrap="nowrap">8-9</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center">
Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=287&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center">
Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=287&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=12&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula"> </td>
<td class="celula"> </td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center">
Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=293&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="celula" nowrap="nowrap">9-10</td>
<td class="celula"> </td>
<td class="celula"> </td>
</tr>
<tr>
<td class="celula" nowrap="nowrap">10-11</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br /><a class="link_celula" href="afis_n0.php?id_tip=303&tip=p">Prof</a>
<br /><a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=331&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=14&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br /><a class="link_celula" href="afis_n0.php?id_tip=330&tip=p">Prof</a>
<br /><a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula"> </td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=293&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=10&tip=s">Sala</a> <br />
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="celula" nowrap="nowrap">11-12</td>
<td class="celula"> </td>
</tr>
<tr>
I tried several solutions, but none of them worked well.
Thanks in advance for any help.
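One possible way to handle the rowspans is to remember, per column, any cell that still has to be repeated in the rows below it, and to fill that column from memory before consuming the next td of the row. The sketch below makes several assumptions: the class name `RowspanTable` is hypothetical, colspan is ignored, header names are unique, and only the direct rows of the selected table are read, so that the nested tables inside the cells are not treated as data rows.

```csharp
using System.Collections.Generic;
using System.Data;
using HtmlAgilityPack; // NuGet package, as used in the question

public static class RowspanTable // hypothetical helper name
{
    public static DataTable ToDataTable(string html, string tableXPath)
    {
        var table = new DataTable();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var tableNode = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (tableNode == null) return table;

        // Direct rows only; rows of nested tables are skipped.
        var rows = tableNode.SelectNodes("./tr | ./tbody/tr");
        if (rows == null) return table;

        // column index -> (number of further rows to fill, cell text)
        var carried = new Dictionary<int, (int Left, string Text)>();

        foreach (var row in rows)
        {
            var headers = row.SelectNodes("./th");
            if (headers != null)
            {
                foreach (var h in headers) table.Columns.Add(h.InnerText.Trim());
                continue;
            }

            var cells = row.SelectNodes("./td");
            var values = new object[table.Columns.Count];
            int next = 0; // index of the next unused td in this row

            for (int col = 0; col < values.Length; col++)
            {
                if (carried.TryGetValue(col, out var c))
                {
                    // This column is covered by a rowspan from an earlier row.
                    values[col] = c.Text;
                    if (c.Left > 1) carried[col] = (c.Left - 1, c.Text);
                    else carried.Remove(col);
                    continue;
                }
                if (cells == null || next >= cells.Count) { values[col] = ""; continue; }

                var cell = cells[next++];
                var text = cell.InnerText.Trim();
                values[col] = text;

                int span = cell.GetAttributeValue("rowspan", 1);
                if (span > 1) carried[col] = (span - 1, text);
            }
            table.Rows.Add(values);
        }
        return table;
    }
}
```

For the table in the question this would be called as `RowspanTable.ToDataTable(htmlCode, "//table[@class='tabel']")`. The cell text still contains the line breaks of the nested markup, so it may need further whitespace normalization.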

Failed to load pdf file generated by itextsharp

I am having trouble generating a valid PDF document. The code I have generates a PDF and the browser downloads it, but the file cannot be opened in a viewer. The page I am trying to export to PDF is the panel in the markup below.
Here is my code so far:
ASPX:
<asp:Button ID="btnDownload" CssClass="btn" runat="server" Text="Download Invoice" OnClick="btnDownload_Click" />
<asp:Panel ID="pnl" runat="server">
<div id="page-wrap">
<textarea id="header" style="height: 30px">PAYMENT DETAILS</textarea>
<div id="identity">
<textarea style="background-color: #F7F7F7;" readonly="readonly" id="address">My Name
My Street Address
Phone: 111-111-111</textarea>
<div id="logo">
<div id="logoctr">
</div>
<div id="logohelp">
<input id="imageloc" readonly="readonly" type="text" size="50" value="" /><br />
(max width: 540px, max height: 100px)
</div>
<img id="image" src="images/logo.png" alt="logo" />
</div>
</div>
<div style="clear: both"></div>
<div id="customer">
<textarea id="tbCustomer" readonly="readonly" runat="server"></textarea>
<table id="meta">
<tr>
<td class="meta-head">Payment ID</td>
<td>
<textarea readonly="readonly" runat="server" id="tbPID"></textarea></td>
</tr>
<tr>
<td class="meta-head">Date</td>
<td>
<textarea id="date" readonly="readonly" runat="server"></textarea></td>
</tr>
<tr>
<td class="meta-head">Amount Due</td>
<td>
<div class="due">
$
<asp:Label ID="lblTotal" runat="server" Text="Total"></asp:Label>
</div>
</td>
</tr>
</table>
</div>
<table id="items">
<tr>
<th>Property Title</th>
<th>Description</th>
<th>Status</th>
<th>Invoiced By</th>
<th>Total Payment</th>
</tr>
<tr class="item-row">
<td class="item-name">
<div class="delete-wpr">
<textarea readonly="readonly" id="tbTitle" runat="server"></textarea>
</div>
</td>
<td class="description">
<div contenteditable="true" id="tbDetail" class="blank" runat="server">
</div>
</td>
<td>
<textarea id="tbStatus" runat="server" readonly="readonly">PAID</textarea></td>
<td>
<textarea class="qty" readonly="readonly" id="tbInvoicedBy" runat="server"></textarea></td>
<td><span class="price">$
<asp:Label ID="tbTotal1" runat="server" Text="Total"></asp:Label></span></td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Vaccant</td>
<td class="total-value">
<div id="subtotal">$<asp:Label ID="lblVaccant" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Maintainance</td>
<td class="total-value">
<div id="total">$<asp:Label ID="lblMaintainance" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Property Insurance</td>
<td class="total-value">
<div id="Insurance">$<asp:Label ID="lblInsurance" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Dewa Bill</td>
<td class="total-value">
<div id="dewa">$<asp:Label ID="lblDewa" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Furnishing Cost</td>
<td class="total-value">
<div id="FurnishingCost">$<asp:Label ID="lblFurnishing" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Cleaning Fees</td>
<td class="total-value">
<div id="CleaningFees">$<asp:Label ID="lblCleaning" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">House Keeping</td>
<td class="total-value">
<div id="HouseKeeping">$<asp:Label ID="lblHouseKeeping" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Next Rent Due</td>
<td class="total-value">
<div id="paid">$<asp:Label ID="lblNextRent" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Rental Comission</td>
<td class="total-value">
<div id="RentalComission">$<asp:Label ID="lblRentalComission" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Credit Card Fees</td>
<td class="total-value">
<div id="CreditCardFees">$<asp:Label ID="lblCreditCardFees" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Pest Control</td>
<td class="total-value">
<div id="PestControl">$<asp:Label ID="lblPestControl" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Chillar Utilities</td>
<td class="total-value">
<div id="ChillarUtilities">$<asp:Label ID="lblChillarUtilities" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line">Du, etisilate wifi</td>
<td class="total-value">
<div id="DuEtisilatewifi">$<asp:Label ID="lblDuEtisilateWifi" runat="server" Text=""></asp:Label></div>
</td>
</tr>
<tr>
<td colspan="2" class="blank"></td>
<td colspan="2" class="total-line balance">Total Payment</td>
<td class="total-value balance">
<div class="due">$<asp:Label ID="lblTotal2" runat="server" Text=""></asp:Label></div>
</td>
</tr>
</table>
<div id="terms">
<h5>Terms</h5>
<textarea readonly="readonly">These payment details are final and non negotiable.</textarea>
</div>
</div>
</asp:Panel>
ASPX.CS
public void ExportToPDF()
{
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=Panel.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
pnl.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 100f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
sw.Close();
htmlparser.Close();
Response.Write(pdfDoc);
Response.End();
}
I also get a warning that the HTMLWorker class is obsolete.
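Two things in the method above look like plausible culprits, though this is a guess rather than a confirmed diagnosis: `Response.Write(pdfDoc)` appends the text returned by the Document object's `ToString()` to the response after the PDF bytes, and `htmlparser.Close()` is called after the document has already been closed. The sketch below drops both and replaces the obsolete HTMLWorker with XMLWorkerHelper from the separate iTextSharp XMLWorker package. It assumes that package is referenced, that `pnl` is the panel from the markup above, and that the rendered markup is well-formed XHTML, which XMLWorker requires. The class name `InvoicePage` is hypothetical.

```csharp
using System.IO;
using System.Web;
using System.Web.UI;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml; // from the iTextSharp XMLWorker package (assumed installed)

public partial class InvoicePage : Page // hypothetical code-behind class; 'pnl' comes from the .aspx markup
{
    public void ExportToPDF()
    {
        Response.ContentType = "application/pdf";
        Response.AddHeader("content-disposition", "attachment;filename=Panel.pdf");
        Response.Cache.SetCacheability(HttpCacheability.NoCache);

        // Render the panel to an HTML string.
        string html;
        using (var sw = new StringWriter())
        using (var hw = new HtmlTextWriter(sw))
        {
            pnl.RenderControl(hw);
            html = sw.ToString();
        }

        var pdfDoc = new Document(PageSize.A4, 10f, 10f, 100f, 0f);
        PdfWriter writer = PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
        pdfDoc.Open();
        using (var sr = new StringReader(html))
        {
            // Parses the XHTML and adds the resulting elements to the document.
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, sr);
        }
        pdfDoc.Close(); // writes the PDF trailer to Response.OutputStream

        // Nothing else is written after the PDF bytes.
        Response.End();
    }
}
```

Form fields such as the textarea elements may not render the way they do in a browser, so replacing them with plain text elements before export may be necessary.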

HTMLAgilityPack and Trouble Returning Full Table

I'm working with some html tables and trying to dig through them with htmlagilitypack. The source html is found here: https://www.ultimate-guitar.com/search.php?title=breaking+benjamin+polyamorous&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5
Sample table:
<table cellspacing="1" class="tresults">
<tbody>
<tr>
<th width="175">Artist :</th>
<th>Song :</th>
<th width="115">Rating :</th>
<th width="80">Type :</th>
</tr>
<tr>
<td>
<a href="/tabs/breaking_benjamin_tabs.htm" class="song search_art">
<b>Breaking</b> <b>Benjamin</b>
</a>
</td>
<td>
<a target="_blank" href="http://plus.ultimate-guitar.com/tp/?artist=Breaking+Benjamin&song=Polyamorous" class="song js-tp_link"><b>Polyamorous</b></a>
<a target="_blank" class="js-tp_link" href="http://plus.ultimate-guitar.com/tp/?artist=Breaking+Benjamin&song=Polyamorous"><b
class="play_tab_list"title="Playback"></b></a>
</td>
<td class="gray4"></td>
<td><strong>tab pro</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<b>Polyamorous</b> (ver 2)
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">5</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr>
<td> </td>
<td>
<b>Polyamorous</b> (ver 4)
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">30</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<b>Polyamorous</b> (ver 5)
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">12</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr>
<td> </td>
<td>
<b>Polyamorous</b> (ver 6)
<span rel="#info_333408" class="tabinfo">info</span>
<div class="dn" id="info_333408">
<font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
<br>
</font>
</div>
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">20</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<b>Polyamorous</b> (ver 7)
</td>
<td class="gray4"><span class="rating"><span class="r_4"></span></span> <span>[ <b class="ratdig">5</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr>
<td> </td>
<td>
<b>Polyamorous</b> (ver 8)
<span rel="#info_952279" class="tabinfo">info</span>
<div class="dn" id="info_952279">
<font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
<br>
</font>
<p style="margin-top:3px"><font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Tuning:</b> <font color="#DDDDCC">Drop C#</font></font>
</p>
</div>
</td>
<td class="gray4"><span class="rating"><span class="r_5"></span></span> <span>[ <b class="ratdig">6</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
<tr class="stripe">
<td> </td>
<td>
<b>Polyamorous</b> Acoustic
<span rel="#info_258880" class="tabinfo">info</span>
<div class="dn" id="info_258880">
<font style="font-family:trebuchet ms;font-size:12px;font-weight:bold;line-height:120%"><b><font color="#DDDDCC">+</font> Difficulty:</b> <font color="#DDDDCC">novice</font>
<br>
</font>
</div>
</td>
<td class="gray4"><span class="rating"><span class="r_5"></span></span> <span>[ <b class="ratdig">9</b> ]</span>
</td>
<td><strong>tab</strong>
</td>
</tr>
</tbody>
</table>
In order to grab this table from the full html doc, here is a snippet of my C# code:
string source_code = web.DownloadString("https://www.ultimate-guitar.com/search.php?title="+ songArtist + songTitle + "&type%5B1%5D=200&rating%5B0%5D=4&rating%5B1%5D=5");
doc.LoadHtml(source_code);
HtmlNode resultsTable = doc.DocumentNode.SelectSingleNode("//table[@class='tresults']");
foreach(var cell in resultsTable.Descendants())
{
Console.WriteLine(cell.InnerHtml);
}
I expect the full contents of the table to be returned, but the output stops at the line: <b class="play_tab_list" title="Playback"></b>
My ultimate goal is to return all of the links in the table, but I cannot even get far enough to see the full table.
This code prints the text and URL of every link whose class attribute contains 'link', which covers the links in the table.
var doc = new HtmlDocument();
var web = new WebClient();
string source_code = web.DownloadString("https://www.ultimate-guitar.com/search.php?title=breaking+benjamin+polyamorous&type[1]=200&rating[0]=4&rating[1]=5");
doc.LoadHtml(source_code);
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[contains(@class,'link')]");
foreach (var link in links)
{
Console.WriteLine("{0} {1}", link.InnerText, link.Attributes["href"].Value);
}
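The XPath above matches links anywhere in the page. If only the links inside the results table are wanted, the query can be scoped to that table. The following is a sketch under assumptions: the class name `TableLinks` is hypothetical, and it relies on the table carrying exactly the class `tresults`, as in the sample markup. How well it copes with the malformed attributes on the live page has not been checked.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, as used in the question

public static class TableLinks // hypothetical helper name
{
    // Returns (text, href) for every anchor with an href inside the results table.
    public static List<(string Text, string Href)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Text, string Href)>();
        var links = doc.DocumentNode.SelectNodes("//table[@class='tresults']//a[@href]");
        if (links == null) return result; // SelectNodes returns null when nothing matches

        foreach (var link in links)
        {
            // DeEntitize turns entities such as &amp; in the attribute back into plain characters.
            var href = HtmlEntity.DeEntitize(link.GetAttributeValue("href", ""));
            result.Add((link.InnerText.Trim(), href));
        }
        return result;
    }
}
```

`TableLinks.Extract(source_code)` then returns one entry per link, including the artist link that the class-based query skips because its class does not contain 'link'.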
