Parsing an HTML structure - C#

Below is my html structure (table):
<table>
<tr><td>A</td></tr>
</table>
<table>
<tr><td>B</td></tr>
</table>
<table>
<table>
<table>
<table>
<tbody>
<tr class="A">
<td>
ABC
</td>
<td>
Link
</td>
</tr>
<tr class="B">
<td>
DEF
</td>
<td>
Link2
</td>
</tr>
</tbody>
</table>
</table>
</table>
</table>
I tried to get the data as below:
HtmlNode thediv = doc.DocumentNode.SelectSingleNode("//table[3]//table[1]");
⇒ This works.
But then I tried the code below to get the data ABC/DEF in table 3:
HtmlNode thediv = doc.DocumentNode.SelectSingleNode(
    "//table[3]//table[1]//table[2]//table[3]");
⇒ This does not work.

I think what you actually want is:
var bothNodes = doc.DocumentNode.SelectNodes("//table[3]//table[1]//tr/td[1]/text()");
That will give you both text nodes, ABC and DEF, from the third table.
You can try it here: XPathFiddle
Your code doesn't work because no node matches the second query.
Step by step:
This is your original HTML:
<table>
<tr><td>A</td></tr>
</table>
<table>
<tr><td>B</td></tr>
</table>
<table>
<table>
<table>
<table>
<tbody>
<tr class="A">
<td>
ABC
</td>
<td>
Link
</td>
</tr>
<tr class="B">
<td>
DEF
</td>
<td>
Link2
</td>
</tr>
</tbody>
</table>
</table>
</table>
</table>
//table[3] gives you the third table:
<table>
<table>
<table>
<table>
<tbody>
<tr class="A">
<td>
ABC
</td>
<td>
Link
</td>
</tr>
<tr class="B">
<td>
DEF
</td>
<td>
Link2
</td>
</tr>
</tbody>
</table>
</table>
</table>
</table>
//table[3]//table[1] gives you the first table that's a descendant of the third table:
<table>
<table>
<table>
<tbody>
<tr class="A">
<td>
ABC
</td>
<td>
Link
</td>
</tr>
<tr class="B">
<td>
DEF
</td>
<td>
Link2
</td>
</tr>
</tbody>
</table>
</table>
</table>
//table[3]//table[1]//table[2] would give you the second table that's a descendant of the first table that's a descendant of the third table. There is only one table at that level, so nothing matches and the query doesn't work.
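To make the step-by-step reasoning easy to check, here is a minimal sketch that runs the same XPath expressions with System.Xml.Linq instead of HTML Agility Pack. The markup is a well-formed stand-in for the question's HTML wrapped in a hypothetical <root> element, and the names NestedTableXPath, Count and FirstCells are made up for illustration. HTML Agility Pack may parse the real page differently, so treat this only as a way to experiment with the expressions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class NestedTableXPath
{
    // Well-formed stand-in for the markup in the question (assumption: wrapped in <root>).
    public const string Markup = @"<root>
<table><tr><td>A</td></tr></table>
<table><tr><td>B</td></tr></table>
<table><table><table><table><tbody>
<tr class='A'><td>ABC</td><td>Link</td></tr>
<tr class='B'><td>DEF</td><td>Link2</td></tr>
</tbody></table></table></table></table>
</root>";

    // Number of elements matched by an XPath expression.
    public static int Count(string xpath) =>
        XDocument.Parse(Markup).XPathSelectElements(xpath).Count();

    // Trimmed text of the first cell of every row under the third table.
    public static List<string> FirstCells() =>
        XDocument.Parse(Markup)
            .XPathSelectElements("//table[3]//table[1]//tr/td[1]")
            .Select(td => td.Value.Trim())
            .ToList();
}
```

Calling NestedTableXPath.Count("//table[3]//table[1]//table[2]") returns 0 on this stand-in, because no nested table has a second table sibling, while FirstCells() returns the cells containing ABC and DEF.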

Hi Manfred Radlwimmer,
Thanks for your answer. I did it :).
The code is below:
if (doc.DocumentNode.SelectNodes("//table") != null)
{
    HtmlNode thediv = doc.DocumentNode.SelectSingleNode("//table[3]//table[1]//tr/td[1]//tr[3]//table//tr/td[2]//table");
    HtmlNodeCollection cells = thediv.SelectNodes("tr");
    // The loop starts at index 1, so the first row is skipped.
    for (var j = 1; j < cells.Count; ++j)
    {
        var data = cells[j].InnerText;
    }
}

Related

How to show tr with number td in view mvc?

I have the data as a list, and I want to render it with two columns per row. I have tried hard, but it is not working. This is the sample HTML I want to produce. I have 4 data items, so it should show two rows, each with two columns.
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>TYPE</td>
<td>Blue</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tr class='bg-red'>
<td>13:45</td>
<td>2017ABC-0001</td>
</tr>
<tr>
<td>12:45</td>
<td>2017WEX-0002</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>TYPE</td>
<td>Red</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tr class='bg-red'>
<td>13:45</td>
<td>2017ABC-0001</td>
</tr>
<tr>
<td>12:45</td>
<td>2017WEX-0002</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>TYPE</td>
<td>Green</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tr class='bg-red'>
<td>13:45</td>
<td>2017ABC-0001</td>
</tr>
<tr>
<td>12:45</td>
<td>2017WEX-0002</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>TYPE</td>
<td>Yellow</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tr class='bg-red'>
<td>13:45</td>
<td>2017ABC-0001</td>
</tr>
<tr>
<td>12:45</td>
<td>2017WEX-0002</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
And this is the sample code:
string result = "<table>";
List<string> listTYPE = GetTYPE();
foreach (var item in listTYPE)
{
    result += "<tr><td><table><tr><td><table><tr><td>TYPE</td><td>"
        + item + "</td></tr></table></td></tr><tr><td><table>";
    List<Detail_Product> listDetail = GetDetail(item);
    foreach (var detail in listDetail)
    {
        // Highlight rows whose PO number contains "ABC"; emit each row exactly once.
        string rowOpen = detail.PO_NO.Contains("ABC") ? "<tr class='bg-red'>" : "<tr>";
        result += rowOpen + "<td>" + detail.CREATE_DATE + "</td><td>" + detail.PO_NO + "</td></tr>";
    }
    result += "</table></td></tr></table></td></tr>";
}
result += "</table>";
ViewBag.result = result;
What should I do? I thought of checking if (listLINE.IndexOf(line) > 2) and starting a new <tr> there, but I don't know how to do it.
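One possible way to break the output into rows of two is to open a <tr> at every even index and close it after every odd index (or after the last item). The sketch below is only an illustration under assumptions: TwoColumnTable and Build are hypothetical names, and each entry in cells is assumed to be an already rendered HTML fragment, such as the nested per-type table built in the loop above.

```csharp
using System.Collections.Generic;
using System.Text;

static class TwoColumnTable
{
    // Wraps each pre-rendered fragment in a <td> and starts a new <tr>
    // every `columns` cells. A trailing partial row is still closed.
    public static string Build(IReadOnlyList<string> cells, int columns = 2)
    {
        var sb = new StringBuilder("<table>");
        for (int i = 0; i < cells.Count; i++)
        {
            if (i % columns == 0)
            {
                sb.Append("<tr>");
            }
            sb.Append("<td>").Append(cells[i]).Append("</td>");
            bool endOfRow = i % columns == columns - 1 || i == cells.Count - 1;
            if (endOfRow)
            {
                sb.Append("</tr>");
            }
        }
        return sb.Append("</table>").ToString();
    }
}
```

With four fragments this yields two rows of two cells each. Using a StringBuilder instead of repeated string concatenation is a design choice of the sketch, not something the question requires.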

Use htmlagilitypack to get data from nested table without ID

Can someone please provide some sample code to get data from the nested table below? I want the data from row 1 to row 2, all columns. If the table had an ID I would be able to grab the data, but there is no ID. I have searched all over the internet and still could not find an answer. Please help.
<div id="Div-content_ID">
<table><tr><td>
<table>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
</table>
</td>
</tr>
<tr><td></td></tr>
<tr>
<td >
<table >
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table width="100%" cellpadding="3" cellspacing="1" border="0" bgcolor="#d3d3d3">
<tr align="center" valign="middle">
<td>row1 Col 1</td>
<td >row1 Col 2 </td>
<td >row1 Col 3 </td>
<td >row1 Col 4 </td>
<td >row1 Col 5 </td>
<td >row1 Col 6 </td>
<td >row1 Col 7 </td>
<td >row1 Col 8 </td>
<td >row1 Col 9 </td>
<td >row1 Col 10 </td>
<td >row1 Col 11 </td>
<tr>
<tr>
<td>row2 Col 1</td>
<td >row2 Col 2 </td>
<td >row2 Col 3 </td>
<td >row2 Col 4 </td>
<td >row2 Col 5 </td>
<td >row2 Col 6 </td>
<td >row2 Col 7 </td>
<td >row2 Col 8 </td>
<td >row2 Col 9 </td>
<td >row2 Col 10 </td>
<td >row2 Col 11 </td>
</tr>
<tr>
<td>
<table>
<tr>
<td></td><td></td></tr>
</table>
</td>
</tr>
<tr><td></td></tr>
<tr>
<td</td>
</tr>
<tr><td></td></tr>
</table>
</div>
1) Your HTML is poorly formed:
The 1st table is never closed properly: the closing </td> </tr> </table> are missing.
There's a <td></td> pair near the end whose opening tag is missing its '>'.
2) With HTML Agility Pack you can select on anything, not just IDs or classes. So, as long as your HTML structure remains the same, you could select the 1st div, then from its children the 1st table, then from its children the 4th row, then from its children the 1st table, and so on.
See here for an example of selecting by table: HTML Agility pack - parsing tables
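The "select by position" idea in point 2 can be sketched with a positional XPath. The example below uses System.Xml.Linq on a shortened, well-formed stand-in for the question's markup (two columns instead of eleven, malformed tags repaired), so it is an assumption-laden illustration and not the HTML Agility Pack code itself. PositionalSelect and DataRows are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class PositionalSelect
{
    // Shortened, repaired stand-in for the markup in the question.
    public const string Markup = @"<div id='Div-content_ID'>
<table>
<tr><td><table><tr><td></td></tr></table></td></tr>
<tr><td></td></tr>
<tr><td><table><tr><td></td></tr></table></td></tr>
<tr><td><table>
<tr><td>row1 Col 1</td><td>row1 Col 2</td></tr>
<tr><td>row2 Col 1</td><td>row2 Col 2</td></tr>
</table></td></tr>
</table>
</div>";

    // div -> 1st table -> 4th row -> cell -> 1st table -> all rows.
    public static List<List<string>> DataRows() =>
        XDocument.Parse(Markup)
            .XPathSelectElements("/div[@id='Div-content_ID']/table[1]/tr[4]/td/table[1]/tr")
            .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToList())
            .ToList();
}
```

A path like this depends entirely on the page layout staying the same, which is the caveat the answer already mentions.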

Can I send values from a collection in the model through submit?

I have this table
<table id="Products" class="Products">
<tr>
<th>ProductId</th>
<th>Productname</th>
<th>Quantity</th>
<th>UnitPrice</th>
</tr>
<% for(int i=0; i < Model.NorthOrderDetails.Count; i++)
{
%>
<tr>
<td><%: Html.Label(Model.NorthOrderDetails[i].ProductID.ToString()) %></td>
<td><%: Html.Label(Model.NorthOrderDetails[i].ProductName) %> </td>
<td><%: Html.TextBoxFor(m => m.NorthOrderDetails[i].Quantity) %></td>
<td><%: Html.TextBoxFor(m => m.NorthOrderDetails[i].UnitPrice) %></td>
<td> <input type="submit" value="?" Name="qodel" /> </td>
</tr>
<% } %>
</table>
<% } %>
I want to send the controller the ProductID of the row in which I press the button. How can I do it?
Without JavaScript you are limiting yourself slightly here. I think the code below should work, but I'm sure someone can poke a hole in it!
<table id="Products" class="Products">
    <thead>
        <tr>
            <th>ProductId</th>
            <th>Productname</th>
            <th>Quantity</th>
            <th>UnitPrice</th>
        </tr>
    </thead>
    <tbody>
        @{
            for(int i=0; i < Model.NorthOrderDetails.Count; i++)
            {
                <tr>
                    <td>
                        @Html.Label(Model.NorthOrderDetails[i].ProductID.ToString())
                    </td>
                    <td>
                        @Html.Label(Model.NorthOrderDetails[i].ProductName)
                    </td>
                    <td>
                        @Html.TextBoxFor(m => m.NorthOrderDetails[i].Quantity)
                    </td>
                    <td>
                        @Html.TextBoxFor(m => m.NorthOrderDetails[i].UnitPrice)
                    </td>
                    <td>
                        @Html.ActionLink("Edit", "Vi")
                    </td>
                    <td>
                        <input type="submit" name="submitValue"
                               value="@Model.NorthOrderDetails[i].SomeValue"/>
                    </td>
                </tr>
            }
        }
    </tbody>
</table>
Then make sure that your controller action accepts the name attribute of the input as a parameter:
public ActionResult YourControllerMethod(string submitValue) {
....
}
The code above uses Razor syntax, which is what it would look like in MVC 3+; the <% ... %> syntax in your question is from MVC 2.

Xpath select all tr without table with id=x

Hello, I need to select all tr elements, but some tr elements contain a table with id=WHITE_BANKTABLE.
I need to select only the tr elements that don't contain this table.
My HTML:
<table id=mytable_body>
<TR id=TR_ROW_BANKTABLE class=TR_ROW_BANKTABLE style="BACKGROUND-COLOR: #f6f8fa" align=right bgColor=#f6f8fa>
<TD noWrap align=right w_idth="190"> </TD>
<TD align=right>010073/15922</TD>
</TR>
<!-- I don't need this TR, which contains the TABLE with id=WHITE_BANKTABLE -->
<TR>
<TD colSpan=8 align=center>
<TABLE id=WHITE_BANKTABLE cellSpacing=0 borderColorDark=#edf0f5 cellPadding=3 width="100%" bgColor=white borderColorLight=#edf0f5 border=1 isWhiteTable="Y">
<TBODY>
<TR class=TR_BANKTABLE align=right vAlign=top>
<TD> sdfsd </TD>
<TD>sdfs</TD>
</TR>
</TBODY>
</TABLE>
</TD>
</TR>
<TR id=TR_ROW_BANKTABLE class=TR_ROW_BANKTABLE style="BACKGROUND-COLOR: #f6f8fa" align=right bgColor=#f6f8fa>
<TD noWrap align=right w_idth="190"> </TD>
<TD align=right>010073/15922</TD>
</TR>
</table>
Thanks.
Assuming the above is correctly formatted as XML (insert the missing double quotes):
var q =
    xml.XPathSelectElements(@"/tr[not(descendant::table[@id = 'WHITE_BANKTABLE'])]");
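For anyone who wants to try the not(descendant::...) predicate in isolation, here is a small sketch under assumptions: the markup is a trimmed, quoted, lower-cased stand-in for the question's table, it is parsed as an XElement, and the path is written relative to that element rather than with a leading slash. RowsWithoutNestedTable and Select are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class RowsWithoutNestedTable
{
    // Trimmed stand-in for the question's markup, made well-formed.
    public const string Markup = @"<table id='mytable_body'>
<tr id='TR_ROW_BANKTABLE'><td>010073/15922</td></tr>
<tr><td><table id='WHITE_BANKTABLE'><tbody>
<tr class='TR_BANKTABLE'><td>sdfsd</td></tr>
</tbody></table></td></tr>
<tr id='TR_ROW_BANKTABLE'><td>010073/15922</td></tr>
</table>";

    // Direct child rows of the outer table that do not contain the nested table.
    public static List<XElement> Select() =>
        XElement.Parse(Markup)
            .XPathSelectElements("tr[not(descendant::table[@id='WHITE_BANKTABLE'])]")
            .ToList();
}
```

The relative tr step matters here: with //tr the row inside WHITE_BANKTABLE would also match, because it has no descendant table of its own.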

Parsing HTML with XPath following Categories

I have the following HTML structure, in which the category rows and the sub-item rows are all sibling tr elements. When I parse it with XPath, each category should get only its own 2 sub-items, but the code below assigns all 4 sub-items to every category.
<table class="available">
<tbody>
<tr>
<td class="catname" colspan="2">
<span>Category 1</span>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem1-1
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="preprice">
<strong class="price">39.99 €</strong>
</div>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem1-2
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="preprice">
<strong class="price">49.99 €</strong>
</div>
</td>
</tr>
<tr>
<td class="catname" colspan="2">
<span>Category 2</span>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem2-1
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="preprice">
<strong class="price">59.99 €</strong>
</div>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1" class="itemdetail">
<div class="subname">
SubItem2-2
</div>
</td>
<td class="precioseleccion desgloseth">
<div class="tooltip3">
<strong class="price">69.99 €</strong>
</div>
</td>
</tr>
</tbody>
</table>
var doc = new HtmlDocument(); // with HTML Agility pack
doc.LoadHtml(uricontent);
var rooms = doc.DocumentNode
    .SelectNodes("//table[@class='available']//td[@class='catname']")
    .Select(r => new
    {
        Type = r.InnerText.CleanInnerText(),
        SubTypes = r.SelectNodes("../..//tr//td[@class='itemdetail']//div[@class='subname']")
            .Select(s => new
            {
                SubType = s.InnerText.CleanInnerText(),
                Price =
                    s.SelectSingleNode(".//parent::td/following-sibling::td[@class='allprice']//div[@class='preprice']//strong[@class='price']")
                        .InnerText.CleanInnerText()
            }).ToArray()
    }).ToArray();
If I understand your question correctly, to select all the categories you want //tr[td[@class='catname']], and to select their sub-items you want following-sibling::tr/td[div[@class='subname']].
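Note that following-sibling::tr on its own reaches every later row, including rows that belong to later categories. One way to stop at the next category row is to walk the following siblings in code, as in this sketch. It uses System.Xml.Linq on a shortened, well-formed stand-in for the question's table (prices and extra attributes omitted); CategoryGrouping and Group are hypothetical names, and this is not the HTML Agility Pack code from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class CategoryGrouping
{
    // Shortened stand-in for the question's table.
    public const string Markup = @"<table class='available'><tbody>
<tr><td class='catname'><span>Category 1</span></td></tr>
<tr><td class='itemdetail'><div class='subname'>SubItem1-1</div></td></tr>
<tr><td class='itemdetail'><div class='subname'>SubItem1-2</div></td></tr>
<tr><td class='catname'><span>Category 2</span></td></tr>
<tr><td class='itemdetail'><div class='subname'>SubItem2-1</div></td></tr>
<tr><td class='itemdetail'><div class='subname'>SubItem2-2</div></td></tr>
</tbody></table>";

    static bool IsCategoryRow(XElement row) =>
        row.Elements("td").Any(td => (string)td.Attribute("class") == "catname");

    // Maps each category name to the sub-items between it and the next category row.
    public static Dictionary<string, List<string>> Group()
    {
        var result = new Dictionary<string, List<string>>();
        var table = XElement.Parse(Markup);
        foreach (var cat in table.XPathSelectElements(".//tr[td[@class='catname']]"))
        {
            result[cat.Value.Trim()] = cat.ElementsAfterSelf("tr")
                .TakeWhile(row => !IsCategoryRow(row))       // stop at the next category
                .Select(row => row.XPathSelectElement("td/div[@class='subname']"))
                .Where(div => div != null)
                .Select(div => div.Value.Trim())
                .ToList();
        }
        return result;
    }
}
```

On the stand-in markup, Group() gives each category only the two sub-items that follow it.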
