HTMLAgilityPack Skip HTML table caption - c#

I have an html table like below:
<table>
<caption>Table 2</caption>
<tr><td>hd1</td><td>hd2</td></tr>
<tr><td>val01</td><td>val02</td></tr>
<tr>
<td colspan="2">
<table>
<caption>Subtable 2</caption>
<tr><td>subval01</td><td>subval02</td></tr>
</table>
</td>
</tr>
</table>
EDIT
Here is my code:
foreach (HtmlNode rows in htmltable.SelectNodes("tr"))
{
DataRow dr = dt.NewRow();
int iRow = 0;
if (!rows.InnerHtml.Contains("<caption>"))
{
foreach (HtmlNode cell in rows.SelectNodes("td"))
{
iRow++;
dr[iRow] = cell.InnerText;
}
}
dt.Rows.Add(dr);
}
My code treats the row containing <caption> as a data row and selects it as well.
I don't see how to skip the caption while parsing, so that I parse ONLY the rows. The Skip(1) method is not working for me.

If I understand this correctly, you want to skip any <tr> that has a descendant <caption> node (here, the last <tr> within the outer <table> tag). In that case you can use XPath to select only the <tr> elements that do not contain a <caption>, like so:
foreach (HtmlNode rows in htmltable.SelectNodes("tr[not(.//caption)]"))
{
DataRow dr = dt.NewRow();
.....
.....
dt.Rows.Add(dr);
}
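As a fuller illustration of the XPath above, here is a minimal sketch that fills a DataTable from the sample table. The names CaptionSkipSketch, ParseRows, col0 and col1 are made up for the example. It assumes the outer table is the first <table> in the document and that SelectNodes returns null when nothing matches (the library default).

```csharp
using System;
using System.Data;
using HtmlAgilityPack;

public static class CaptionSkipSketch
{
    // Hypothetical helper: copies the <td> text of each top-level row into a DataTable.
    public static DataTable ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var dt = new DataTable();
        dt.Columns.Add("col0", typeof(string)); // assumed column names
        dt.Columns.Add("col1", typeof(string));

        // First table in document order, i.e. the outer one in the sample.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return dt;

        // Direct child rows that do not contain a <caption> anywhere below them.
        var rows = table.SelectNodes("tr[not(.//caption)]");
        if (rows == null) return dt;

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null) continue;

            DataRow dr = dt.NewRow();
            // Index from 0 so the first cell lands in the first column.
            for (int i = 0; i < cells.Count && i < dt.Columns.Count; i++)
            {
                dr[i] = cells[i].InnerText.Trim();
            }
            dt.Rows.Add(dr);
        }
        return dt;
    }
}
```

Indexing the cells from 0 also avoids skipping the first column, which the iRow++ placed before the assignment in the question would do.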

Related

Get specific table from html document with HtmlAgilityPack C#

I have html document with two tables. For example:
<html>
<body>
<p>This is where first table starts</p>
<table>
<tr>
<th>head</th>
<th>head1</th>
</tr>
<tr>
<td>data</td>
<td>data1</td>
</tr>
</table>
<p>This is where second table starts</p>
<table>
<tr>
<th>head</th>
<th>head1</th>
</tr>
<tr>
<td>data</td>
<td>data1</td>
</tr>
</table>
</body>
</html>
I want to parse the first and the second table, but separately.
Let me explain:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(@richTextBox1.Text);
if(comboBox_tables.Text.Equals("Table1"))
{
DataTable dt = new DataTable();
dt.Columns.Add("id", typeof(string));
dt.Columns.Add("inserted_at", typeof(string));
dt.Columns.Add("DisplayName", typeof(string));
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var row in doc.DocumentNode.SelectNodes("//tr"))
{
var nodes = row.SelectNodes("td");
if (nodes != null)
{
var id = nodes[0].InnerText;
var inserted_at = nodes[1].InnerText;
var DisplayName = nodes[2].InnerText;
dt.Rows.Add(id, inserted_at, DisplayName);
}
dataGridView1.DataSource = dt;
I'm trying to select the first table with //table[1], but it always takes the rows of both tables. How can I select the first table for if(table1) and the second for else if(table2)?
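You are selecting table[1], but not doing anything with the return value; the loop still queries doc.DocumentNode.
Use the table variable to select the tr nodes, with a relative XPath (.//tr). A path that starts with // is evaluated from the document root even when it is called on table, so it would again return the rows of both tables.
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var row in table.SelectNodes(".//tr"))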
.. rest of the code
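To make the idea concrete, here is a small sketch that reads the data rows of the n-th table only. TablePickerSketch and ReadTable are hypothetical names, and the sketch assumes header rows contain only <th> cells, as in the question. It uses (//table)[n] rather than //table[n]: the former picks the n-th table in document order, while the latter matches every table that is the n-th <table> child of its own parent. For the sample document, where both tables are siblings under <body>, the two forms give the same result.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TablePickerSketch
{
    // Hypothetical helper: returns the <td> texts of each data row in the n-th table (1-based).
    public static List<string[]> ReadTable(string html, int tableNumber)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<string[]>();

        // Pick the n-th table in document order.
        HtmlNode table = doc.DocumentNode.SelectSingleNode($"(//table)[{tableNumber}]");
        if (table == null) return result;

        // The leading dot keeps the search inside this table.
        var rows = table.SelectNodes(".//tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null) continue; // header rows hold only <th> cells

            result.Add(cells.Select(c => c.InnerText.Trim()).ToArray());
        }
        return result;
    }
}
```

In the question's code, ReadTable(html, 1) would feed the "Table1" branch and ReadTable(html, 2) the "Table2" branch.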

How to find last column of a table using Html Agility Pack

I have a table like this:
<table border="0" cellpadding="0" cellspacing="0" id="table2">
<tr>
<th>Name
</th>
<th>Age
</th>
</tr>
<tr>
<td>Mario
</td>
<td>Age: 78
</td>
</tr>
<tr>
<td>Jane
</td>
<td>Age: 67
</td>
</tr>
<tr>
<td>James
</td>
<td>Age: 92
</td>
</tr>
</table>
I want to get the last td from all rows using Html Agility Pack.
Here is my C# code so far:
await page.GoToAsync(NumOfSaleItems, new NavigationOptions
{
WaitUntil = new WaitUntilNavigation[] { WaitUntilNavigation.DOMContentLoaded }
});
var html4 = page.GetContentAsync().GetAwaiter().GetResult();
var htmlDoc4 = new HtmlDocument();
htmlDoc4.LoadHtml(html4);
var SelectTable = htmlDoc4.DocumentNode.SelectNodes("/html/body/div[2]/div/div/div/table[2]/tbody/tr/td[1]/div[3]/div[2]/div/table[2]/tbody/tr/td[4]");
if (SelectTable.Count == 0)
{
continue;
}
else
{
foreach (HtmlNode row in SelectTable)//
{
string value = row.InnerText;
value = value.ToString();
var firstSpaceIndex = value.IndexOf(" ");
var firstString = value.Substring(0, firstSpaceIndex);
LastSellingDates.Add(firstString);
}
}
How can I get only the last column of the table?
I think the XPath you want is: //table[@id='table2']//tr/td[last()].
//table[@id='table2'] finds the table by ID anywhere in the document. This is preferable to a long brittle path from the root, since a table ID is less likely to change than the rest of the HTML structure.
//tr gets the descendant rows in the table. I'm using two slashes in case there might be an intervening <tbody> element in the actual HTML.
/td[last()] gets the last <td> in each row.
From there you just need to select the InnerText of each <td>.
var tds = htmlDoc.DocumentNode.SelectNodes("//table[@id='table2']//tr/td[last()]");
var values = tds?.Select(td => td.InnerText).ToList() ?? new List<string>();
Working demo here: https://dotnetfiddle.net/7I8yk1
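For reference, the same steps can be wrapped in a self-contained helper. This is only a sketch: LastColumnSketch and LastCells are invented names, and the table id is pasted into the XPath on the assumption that it contains no single quotes. By default SelectNodes returns null rather than an empty collection when nothing matches, so the SelectTable.Count == 0 check in the question would throw; the null-conditional operator below covers that case.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LastColumnSketch
{
    // Hypothetical helper: text of the last <td> of every row in the table with the given id.
    public static List<string> LastCells(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumes tableId contains no single quote, since it is embedded in the XPath literal.
        var tds = doc.DocumentNode.SelectNodes($"//table[@id='{tableId}']//tr/td[last()]");

        // Rows that hold only <th> cells contribute nothing; Trim drops the trailing newline.
        return tds?.Select(td => td.InnerText.Trim()).ToList() ?? new List<string>();
    }
}
```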

How can i add HTML input from C#, ASP.net

I have this HTML table in my aspx page which will be populated from datatable, and I need to put a checkbox at the start of each row.
I tried to use the code below in which I used StringBuilder and I also tried using TagBuilder but nothing works.
//Building an HTML string.
StringBuilder html = new StringBuilder();
//Building the Data rows.
foreach (DataRow row in dt.Rows)
{
html.Append("<tr>");
html.Append("<td>");
html.AppendFormat("< input type = 'checkbox' />"); //here it displays the tag as string
html.Append("</td>");
foreach (DataColumn column in dt.Columns)
{
html.Append("<td>");
html.Append(row[column.ColumnName]); //here it displays the data
html.Append("</td>");
}
html.Append("</tr>");
}
datatableBody.Controls.Add(new Literal
{
Text = html.ToString()
});
Here is the Table :
<table class="table dataTable my-0" id="dataTable" style="text-align:center">
<thead>
<tr>
<th><input type="checkbox" /></th>
<th>Matricule</th>
<th>Prenom</th>
<th>Date</th>
<th>Periode</th>
<th>Total</th>
</tr>
</thead>
<tbody id="datatableBody" runat="server">
</tbody>
</table>
The result is that the checkbox tag is displayed as literal text instead of being rendered.
I also want to be able to access the value of the checkbox (checked or not).
I am not sure about this, but I think you can use something like the following:
html.Append("<input type=\"checkbox\" name=\"CheckSelect").Append(row["id"]).Append("\">");
Or, if you don't mind using jQuery, here is a solution for you:
https://forums.asp.net/t/2156602.aspx?Appending+the+Checkbox+to+the+HTML+Table+
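A slightly fuller sketch of the StringBuilder approach is shown below. CheckboxRowsSketch and BuildRows are hypothetical names, and the sketch assumes the DataTable has a column (here passed as idColumn) whose value identifies the row. The tag is written without a space after "<", because "< input" is not parsed as a tag and shows up as text. Cell values are HTML-encoded so that data containing "<" or "&" cannot break the markup.

```csharp
using System;
using System.Data;
using System.Net;
using System.Text;

public static class CheckboxRowsSketch
{
    // Hypothetical helper: builds the <tr> markup with a leading checkbox cell per row.
    public static string BuildRows(DataTable dt, string idColumn)
    {
        var html = new StringBuilder();
        foreach (DataRow row in dt.Rows)
        {
            // Encode the id because it ends up inside an attribute value.
            string id = WebUtility.HtmlEncode(Convert.ToString(row[idColumn]));

            html.Append("<tr>");
            html.Append("<td><input type=\"checkbox\" name=\"CheckSelect")
                .Append(id)
                .Append("\" /></td>");

            foreach (DataColumn column in dt.Columns)
            {
                html.Append("<td>")
                    .Append(WebUtility.HtmlEncode(Convert.ToString(row[column])))
                    .Append("</td>");
            }
            html.Append("</tr>");
        }
        return html.ToString();
    }
}
```

The returned string can be assigned to the Literal's Text exactly as in the question. As for reading the state, browsers submit only the checkboxes that are checked, so on postback the presence of a CheckSelect... name in Request.Form should indicate a checked box; that part is untested here.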

How add a new row and columns with EPPlus C# and MVC?

I use EPPlus version 4.0.4 to create an Excel file within an ASP.NET MVC5 website, and I don't know how to add new rows and columns after completing the very first row.
It's difficult to explain clearly, so let me show you my code. It should be easy enough to understand.
var products = new DataTable("1km subscriptions");
// The first columns, with their row information
products.Columns.Add("", typeof(int));
products.Columns.Add("Full Name", typeof(string));
products.Columns.Add("BirthDate", typeof(string));
int i = 0;
foreach (var item in GetData())
{
// Primary user info
products.Rows.Add(i, item.FullName, item.BirthDate);
i++;
// Here I have to go to the next line and write two new columns with their row information
//products.NewRow();
products.Columns.Add("ORDER #", typeof(int));
products.Columns.Add("BUY DATE", typeof(string));
foreach (var order in item.OrderViewModels)
{
products.Rows.Add(order.Id, order.CreatedDate);
//products.NewRow();
// Then I have to do the same thing: go to the next line and add two other columns with their information
products.Columns.Add("PRODUCT NAME", typeof(string));
products.Columns.Add("Quantity", typeof(int));
foreach (var osku in order.OrderSKUViewModels)
{
products.Rows.Add(osku.SKUName, osku.Quantity);
}
}
}
using (ExcelPackage pck = new ExcelPackage())
{
ExcelWorksheet ws = pck.Workbook.Worksheets.Add("FIRST TAB NAME");
ws.Cells["A1"].LoadFromDataTable(products, true);
// Write it back to the client
Response.ContentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
Response.AddHeader("content-disposition", "attachment; filename=ExcelDemo.xlsx");
Response.BinaryWrite(pck.GetAsByteArray());
Response.Flush();
Response.End();
}
At first, I thought products.NewRow() would do the job, but that is not the case. I can't find an easy way to do such a simple operation, and I would like to avoid making major changes to the code for something like this.
Can anyone help?
David
Edit
I can't add a picture, but I can share the HTML code I use to display the data. I am trying to reproduce the same layout in Excel, which is why I thought products.NewRow() could produce a kind of line break:
<table class="table table-hover" style="border: 1px solid #ddd;">
@foreach (var item in Model)
{
<tr class="success">
<th style="width:122px;">Full Name</th>
<th style="width:118px;">BirthDate</th>
</tr>
<tr class="active">
<td>@item.FullName</td>
<td>@item.BirthDate</td>
</tr>
foreach (var order in item.OrderViewModels)
{
<tr>
<td><b>Commande:</b></td>
<td><b>#</b> @order.Id</td>
<td colspan="6"><b>Buy date:</b> @order.CreatedDate</td>
</tr>
foreach (var osku in order.OrderSKUViewModels)
{
<tr>
<td colspan="3"><b>Product name:</b> @osku.SKUName</td>
<td colspan="5"><b>Quantity:</b> @osku.Quantity</td>
</tr>
}
<tr>
@if (order.GuestViewModels.Count() != 0)
{
<th colspan="8" class="success">Guests</th>
}
</tr>
if (order.GuestViewModels.Count() > 0)
{
<tr class="warning">
<th style="width:122px;">FullName</th>
<th style="width:118px;">BirthDate</th>
</tr>
}
foreach (var guest in order.GuestViewModels)
{
<tr style="border-top: 2px double #0094ff;border-left:2px solid #0094ff;border-right:2px solid #0094ff">
<td>@guest.FullName</td>
<td>@guest.BirthDate</td>
</tr>
}
}
}
</table>
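One possible direction, offered as an untested sketch rather than a confirmed answer: a DataTable has a single set of columns shared by all of its rows, so calling Columns.Add inside the loop widens the whole table (and adding a column name that already exists throws a DuplicateNameException), while NewRow() only creates a row object with that same schema. For a layout where each line has its own shape, the cells can be written directly with ws.Cells[row, column] while a row counter is advanced by hand. The Customer, Order and SkuLine classes below are hypothetical stand-ins for the view models returned by GetData(), and NestedSheetSketch.Write is an invented helper.

```csharp
using System.Collections.Generic;
using OfficeOpenXml;

// Hypothetical stand-ins for the view models used in the question.
public class SkuLine
{
    public string SKUName { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public string CreatedDate { get; set; }
    public List<SkuLine> OrderSKUViewModels { get; set; } = new List<SkuLine>();
}

public class Customer
{
    public string FullName { get; set; }
    public string BirthDate { get; set; }
    public List<Order> OrderViewModels { get; set; } = new List<Order>();
}

public static class NestedSheetSketch
{
    // Writes one block of lines per customer and returns the next free row number.
    public static int Write(ExcelWorksheet ws, IEnumerable<Customer> customers)
    {
        int row = 1; // EPPlus rows and columns are 1-based
        foreach (var item in customers)
        {
            // Header line and data line for the primary user info.
            ws.Cells[row, 1].Value = "Full Name";
            ws.Cells[row, 2].Value = "BirthDate";
            row++;
            ws.Cells[row, 1].Value = item.FullName;
            ws.Cells[row, 2].Value = item.BirthDate;
            row++;

            foreach (var order in item.OrderViewModels)
            {
                // One line per order, label and value side by side.
                ws.Cells[row, 1].Value = "ORDER #";
                ws.Cells[row, 2].Value = order.Id;
                ws.Cells[row, 3].Value = "BUY DATE";
                ws.Cells[row, 4].Value = order.CreatedDate;
                row++;

                foreach (var osku in order.OrderSKUViewModels)
                {
                    // One line per product of the order.
                    ws.Cells[row, 1].Value = "PRODUCT NAME";
                    ws.Cells[row, 2].Value = osku.SKUName;
                    ws.Cells[row, 3].Value = "Quantity";
                    ws.Cells[row, 4].Value = osku.Quantity;
                    row++;
                }
            }
        }
        return row;
    }
}
```

In the question's code, a call such as NestedSheetSketch.Write(ws, data) would take the place of the LoadFromDataTable line, with the Response code left unchanged.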

htmlAgilityPack parse table to datatable or array

I have these tables:
<table>
<tbody>
<tr><th>Header 1</th></tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<th>Header 1</th>
<th>Header 2</th>
<th>Header 3</th>
<th>Header 4</th>
<th>Header 5</th>
</tr>
<tr>
<td>text 1</td>
<td>text 2</td>
<td>text 3</td>
<td>text 4</td>
<td>text 5</td>
</tr>
</tbody>
</table>
I am trying to transform them into an array or List using this code:
var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
from row in table.SelectNodes("tr").Cast<HtmlNode>()
from header in row.SelectNodes("th").Cast<HtmlNode>()
from cell in row.SelectNodes("td").Cast<HtmlNode>()
select new {
Table = table.Id,
Row = row.InnerText,
Header = header.InnerText,
CellText = cell.InnerText
};
But it doesn't work. What is wrong?
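Some notes:
You do not need the casts; SelectNodes already returns a collection of HtmlNode.
You are assuming that every row has header cells.
SelectNodes takes an XPath expression, and a bare name such as "tr" matches only direct children. Here the rows sit inside <tbody>, so table.SelectNodes("tr") finds nothing and, by default, returns null.
If I were you, I would use a foreach and model the data, which gives more control and efficiency. If you still want to do it your way, this is how it should look: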
var query = from table in doc.DocumentNode.SelectNodes("//table")
where table.Descendants("tr").Count() > 1 //make sure there are rows other than header row
from row in table.SelectNodes(".//tr[position()>1]") //skip the header row
from cell in row.SelectNodes("./td")
from header in table.SelectNodes(".//tr[1]/th") //select the header row cells which is the first tr
select new
{
Table = table.Id,
Row = row.InnerText,
Header = header.InnerText,
CellText = cell.InnerText
};
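The foreach alternative mentioned above could look like the following sketch. TableToRecordsSketch and Parse are made-up names, and the sketch assumes the first row of each table holds the <th> headers, that header texts are unique, and that tables are not nested. Unlike the query, which pairs every cell with every header, it matches each cell with the header at the same position.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableToRecordsSketch
{
    // Hypothetical helper: one dictionary (header text -> cell text) per data row.
    public static List<Dictionary<string, string>> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var records = new List<Dictionary<string, string>>();

        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return records;

        foreach (HtmlNode table in tables)
        {
            // Header cells of the first row, and every row that has at least one <td>.
            var headerCells = table.SelectNodes(".//tr[1]/th");
            var dataRows = table.SelectNodes(".//tr[td]");
            if (headerCells == null || dataRows == null) continue; // header-only table

            var headers = headerCells.Select(th => th.InnerText.Trim()).ToList();

            foreach (HtmlNode row in dataRows)
            {
                var cells = row.SelectNodes("./td");
                var record = new Dictionary<string, string>();

                // Pair each cell with the header at the same index.
                for (int i = 0; i < cells.Count && i < headers.Count; i++)
                {
                    record[headers[i]] = cells[i].InnerText.Trim();
                }
                records.Add(record);
            }
        }
        return records;
    }
}
```

For the two tables in the question, the first one has no data rows and is skipped, and the second one yields a single record keyed by its five headers.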
