HtmlAgilityPack: parse a table into a DataTable or array (C#)

I have these tables:
<table>
<tbody>
<tr><th>Header 1</th></tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<th>Header 1</th>
<th>Header 2</th>
<th>Header 3</th>
<th>Header 4</th>
<th>Header 5</th>
</tr>
<tr>
<td>text 1</td>
<td>text 2</td>
<td>text 3</td>
<td>text 4</td>
<td>text 5</td>
</tr>
</tbody>
</table>
I am trying to transform them into an array or List using this code:
var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
from row in table.SelectNodes("tr").Cast<HtmlNode>()
from header in row.SelectNodes("th").Cast<HtmlNode>()
from cell in row.SelectNodes("td").Cast<HtmlNode>()
select new {
Table = table.Id,
Row = row.InnerText,
Header = header.InnerText,
CellText = cell.InnerText
};
But it doesn't work. What is wrong?

Some notes:
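You do not need the casts: SelectNodes already returns a collection of HtmlNode.
You are assuming that every row has header cells. A row here contains either th cells or td cells, so for any given row one of the two inner SelectNodes calls finds nothing.
SelectNodes takes an XPath expression. A bare name such as "tr" only matches direct children, and in your markup the rows are children of tbody, not of table. When nothing matches, SelectNodes returns null rather than an empty collection, which makes the query throw.
If I were you I would use a foreach loop and model the data, which gives more control and is more efficient. If you still want to do it your way, this is how it should look: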
var query = from table in doc.DocumentNode.SelectNodes("//table")
            where table.Descendants("tr").Count() > 1 // make sure there are rows other than the header row
            from row in table.SelectNodes(".//tr[position()>1]") // skip the header row
            from cell in row.SelectNodes("./td")
            from header in table.SelectNodes(".//tr[1]/th") // select the header cells, which are in the first tr
            select new
            {
                Table = table.Id,
                Row = row.InnerText,
                Header = header.InnerText,
                CellText = cell.InnerText
            };
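The foreach approach mentioned in the notes could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced. The names TableParser and ParseTables are made up for illustration, and mapping each data row to a dictionary keyed by header text is just one possible way to model the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class TableParser // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns one dictionary per data row, keyed by the header text of each column.
    public static List<Dictionary<string, string>> ParseTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<Dictionary<string, string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return result;

        foreach (var table in tables)
        {
            // ".//tr" also finds rows that sit under a <tbody>.
            var rows = table.SelectNodes(".//tr");
            if (rows == null || rows.Count < 2) continue; // skip header-only tables

            // Assumption: the first row holds the <th> header cells.
            var headers = rows[0].Elements("th")
                                 .Select(th => th.InnerText.Trim())
                                 .ToList();

            foreach (var row in rows.Skip(1))
            {
                var cells = row.Elements("td").ToList();
                var record = new Dictionary<string, string>();
                for (int i = 0; i < cells.Count && i < headers.Count; i++)
                    record[headers[i]] = cells[i].InnerText.Trim();

                if (record.Count > 0) result.Add(record);
            }
        }
        return result;
    }
}
```

With the second table from the question, ParseTables should yield a single record in which "Header 1" maps to "text 1", and so on. The first table has only a header row and is skipped. Nested tables and duplicate header names are not handled in this sketch.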

Related

How to find last column of a table using Html Agility Pack

I have a table like this:
<table border="0" cellpadding="0" cellspacing="0" id="table2">
<tr>
<th>Name
</th>
<th>Age
</th>
</tr>
<tr>
<td>Mario
</td>
<td>Age: 78
</td>
</tr>
<tr>
<td>Jane
</td>
<td>Age: 67
</td>
</tr>
<tr>
<td>James
</td>
<td>Age: 92
</td>
</tr>
</table>
I want to get the last td from all rows using Html Agility Pack.
Here is my C# code so far:
await page.GoToAsync(NumOfSaleItems, new NavigationOptions
{
WaitUntil = new WaitUntilNavigation[] { WaitUntilNavigation.DOMContentLoaded }
});
var html4 = page.GetContentAsync().GetAwaiter().GetResult();
var htmlDoc4 = new HtmlDocument();
htmlDoc4.LoadHtml(html4);
var SelectTable = htmlDoc4.DocumentNode.SelectNodes("/html/body/div[2]/div/div/div/table[2]/tbody/tr/td[1]/div[3]/div[2]/div/table[2]/tbody/tr/td[4]");
if (SelectTable.Count == 0)
{
continue;
}
else
{
foreach (HtmlNode row in SelectTable)//
{
string value = row.InnerText;
value = value.ToString();
var firstSpaceIndex = value.IndexOf(" ");
var firstString = value.Substring(0, firstSpaceIndex);
LastSellingDates.Add(firstString);
}
}
How can I get only the last column of the table?
I think the XPath you want is: //table[@id='table2']//tr/td[last()].
//table[@id='table2'] finds the table by ID anywhere in the document. This is preferable to a long, brittle path from the root, since a table ID is less likely to change than the rest of the HTML structure.
//tr gets the descendant rows in the table. I'm using two slashes in case there is an intervening <tbody> element in the actual HTML.
/td[last()] gets the last <td> in each row.
From there you just need to select the InnerText of each <td>.
var tds = htmlDoc.DocumentNode.SelectNodes("//table[@id='table2']//tr/td[last()]");
var values = tds?.Select(td => td.InnerText).ToList() ?? new List<string>();
Working demo here: https://dotnetfiddle.net/7I8yk1

How to Retrieve Specific table from HTML FILE in c#?

I have an HTML file that contains many tables, but I want to access a specific table from the file (not all tables).
So how can I do that?
The code looks something like the snippet below, and none of the tables have ids:
<table border=1>
<tr><td>VI not loadable</td><td>0</td></tr>
<tr><td>Test not loadable</td><td>0</td></tr>
<tr><td>Test not runnable</td><td>0</td></tr>
<tr><td>Test error out</td><td>0</td></tr>
</table>
Every table should have an id, or something else that distinguishes it from the others. If it does, you can get it via jQuery. For example:
<table class="table table-striped" id="tbl1">
<thead>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Email</th>
</tr>
</thead>
<tbody>
<tr>
<td>John</td>
<td>Doe</td>
<td>john@example.com</td>
</tr>
<tr>
<td>Mary</td>
<td>Moe</td>
<td>mary@example.com</td>
</tr>
<tr>
<td>July</td>
<td>Dooley</td>
<td>july@example.com</td>
</tr>
</tbody>
</table>
and get it like this:
var table = $('#tbl1').html();
If it does not, you can find it by its position in the file. For example, you can access the second table like this:
var table = $('table').eq(1); // .eq() is zero-based, so 1 is the second table
Or in C#, maybe this would help:
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var cell in table.SelectNodes(".//tr/td"))
{
    string someVariable = cell.InnerText;
}
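Since the tables in the question have no ids, here is a rough sketch of two ways to pick one with HtmlAgilityPack: by its position in document order, or by the text of one of its cells. TableFinder and its methods are hypothetical names. The XPath (//table)[n] counts tables across the whole document, whereas //table[n] counts among siblings under the same parent.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class TableFinder // hypothetical helper
{
    // position is 1-based and counts tables in document order.
    // Returns null when there is no such table.
    public static HtmlNode ByPosition(HtmlDocument doc, int position) =>
        doc.DocumentNode.SelectSingleNode($"(//table)[{position}]");

    // Returns the first table containing a <td> whose trimmed text equals cellText.
    // Assumption: cellText contains no double quote, since it is pasted into the XPath.
    public static HtmlNode ByCellText(HtmlDocument doc, string cellText) =>
        doc.DocumentNode.SelectSingleNode(
            $"//table[.//td[normalize-space(.)=\"{cellText}\"]]");
}

public static class Program
{
    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table border=1><tr><td>Other</td><td>1</td></tr></table>" +
            "<table border=1><tr><td>VI not loadable</td><td>0</td></tr></table>");

        var second = TableFinder.ByPosition(doc, 2);
        var byText = TableFinder.ByCellText(doc, "VI not loadable");

        // Both lookups should land on the same table here.
        Console.WriteLine(second != null && ReferenceEquals(second, byText));
    }
}
```

Selecting by cell text tends to survive reordering of the tables better than selecting by position, but it breaks if the label text changes.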

Cannot find table using c#

I am trying to change the value of a particular cell in a table, but I am getting an error. Can anyone see what the problem is, please?
Test WebTable.IWebTableTests.testIWebTableTests failed: System.NullReferenceException : Object reference not set to an instance of an object.
WebTableTests.cs(37,0): at WebTable.IWebTableTests.testIWebTableTests()
<div class="cart-info">
<table border="1">
<thead>
<tr>
<td class="name">Product Name</td>
<td class="model">Model</td>
<td class="quantity">Quantity</td>
<td class="price">Unit Price</td>
<td class="total">Total</td>
</tr>
</thead>
<tbody>
</table>
</div>
and:
IWebElement table = driver.FindElement(By.CssSelector(".cart-info>table"));
ReadOnlyCollection<IWebElement> allRows = table.FindElements(By.TagName("tr"));
ReadOnlyCollection<IWebElement> allCols = table.FindElements(By.TagName("td"));
//Verify that it has three rows
Assert.AreEqual(3, allRows.Count);
//Verify that it has six columns
Assert.AreEqual(5, allCols.Count);
//Verify that specified value exists in second cell of third row
Assert.AreEqual("iPhone", allRows[3].FindElements(By.TagName("td"))[1].Text);
//Get in cell editor and enter some value
string cellValue = allRows[3].FindElements(By.TagName("td"))[3].Text;
IWebElement cellEdit = allRows[3].FindElements(By.TagName("td"))[3];
cellEdit.Clear();
cellEdit.SendKeys("2");
string aftercellValue = allRows[3].FindElements(By.TagName("td"))[3].Text;

get a specific row from the html with a specific word using regular expression

I want to fetch all rows that contain a specific word/string and store them in an array.
I have a string as below
<tr>
<td>Total</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>ABC</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>XYZ</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>Total</td>
<td>7676</td>
<td>8767</td>
</tr>
I want to fetch each row containing the string Total and store its values in an array.
So the output should be:
<tr>
<td>Total</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>Total</td>
<td>7676</td>
<td>8767</td>
</tr>
What should the regular expression be to fetch a row containing the string "Total"?
To build arrays for each table row that has a cell with the word "Total", you could use this regex:
(?<=<tr>\s*<td>Total</td>)(\s*<td>\d+</td>)+(?=\s*</tr>)
Which would give you the following 2 matches:
<td>123</td>
<td>567</td>
and
<td>7676</td>
<td>8767</td>
On these matches you could then split with this regex to get arrays in return:
\D+
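As a rough illustration, the match-then-split steps above might look like this in C#, whose regex engine supports the variable-length lookbehind used here. TotalRows and Extract are hypothetical names, and the sketch assumes the markup is as regular as in the question (no attributes on the tags, digits only in the value cells).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TotalRows // hypothetical helper
{
    // Matches the run of numeric cells that follows a "Total" cell within one row.
    private static readonly Regex RowPattern = new Regex(
        @"(?<=<tr>\s*<td>Total</td>)(\s*<td>\d+</td>)+(?=\s*</tr>)");

    // Returns one string array of numbers per "Total" row.
    public static List<string[]> Extract(string html)
    {
        return RowPattern.Matches(html)
            .Cast<Match>()
            .Select(m => Regex.Split(m.Value, @"\D+")
                              .Where(s => s.Length > 0) // drop empty edge pieces
                              .ToArray())
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        const string html = @"<tr>
<td>Total</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>ABC</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>Total</td>
<td>7676</td>
<td>8767</td>
</tr>";

        foreach (var values in TotalRows.Extract(html))
            Console.WriteLine(string.Join(", ", values));
        // prints:
        // 123, 567
        // 7676, 8767
    }
}
```

The Where filter is needed because splitting on \D+ leaves empty strings at both ends of each match.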
In jQuery, your solution would be:
var tbl = $('#tblId');
var array = [];
$('tr td', tbl).each(function () {
    // Look for the cell that labels the row as a total
    if ($(this).text() === 'Total') {
        // Collect the text of every cell in that row
        $('td', this.parentNode).each(function () {
            array.push($(this).text());
        });
    }
});
alert(array);
http://jsfiddle.net/gXGj6/13/
Good solution using jQuery:
http://jsfiddle.net/robfarmer/KaGBL/2/
HTML:
Source
<table id="source">
<tr>
<td>Total</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>ABC</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>XYZ</td>
<td>123</td>
<td>567</td>
</tr>
<tr>
<td>Total</td>
<td>7676</td>
<td>8767</td>
</tr>
</table>
Results
<table id="results"></table>
Array Results:
<ul id="arrayResults"></ul>
JavaScript:
$(document).ready(function() {
$("#source tr td:contains('Total')").closest("tr").clone().appendTo("#results");
var cells = [];
$("#source tr td:contains('Total')").closest("tr")
.children("td").not(":contains('Total')").each(function(index, element) {
cells.push($(element).text());
});
$(cells).each(function(index, element) {
$("#arrayResults").append($("<li>").text(element));
});
});

Passing tables from C# code behind to a javascript function

I have a user control that is part of an update panel. The user control consists of a heading tag and an asp:Table. The asp:Table is defined in the ascx file with only the headers. Its contents are filled in dynamically from the code-behind by reading a CSV file. This setup sits inside an UpdatePanel that refreshes every minute. The CSV file is updated every minute, so the table needs to be updated as well.
Here is the tricky part. Before the table is updated, I need to save a copy of the old table and then update the new table. Once the new table is updated and the page is about to be loaded, I need to call a JavaScript function from the Page_Load handler and pass it the two tables. Inside the JavaScript function I need to compare the old table and the new table cell by cell and do some work based on the result of the comparison.
Here is how I copy the data from the table to another table before updating it:
TableCell tableCell;
TableRow tableRow;
for (int i = 0; i < Table1.Rows.Count; i++)
{
tableRow = new TableRow();
for (int j = 0; j < Table1.Rows[i].Cells.Count; j++)
{
tableCell = new TableCell();
tableCell.Text = Table1.Rows[i].Cells[j].Text;
tableRow.Cells.Add(tableCell);
}
oldTable.Rows.Add(tableRow);
}
But for some reason when I pass the tables to the javascript function and access the cells in javascript I see only the headers and not any values in the old table. But when I access the cells in my code behind itself I can see the values.
My HTML is
<table id="ContentPlaceHolderBody_ContentPlaceHolderBody_TradxPriceTable_5_oldTable" ClientID="oldTable">
<tr>
<td colspan="3">6M EURIBOR</td>
<td colspan="3">6M EURIBOR</td>
</tr><tr>
<td>Instr</td>
<td>Bid</td>
<td>Ask</td>
<td>Instr</td>
<td>Bid</td>
<td></td>
</tr><tr>
<td></td>
<td></td>
<td></td>
<td colspan="3">BASIS 3s6s</td>
</tr><tr>
<td></td>
<td></td>
<td></td>
<td>Instr</td>
<td>Bid</td>
<td>Ask</td>
</tr><tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
My javascript is
function foo(table1,table2)
{
var oldTable = document.getElementById(table1);
var newTable = document.getElementById(table2);
alert(oldTable.rows[2].cells[1].innerHTML+" "+newTable.rows[2].cells[1].innerHTML);
}
Use plain HTML tables: build a string containing the table markup and its values, and pass that string to JavaScript.
string newTable = "<table><tr><td>" + somevalue + "</td></tr></table>";
In JavaScript:
var newT = '<%= newTable %>';
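A variation on the same idea, swapping the markup string for JSON, is to pass only the cell text so that the JavaScript side can compare values directly. The sketch below is one possible shape, not the answerer's method. TableSnapshot, BuildCall, and the JavaScript function name compareTables are hypothetical, and the sample values are made up. It uses System.Text.Json; in a Web Forms project on .NET Framework, JavaScriptSerializer would play the same role. The cell arrays are assumed to have been copied from Table1.Rows in the code-behind.

```csharp
using System;
using System.Text.Json;

public static class TableSnapshot // hypothetical helper
{
    // Serializes the cell text (rows of cells) to a JSON array of arrays.
    public static string ToJson(string[][] cells) => JsonSerializer.Serialize(cells);

    // Builds a script statement that passes both snapshots to a JavaScript function.
    public static string BuildCall(string functionName, string[][] oldCells, string[][] newCells) =>
        $"{functionName}({ToJson(oldCells)}, {ToJson(newCells)});";
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data standing in for the old and new table contents.
        var oldCells = new[] { new[] { "Instr", "Bid", "Ask" }, new[] { "6M", "1.10", "1.12" } };
        var newCells = new[] { new[] { "Instr", "Bid", "Ask" }, new[] { "6M", "1.11", "1.12" } };

        // The resulting string is what would be emitted as a startup script.
        Console.WriteLine(TableSnapshot.BuildCall("compareTables", oldCells, newCells));
    }
}
```

In the page, the generated statement would typically be registered from Page_Load, for example through ScriptManager.RegisterStartupScript, and compareTables would then loop over the two arrays cell by cell instead of reading the DOM.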
