Parsing HTML tables with different row numbers - c#

I am trying to parse HTML tables, but the tables do not all have the same number of rows. All the tables sit under a (form) element, which I select as a SingleNode, but the (tbody) gives me the rows, not the (td) cells, so I can't loop over all the (td) elements.
Part of the HTML code:
<form name="DetailsForm" method="post" action="">
<input type="hidden" name="helpPageId" value="WF03">
<input type="hidden" name="withMenu" value="1">
<table width="100%" cellspacing="0" border="0">
<tbody>
<tr valign="center">
<td class="blackHeadingLeft">Details</td>
</tr>
<tr></tr>
<tr>
<td></td>
</tr>
</tbody>
</table>
<table width="100%" cellspacing="0" border="0">
<tbody>
<tr>
<td class="whiteTd" height="21"> AWB:</td>
<td class="whiteTdNormal" nowrap="nowrap" height="21"> 7777995585 </td>
<td class="whiteTd" nowrap="nowrap" height="21"> No of Shipment Details:</td>
<td class="whiteTdNormal" nowrap="nowrap" height="21"> 1 </td>
<td class="whiteTdNormal" width="100%" height="21"> </td>
</tr>
</tbody>
</table>
<table class="bordered-table" width="100%" border="0">
<tbody>
<tr>
<td class="grayTd" width="5%" height="21"> Details</td>
<td class="grayTd" width="5%" height="21" align="center"> Orig</td>
<td class="grayTd" width="8%" height="21" align="center"> Location</td>
<td class="grayTd" width="7%" height="21"> Dest</td>
<td class="grayTd" width="5%" height="21" align="center"> Pcs</td>
<td class="grayTd" width="5%" height="21"> Weight(kg)</td>
<td class="grayTd" width="11%" height="21"> Volumetric Weight(kg)</td>
<td class="grayTd" width="9%" height="21"> Date/Time</td>
<td class="grayTd" width="8%" height="21"> Route/Cycle</td>
<td class="grayTd" width="8%" height="21"> Post Code</td>
<td class="grayTd" width="6%" height="21"> Product</td>
<td class="grayTd" width="9%" height="21"> Amount</td>
<td class="grayTd" width="9%" height="21"> Duplicate</td>
</tr>

Here is the way that I am able to do it:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
{
    Console.WriteLine("Table: ");
    foreach (HtmlNode tbody in table.SelectNodes("tbody"))
    {
        // SelectNodes returns null when nothing matches, so check for tr children first.
        if (tbody.ChildNodes.Any(x => x.Name == "tr"))
        {
            Console.WriteLine("TBody: ");
            foreach (HtmlNode row in tbody.SelectNodes("tr"))
            {
                Console.WriteLine("TR: ");
                // Same check for td children; empty rows such as <tr></tr> are skipped.
                if (row.ChildNodes.Any(c => c.Name == "td"))
                {
                    foreach (var item in row.SelectNodes("td"))
                    {
                        Console.WriteLine("TD: ");
                        Console.WriteLine(item.InnerHtml);
                    }
                }
                Console.WriteLine();
            }
        }
    }
}
This way it doesn't matter how many tr or td tags there are. Note that you need validation for the case where a tbody contains no tr tags or a tr contains no td tags.
I hope this helps.
Edited to include validation for the tr and td tags. Similar logic can be used for any other tags that might be missing.

Related

C# Get all the id of the html tag and set inner text for <td></td> tag

I have an HTML string, and I want to get the id of every tag in it that has one.
The HTML string, read from a text file:
<tr>
<td class="X8">
</td>
<td colspan="6" class="X9"></td>
<td colspan="4" class="X12" id="closedate">
</td>
<td colspan="6" class="X9"></td>
<td colspan="4" class="X12" id="startdate">
</td>
<td class="X8">
</td>
<td class="X8" colspan="3">
</td>
<td class="X8">
</td>
<td colspan="9" class="X9"></td>
<td colspan="6" class="X15" id="totalpayment"></td>
<td class="X8">
</td>
<td class="X8">
</td>
</tr>
<tr>
<td class="X17">
</td>
<td class="X17" colspan="8">
</td>
<td class="X17" colspan="33">
</td>
<td class="X17">
</td>
</tr>
<tr>
<td class="X17">
</td>
<td class="X17" colspan="8">
<td class="X17" colspan="16">
</td>
<td class="X17">
</td>
<td colspan="9" class="X20"></td>
<td colspan="6" class="X23" id="approvaldate"></td>
<td class="X17">
</td>
<td class="X17">
</td>
</tr>
Expected result:
closedate, startdate, totalpayment, approvaldate.
Then I want to set the inner text of each tag that has an id
(e.g. <td colspan="6" class="X23" id="approvaldate">2018/07/18</td>)
using C#. Please help. Thanks a lot.
What I understood from your question is that you need the ids of all the elements in the string. Here is a simple example:
<form id="form1" runat="server">
<input id="Name" type="text" name="Full Name" runat="server" />
<input id="Email" type="text" name="Email Address" runat="server" />
<input id="Phone" type="text" name="Phone Number" runat="server" />
</form>
foreach (var control in Page.Form.Controls)
{
if (control is HtmlInputControl)
{
var htmlInputControl = control as HtmlInputControl;
string controlName = htmlInputControl.Name;
string controlId = htmlInputControl.ID;
}
}
Another approach:
HtmlElement table = testWebBrowser.Document.GetElementById("TableID");
if (table != null)
{
foreach (HtmlElement row in table.GetElementsByTagName("TR"))
{
// ...
}
}

Convert a HTML Table with rowspans to DataTable C#

I need to convert an HTML table to a DataTable in C#. I used HtmlAgilityPack, but it does not convert the table correctly because of the rowspans.
The code I am currently using is:
private static DataTable convertHtmlTableToDataTable()
{
WebClient webClient = new WebClient();
string urlContent = webClient.DownloadString("http://example.com");
string tableCode = getTableCode(urlContent);
string htmlCode = tableCode.Replace("&nbsp;", " ");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlCode);
var headers = doc.DocumentNode.SelectNodes("//tr/th");
DataTable table = new DataTable();
foreach (HtmlNode header in headers)
{
table.Columns.Add(header.InnerText);
}
foreach (var row in doc.DocumentNode.SelectNodes("//tr[td]"))
{
table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());
}
return table;
}
And this is part of the HTML table:
<table class="tabel" cellspacing="0" border="0">
<caption style="font-family:Verdana; font-size:20px;">SEMGRP</caption>
<tr>
<th class="celula" >Ora</th>
<th class="latime_celula celula">Luni</th>
<th class="latime_celula celula">Marti</th>
<th class="latime_celula celula">Miercuri</th>
<th class="latime_celula celula">Joi</th>
<th class="latime_celula celula">Vineri</th>
</tr>
<tr>
<td class="celula" nowrap="nowrap">8-9</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center">
Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=287&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center">
Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=287&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=12&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula"> </td>
<td class="celula"> </td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center">
Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=293&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="celula" nowrap="nowrap">9-10</td>
<td class="celula"> </td>
<td class="celula"> </td>
</tr>
<tr>
<td class="celula" nowrap="nowrap">10-11</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br /><a class="link_celula" href="afis_n0.php?id_tip=303&tip=p">Prof</a>
<br /><a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=331&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=14&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br /><a class="link_celula" href="afis_n0.php?id_tip=330&tip=p">Prof</a>
<br /><a class="link_celula" href="afis_n0.php?id_tip=9&tip=s">Sala</a>
<br />
</td>
</tr>
</table>
</td>
<td class="celula"> </td>
<td class="celula" rowspan="2">
<table border="0" align="center">
<tr>
<td nowrap="nowrap" align="center"> Curs
<br />
<a class="link_celula" href="afis_n0.php?id_tip=293&tip=p">Prof</a>
<br />
<a class="link_celula" href="afis_n0.php?id_tip=10&tip=s">Sala</a> <br />
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="celula" nowrap="nowrap">11-12</td>
<td class="celula"> </td>
</tr>
<tr>
I tried some solutions but did not find one that works.
Thanks for any help in advance.

Xpath select all tr without table with id=x

Hello, I need to select all tr elements, but some of them contain a table with id=WHITE_BANKTABLE.
I need to select only the tr elements that don't contain this table.
My HTML:
<table id=mytable_body>
<TR id=TR_ROW_BANKTABLE class=TR_ROW_BANKTABLE style="BACKGROUND-COLOR: #f6f8fa" align=right bgColor=#f6f8fa>
<TD noWrap align=right w_idth="190"> </TD>
<TD align=right>010073/15922</TD>
</TR>
<!-- I don't need this TR, which contains the TABLE with id=WHITE_BANKTABLE -->
<TR>
<TD colSpan=8 align=center>
<TABLE id=WHITE_BANKTABLE cellSpacing=0 borderColorDark=#edf0f5 cellPadding=3 width="100%" bgColor=white borderColorLight=#edf0f5 border=1 isWhiteTable="Y">
<TBODY>
<TR class=TR_BANKTABLE align=right vAlign=top>
<TD> sdfsd </TD>
<TD>sdfs</TD>
</TR>
</TBODY>
</TABLE>
</TD>
</TR>
<TR id=TR_ROW_BANKTABLE class=TR_ROW_BANKTABLE style="BACKGROUND-COLOR: #f6f8fa" align=right bgColor=#f6f8fa>
<TD noWrap align=right w_idth="190"> </TD>
<TD align=right>010073/15922</TD>
</TR>
</table>
Thanks.
Assuming the above is correctly formatted as XML (insert missing double quotes):
var q =
xml.XPathSelectElements(@"/tr[not(descendant::table[@id = 'WHITE_BANKTABLE'])]");
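To try the predicate in isolation, here is a minimal sketch using LINQ to XML. It assumes a trimmed, well-formed copy of the markup: attribute values are quoted, element names are lowercased (XPath names are case-sensitive), and the ids row1, wrapper, inner and row2 are hypothetical labels added only so the output is readable. It also uses a path relative to the outer table rather than the rooted path above, so the row nested inside WHITE_BANKTABLE is not returned; that is a choice of this example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

class XPathDemo
{
    static void Main()
    {
        // Trimmed, well-formed stand-in for the question's table (ids are hypothetical).
        var xml = XElement.Parse(@"<table id=""mytable_body"">
  <tr id=""row1""><td>010073/15922</td></tr>
  <tr id=""wrapper"">
    <td>
      <table id=""WHITE_BANKTABLE"">
        <tr id=""inner""><td>sdfsd</td></tr>
      </table>
    </td>
  </tr>
  <tr id=""row2""><td>010073/15922</td></tr>
</table>");

        // Direct tr children of the outer table, excluding any row
        // that has the WHITE_BANKTABLE table somewhere below it.
        var rows = xml.XPathSelectElements(
            "tr[not(descendant::table[@id = 'WHITE_BANKTABLE'])]");

        foreach (var row in rows)
            Console.WriteLine((string)row.Attribute("id"));
    }
}
```

With this input the loop should print row1 and row2 and skip both the wrapper row and the nested row. If the real page is not well-formed XML, an HTML parser such as HtmlAgilityPack with the same predicate is likely a better fit than XElement.Parse.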

How do I keep the existing filename in the database when updating a page?

I have a page called Update.aspx, and on it I update all values except the filename. When I go to update a record, all values are fetched from the database into the corresponding textboxes. But if the database holds a filename such as abc.jpg and I submit the update page without browsing for a file, the filename is saved as null.
How can I keep the same file (i.e. abc.jpg) in database even though it's not updated by user?
This is my code:
string onlyname = string.Empty;
string filename = "";
string path = "";
if (FileUp.HasFile)
{
filename = FileUp.PostedFile.FileName;
path = Server.MapPath("/clients_images/") + FileUp.FileName;
onlyname = path.Substring(path.LastIndexOf("\\") + 1);
FileUp.SaveAs(path);
}
else
{
onlyname = dr["Client_Logo"].ToString().Trim();
}
param[4] = new SqlParameter("@Client_Logo", SqlDbType.VarChar, 200);
param[4].Value = onlyname.ToString().Trim();
cmd.Parameters.Add(param[4]);
ASPX code:
<table align="center" cellpadding="3" cellspacing="0" border="0" width="638" class="tableborder">
<tr>
<td colspan="2" class="mainbg" align="center" height="13">
<font class="general">Update Client Detail</font></td>
</tr>
<tr>
<td class="general" width="50%" align="right">
<b>Client Name :</b></td>
<td align="left">
<asp:TextBox ID="txtUpdateclientname" Text="" Columns = "30" CssClass="checkbox02" runat="server" onchange="Javascript: return initialCap(this);"></asp:TextBox>
<font class="mandatory">*</font>
</td>
</tr>
<tr>
<td class="general" width="50%" align="right">
<b>Company Name :</b></td>
<td align="left">
<asp:TextBox ID="updatecompanyname" Text="" Columns = "30" CssClass="checkbox02" runat="server" onchange="Javascript: return initialCap(this);"></asp:TextBox>
<font class="mandatory">*</font>
</td>
</tr>
<tr>
<td class="general" width="50%" align="right">
<b>Email :</b></td>
<td align="left">
<asp:TextBox ID="updateemail" Text="" Columns = "30" CssClass="checkbox02" runat="server"></asp:TextBox>
<font class="mandatory">*</font>
</td>
</tr>
<tr>
<td class="general" width="50%" valign="top" align="right">
<b>Logo :</b></td>
<td class="general" width="50%" align="left">
<asp:FileUpload ID="FileUp" runat="server" CssClass="checkbox02" />
<br /> (Height of image should not be more than 67 And Width of image should not be more than 380)
</td>
</tr>
<tr>
<td class="general" width="50%" align="right">
<%--<b><font class="bluetext"><strong> Upload : </strong>--%></td>
<td align="left">
<asp:LinkButton ID="viewImage" Text="View Existing" runat="server" cssClass="general-white" class="link" OnCommand="LinkButton_Command" CommandName="downloadfile" ToolTip="Click to View Existing" Font-Bold="true"></asp:LinkButton>
</td>
</tr>
<tr>
<td class="general" width="50%" align="right" valign="top">
<b>Project Type :</b></td>
<td width="100%" align="left" valign="top" colspan="2">
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td align="right" valign="top" colspan="3">
<asp:ListBox ID="LstUserLeft" Width="100%" Autoscroll="false" SelectionMode="Multiple" Height="150" CssClass="checkbox02"
runat="server">
</asp:ListBox>
</td>
</tr>
</table>
</td>
</tr>
<tr>
</tr>
</table>
You can use HiddenField or ViewState to store the filename from the Client_logo field (from the database).
For instance, assign the value to HiddenField1 when retrieving the result:
HiddenField1.Value=dr["Client_Logo"].ToString().Trim();
and while updating,
if (FileUp.HasFile)
{
...
}
else
{
onlyname = HiddenField1.Value;
}

regular expression - find and split links in a table

Another question from me about regular expressions; they are so complicated for me, so I would be glad of some more help.
I have a table, and I would like to read all the links inside it and split each one into groups.
The result should be:
Person 1
Status of person 1
Person 2
Status of Person 2
So I have to get the values inside the links in this table:
<a class="darklink" href="testlink">Person 2, - Status of Person 2</a>
Is it possible to search only in a table that is preceded by a specific tag, like this?
<p>title</p> (because there are other similar tables on my site)
<p>title</p>
<table cellspacing="0" cellpadding="0" border="0" width="95%">
<tbody>
<tr>
<td bgcolor="#999999" colspan="2"><img height="1" border="0" width="1" src="images/dot_transp.gif" alt=" "/> </td>
</tr>
<tr>
<td><a class="darklink" href="asdfer">Person1, - Status of Person1 </a> </td>
<td valign="bottom"></td>
</tr>
<tr>
<td bgcolor="#999999" colspan="2"><img height="1" border="0" width="1" src="images/dot_transp.gif" alt=" "/> </td>
</tr>
<tr>
<td><a class="darklink" href="aeraseraesr">Person 2, - Status of Person 2</a></td>
<td valign="bottom"> <img hspace="0" height="16" border="0" align="right" width="12" vspace="0" alt=" " src="images/ico_link.gif"/> </td>
</tr>
<tr>
<td bgcolor="#999999" colspan="2"><img height="1" border="0" width="1" src="images/dot_transp.gif" alt=" "/> </td>
</tr>
<tr>
<td><a class="darklink" href="asdfasdf">Person 3. - Status of Person 3</a></td>
<td valign="bottom"> </td>
</tr>
<tr> </tr>
</tbody>
</table>
Your regular expression should be:
<a class="darklink" .*?>(.*?). - (.*?)</a>
or if you get line breaks inside your <a> tag:
<a class="darklink" [\s\S]*?>([\s\S]*?). - *([\s\S]*?)</a>
So the following code should work:
Regex person = new Regex(@"<a class=""darklink"" .*?>(.*?). - (.*?)</a>");
foreach (Match m in person.Matches(input))
{
Console.WriteLine("First group : {0}", m.Groups[1]);
Console.WriteLine("Second group: {0}", m.Groups[2]);
}
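As a self-contained check, the sketch below runs a slightly stricter variant of the pattern against two of the rows from the question. The variant is an assumption of this example, not part of the answer: it uses [.,] so that only the separators seen in the sample (a comma or a period) are accepted, and [^>]* so the attribute match cannot run past the end of the opening tag.

```csharp
using System;
using System.Text.RegularExpressions;

class DarkLinkDemo
{
    static void Main()
    {
        // Two rows copied from the question's table.
        string input =
            "<td><a class=\"darklink\" href=\"asdfer\">Person1, - Status of Person1 </a> </td>\n" +
            "<td><a class=\"darklink\" href=\"asdfasdf\">Person 3. - Status of Person 3</a></td>";

        // Group 1: text before the ", - " or ". - " separator; group 2: text after it.
        var person = new Regex(@"<a class=""darklink"" [^>]*>(.*?)[.,] - (.*?)</a>");

        foreach (Match m in person.Matches(input))
        {
            // Trim removes the stray spaces around the link text.
            Console.WriteLine("Name  : {0}", m.Groups[1].Value.Trim());
            Console.WriteLine("Status: {0}", m.Groups[2].Value.Trim());
        }
    }
}
```

This does not restrict the search to the table that follows <p>title</p>; for that part of the question an HTML parser is usually more reliable than a single regular expression.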
