C# convert HTML to text preserving the next line [closed] - c#

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have HTML saved in a.txt file that looks like this:
<HTML> <HEAD> <TITLE></TITLE> </HEAD>
<BODY STYLE="font: 10pt Times New Roman, Times, Serif"> <P STYLE="margin: 0"></P> <P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0pt 0; text-align: center">UNITED STATES</P> <P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0pt 0; text-align: center">SECURITIES AND EXCHANGE COMMISSION</P> <P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0pt 0; text-align: center">WASHINGTON, D.C. 20549</P>
<P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0pt 0; text-align: center"> </P> <P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0pt 0; text-align: center"></P> <P STYLE="font: 10pt Times New Roman, Times, Serif; margin: 0pt 0; text-align: center"><B> </B></P>
<TABLE CELLSPACING="0" CELLPADDING="0" STYLE="font: 10pt Times New Roman, Times, Serif; width: 100%; border-collapse: collapse"> <TR STYLE="vertical-align: top"> <TD STYLE="width: 5%; padding-right: 5.4pt; padding-left: 5.4pt"><FONT STYLE="font-size: 10pt">[X]</FONT></TD> <TD STYLE="width: 95%; padding-right: 5.4pt; padding-left: 5.4pt"><FONT STYLE="font-size: 10pt">ANNUAL REPORT UNDER SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934</FONT></TD></TR> <TR STYLE="vertical-align: top">
<TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"></TD>
<TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"> </TD></TR> <TR STYLE="vertical-align: top"> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"></TD>
<TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt; text-align: right"><FONT STYLE="font-size: 10pt">For the fiscal year ended <B><U>October 31, 2012</U></B></FONT></TD></TR> <TR STYLE="vertical-align: top"> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"></TD> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"> </TD></TR> <TR STYLE="vertical-align: top"> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"><FONT STYLE="font-size: 10pt">[ ]</FONT></TD> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"><FONT STYLE="font-size: 10pt">TRANSITION REPORT UNDER SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934</FONT></TD></TR> <TR STYLE="vertical-align: top">
<TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"></TD> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"> </TD></TR> <TR STYLE="vertical-align: top">
<TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt"></TD> <TD STYLE="padding-right: 5.4pt; padding-left: 5.4pt; text-align: right"><FONT STYLE="font-size: 10pt">For the transition period from _________ to ________</FONT></TD></TR>
I need the text to preserve line breaks, but all of it is getting combined into a single line. How can I handle this? Below is my C# code:
string text = File.ReadAllText(@"C:\a.txt", Encoding.UTF8);
Regex regex = new Regex("<[^>]+>");
text = regex.Replace(text, " ").Replace("( )+", Environment.NewLine).Replace(" ", "").Replace("’", "'").Replace("\r\n\r\n(\r\n)+", Environment.NewLine);
text = HttpUtility.HtmlDecode(text);
Console.WriteLine(text);

I would never use regex to parse HTML. Instead, use the HtmlAgilityPack; you can do a lot with simple XPath queries. For example:
using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(@"C:\temp\stackoverflow\question23657841\question23657841\a.html");
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//p"))
{
    Console.WriteLine(node.InnerHtml);
}
The output is:
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
WASHINGTON, D.C. 20549
<b> </b>
Simply switching the XPath expression to //font gives you this:
[X]
ANNUAL REPORT UNDER SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the fiscal year ended <b><u>October 31, 2012</u></b>
[ ]
TRANSITION REPORT UNDER SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the transition period from _________ to ________
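If you would rather keep the question's regex approach (or cannot add a package), the core problem is that every tag, including block boundaries such as </P> and </TR>, is replaced with a space before anything becomes a line break. Note also that string.Replace matches literal text, so calls like .Replace("( )+", Environment.NewLine) in the question never fire. The following is a minimal sketch using only the base class library, not a general HTML-to-text converter: it assumes simple generated markup like the sample (no script or style content, no '>' inside attribute values), and the class name HtmlToText is hypothetical. It uses System.Net.WebUtility.HtmlDecode in place of HttpUtility.HtmlDecode so it does not need System.Web.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    // Sketch only: regex-based, so it assumes simple generated markup like the sample
    // (no <script>/<style> content, no '>' inside attribute values).
    public static string Convert(string html)
    {
        // 1. Line breaks and indentation in the HTML source are just whitespace; collapse them.
        string text = Regex.Replace(html, @"\s+", " ");

        // 2. Turn the ends of block-level elements (and <br>) into real line breaks
        //    before the tags are stripped.
        text = Regex.Replace(text, @"<br\s*/?>|</(p|div|tr|h[1-6]|li|title)>", "\n",
            RegexOptions.IgnoreCase);

        // 3. Cell ends become spaces, so "[X]" stays on the same line as its label.
        text = Regex.Replace(text, @"</t[dh]>", " ", RegexOptions.IgnoreCase);

        // 4. Remove every remaining tag, then decode entities such as &nbsp; and &#8217;.
        text = Regex.Replace(text, @"<[^>]+>", "");
        text = WebUtility.HtmlDecode(text);

        // 5. Tidy each line and drop the blank ones left by empty <P> and <TD> elements.
        var lines = new List<string>();
        foreach (string line in text.Split('\n'))
        {
            // \s also matches the non-breaking space produced by decoding &nbsp;.
            string tidy = Regex.Replace(line, @"\s+", " ").Trim();
            if (tidy.Length > 0)
            {
                lines.Add(tidy);
            }
        }
        return string.Join(Environment.NewLine, lines);
    }
}
```

Called as HtmlToText.Convert(File.ReadAllText(@"C:\a.txt", Encoding.UTF8)) on the sample above, it should yield one paragraph or table row per line, for example "[X] ANNUAL REPORT UNDER SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934". For anything less regular than this generated markup, the parser-based approach above is the more robust choice.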

Why not read the file line by line? File.ReadAllLines() does just that.

Related

Outlook and Windows Live Mail show the html table differently (Outlook shows the undesired one, WLM shows the desired)

I'm new to this field.
I have a table in an Excel file (xlsx). Using C#, I save it as html:
`worksheet.SaveToHtml("the html file path destination");`
Then I read the html file:
`String htmlCode = File.ReadAllText("the html file path", Encoding.Default);`
Then I put it into the email body like this:
mail.Body = "<BR/><font size=2 ...." + htmlCode + ".... </body>";
The table is supposed to look this way:
But when the email is opened in Outlook, it looks like this:
The columns are significantly smaller in width.
How can I fix this? I don't want to hard-code the HTML directly in C#.
UPDATE
This is what the HTML generated by WorkSheet.SaveToHtml looks like. I need to find out what to change or add to make this table render correctly in Outlook.
<html>
<head>
<style type="text/css">table{border-collapse:collapse;table-layout:fixed;border-spacing:0;empty-cells:show}
.X0{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(218,238,243);border-top-color:000000;border-top-style:solid;border-top-width:2;border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-left-color:000000;border-left-style:solid;border-left-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;font-weight:bold;vertical-align:center;text-align:center;word-wrap:break-word;height:21;}
.X1{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(218,238,243);border-right-style:solid;border-right-width:2;border-right-color:000000;border-top-color:000000;border-top-style:solid;border-top-width:2;border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-left-color:000000;border-left-style:solid;border-left-width:2;font-weight:bold;vertical-align:center;text-align:center;word-wrap:break-word;height:21;}
.X2{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(218,238,243);border-top-color:000000;border-top-style:solid;border-top-width:2;border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;font-weight:bold;vertical-align:center;text-align:center;word-wrap:break-word;height:21;}
.X3{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(218,238,243);border-bottom-style:solid;border-bottom-width:2;border-bottom-color:000000;border-top-color:000000;border-top-style:solid;border-top-width:2;border-left-color:000000;border-left-style:solid;border-left-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;font-weight:bold;vertical-align:center;text-align:center;word-wrap:break-word;height:21;}
.X4{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(218,238,243);border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-left-color:000000;border-left-style:solid;border-left-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;font-weight:bold;vertical-align:center;text-align:center;word-wrap:break-word;height:21;}
.X5{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(218,238,243);border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;font-weight:bold;vertical-align:center;text-align:center;word-wrap:break-word;height:21;}
.X6{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(255,255,255);border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-left-color:000000;border-left-style:solid;border-left-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;vertical-align:center;text-align:general;word-wrap:break-word;height:21;}
.X7{color:rgb(0,32,96);font-family:Calibri;font-size:11pt;background-color:rgb(255,255,255);border-bottom-color:000000;border-bottom-style:solid;border-bottom-width:2;border-right-color:000000;border-right-style:solid;border-right-width:2;vertical-align:center;text-align:right;word-wrap:break-word;height:21;}</style>
</head>
<body>
<table cellspacing="0">
<Col width="167" />
<Col width="109" />
<Col width="104" />
<Col width="91" />
<Col width="85" />
<Col width="65" />
<tr>
<td class="X0">
<div style="width:163px !Important;width:167px;" />
</td>
<td COLSPAN="2" class="X1">11 11 11</td>
<td ROWSPAN="2" class="X3">11 11</td>
<td COLSPAN="2" class="X1">11 11</td>
</tr>
<tr>
<td class="X4">
<div style="width:163px !Important;width:167px;" />
</td>
<td class="X5">11</td>
<td class="X5">11</td>
<td class="X5">11</td>
<td class="X5">11</td>
</tr>
<tr>
<td class="X6">a aa</td>
<td class="X7">b</td>
<td class="X7">b</td>
<td class="X7">b</td>
<td class="X7">b</td>
<td class="X7">b</td>
</tr>
<tr>
<td class="X6">c cc ccc</td>
<td class="X7">d</td>
<td class="X7">d</td>
<td class="X7">d</td>
<td class="X7">d</td>
<td class="X7">d</td>
</tr>
<tr>
<td class="X6">e ee eee</td>
<td class="X7">f</td>
<td class="X7">f</td>
<td class="X7">f</td>
<td class="X7">1f</td>
<td class="X7">f</td>
</tr>
</table>
</body>
</html>
Thanks for all the comments.
I finally got the desired table in Outlook by using Save As in Excel, with Save as type set to Web Page (*.htm, *.html). Both .htm and .html work.
By default, the table was center-aligned; I just needed to change it to left, and everything was perfect.
Here's the HTML table that renders correctly in both Outlook and WLM (I haven't tested it in other email clients):
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 14">
<link rel=File-List href="tabel_transaksi_saham_copy_files/filelist.xml">
<style id="tabel_transaksi_saham_copy_879_Styles">
<!--table {
mso-displayed-decimal-separator: "\.";
mso-displayed-thousand-separator: "\,";
}
.xl15879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: black;
font-size: 11.0pt;
font-weight: 400;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: general;
vertical-align: bottom;
mso-background-source: auto;
mso-pattern: auto;
white-space: nowrap;
}
.xl63879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 700;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: center;
vertical-align: middle;
border: 1.0pt solid windowtext;
background: #DAEEF3;
mso-pattern: black none;
white-space: normal;
}
.xl64879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 700;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: center;
vertical-align: middle;
border-top: none;
border-right: 1.0pt solid windowtext;
border-bottom: 1.0pt solid windowtext;
border-left: 1.0pt solid windowtext;
background: #DAEEF3;
mso-pattern: black none;
white-space: normal;
}
.xl65879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 700;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: center;
vertical-align: middle;
border-top: none;
border-right: 1.0pt solid windowtext;
border-bottom: 1.0pt solid windowtext;
border-left: none;
background: #DAEEF3;
mso-pattern: black none;
white-space: normal;
}
.xl66879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 400;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: general;
vertical-align: middle;
border-top: none;
border-right: 1.0pt solid windowtext;
border-bottom: 1.0pt solid windowtext;
border-left: 1.0pt solid windowtext;
mso-background-source: auto;
mso-pattern: auto;
white-space: normal;
}
.xl67879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 700;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: center;
vertical-align: middle;
border-top: 1.0pt solid windowtext;
border-right: none;
border-bottom: 1.0pt solid windowtext;
border-left: 1.0pt solid windowtext;
background: #DAEEF3;
mso-pattern: black none;
white-space: normal;
}
.xl68879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 700;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: center;
vertical-align: middle;
border-top: 1.0pt solid windowtext;
border-right: 1.0pt solid windowtext;
border-bottom: 1.0pt solid windowtext;
border-left: none;
background: #DAEEF3;
mso-pattern: black none;
white-space: normal;
}
.xl69879 {
padding-top: 1px;
padding-right: 1px;
padding-left: 1px;
mso-ignore: padding;
color: #002060;
font-size: 11.0pt;
font-weight: 700;
font-style: normal;
text-decoration: none;
font-family: Calibri, sans-serif;
mso-font-charset: 0;
mso-number-format: General;
text-align: center;
vertical-align: middle;
border-top: 1.0pt solid windowtext;
border-right: 1.0pt solid windowtext;
border-bottom: none;
border-left: 1.0pt solid windowtext;
background: #DAEEF3;
mso-pattern: black none;
white-space: normal;
}
-->
</style>
</head>
<body>
<!--[if !excel]> <![endif]-->
<!--The following information was generated by Microsoft Excel's Publish as Web
Page wizard.-->
<!--If the same item is republished from Excel, all information between the DIV
tags will be replaced.-->
<!----------------------------->
<!--START OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD -->
<!----------------------------->
<div id="tabel_transaksi_saham_copy_879" align=left x:publishsource="Excel">
<table border=0 cellpadding=0 cellspacing=0 width=615 style='border-collapse:
collapse;table-layout:fixed;width:462pt'>
<col width=166 style='mso-width-source:userset;mso-width-alt:6070;width:125pt'>
<col width=108 style='mso-width-source:userset;mso-width-alt:3949;width:81pt'>
<col width=103 style='mso-width-source:userset;mso-width-alt:3766;width:77pt'>
<col width=90 style='mso-width-source:userset;mso-width-alt:3291;width:68pt'>
<col width=84 style='mso-width-source:userset;mso-width-alt:3072;width:63pt'>
<col width=64 style='width:48pt'>
<tr height=21 style='height:15.75pt'>
<td height=21 class=xl63879 width=166 style='height:15.75pt;width:125pt'> </td>
<td colspan=2 class=xl67879 width=211 style='border-right:1.0pt solid black;
border-left:none;width:158pt'>11</td>
<td rowspan=2 class=xl69879 width=90 style='border-bottom:1.0pt solid black;
width:68pt'>11</td>
<td colspan=2 class=xl67879 width=148 style='border-right:1.0pt solid black;
border-left:none;width:158pt'>11</td>
</tr>
<tr height=21 style='height:15.75pt'>
<td height=21 class=xl64879 width=166 style='height:15.75pt;width:125pt'> </td>
<td class=xl65879 width=108 style='width:81pt'>11</td>
<td class=xl65879 width=103 style='width:77pt'>11</td>
<td class=xl65879 width=84 style='width:63pt'>11</td>
<td class=xl65879 width=64 style='width:48pt'>11</td>
</tr>
<tr height=21 style='height:15.75pt'>
<td height=21 class=xl66879 align=right width=166 style='height:15.75pt;
width:125pt'>11</td>
<td class=xl66879 align=right width=108 style='border-left:none;width:81pt'>11</td>
<td class=xl66879 align=right width=103 style='border-left:none;width:77pt'>11</td>
<td class=xl66879 align=right width=90 style='border-left:none;width:68pt'>11</td>
<td class=xl66879 align=right width=84 style='border-left:none;width:63pt'>11</td>
<td class=xl66879 align=right width=64 style='border-left:none;width:48pt'>11</td>
</tr>
<tr height=21 style='mso-height-source:userset;height:15.75pt'>
<td height=21 class=xl66879 align=right width=166 style='height:15.75pt;
width:125pt'>11</td>
<td class=xl66879 align=right width=108 style='border-left:none;width:81pt'>11</td>
<td class=xl66879 align=right width=103 style='border-left:none;width:77pt'>11</td>
<td class=xl66879 align=right width=90 style='border-left:none;width:68pt'>11</td>
<td class=xl66879 align=right width=84 style='border-left:none;width:63pt'>11</td>
<td class=xl66879 align=right width=64 style='border-left:none;width:48pt'>11</td>
</tr>
<tr height=21 style='height:15.75pt'>
<td height=21 class=xl66879 align=right width=166 style='height:15.75pt;
width:125pt'>11</td>
<td class=xl66879 align=right width=108 style='border-left:none;width:81pt'>11</td>
<td class=xl66879 align=right width=103 style='border-left:none;width:77pt'>11</td>
<td class=xl66879 align=right width=90 style='border-left:none;width:68pt'>11</td>
<td class=xl66879 align=right width=84 style='border-left:none;width:63pt'>11</td>
<td class=xl66879 align=right width=64 style='border-left:none;width:48pt'>11</td>
</tr>
<![if supportMisalignedColumns]>
<tr height=0 style='display:none'>
<td width=166 style='width:125pt'></td>
<td width=108 style='width:81pt'></td>
<td width=103 style='width:77pt'></td>
<td width=90 style='width:68pt'></td>
<td width=84 style='width:63pt'></td>
<td width=64 style='width:48pt'></td>
</tr>
<![endif]>
</table>
</div>
<!----------------------------->
<!--END OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD-->
<!----------------------------->
</body>
</html>
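If you want to automate the last step (switching the alignment to left) instead of editing the exported file by hand, a minimal sketch could look like the one below. It assumes the Save As output wraps the table in a div whose align attribute is center (the post only says the default alignment was center), and ExcelTableMail, AlignLeft and BuildMessage are hypothetical names. It sends the whole exported page as the body with IsBodyHtml set, rather than concatenating it inside another body element, so the style block defining the xl... classes travels with the table.

```csharp
using System;
using System.IO;
using System.Net.Mail;
using System.Text.RegularExpressions;

public static class ExcelTableMail
{
    // Hypothetical helper: left-aligns the div that Excel's "Save as Web Page" output
    // wraps the table in. Assumes the attribute is written as align=center,
    // align="center" or align='center'; \balign does not match valign.
    public static string AlignLeft(string exportedHtml)
    {
        return Regex.Replace(
            exportedHtml,
            @"(<div\b[^>]*\balign=[""']?)center\b",
            "${1}left",
            RegexOptions.IgnoreCase);
    }

    // Builds a message whose body is the complete exported page, so the <style>
    // block in the head is sent along with the table.
    public static MailMessage BuildMessage(string htmlFilePath, string subject)
    {
        // File.ReadAllText assumes UTF-8 unless a BOM says otherwise; the sample declares
        // windows-1252, so pass a matching Encoding if the sheet has non-ASCII text.
        string html = AlignLeft(File.ReadAllText(htmlFilePath));
        return new MailMessage
        {
            Subject = subject,
            Body = html,
            IsBodyHtml = true
        };
    }
}
```

You would then set From/To on the returned message and send it with your existing SmtpClient code.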

Asp.net Table width not going above 100%

I tried looking for similar questions but couldn't find any. My aspx file looks something like this:
<div align="center" style="height: 350px; overflow-y: scroll; overflow-x: scroll; width: 100%;">
<asp:Table ID="tblReport" Font-Size="11px" runat="server" ViewStateMode="Enabled">
<asp:TableHeaderRow Height="30px" ForeColor="#FFFFFF" BackColor="#3b3b3b" Style="font-weight: bold; text-align: left !important; padding-left: 3px; color: #FFF; border-right: 1px solid #ddeaf7;" HorizontalAlign="Center">
</asp:TableHeaderRow>
</asp:Table>
</div>
Whenever there are too many columns, the table width stays fixed at 100% even though I haven't set a max-width, and the text in each cell wraps onto multiple lines. I want the table to overflow the div horizontally instead.
Edit: Adding this rule fixes it:
td {
white-space: nowrap;
}
A simple HTML snippet to demonstrate:
div {
width: 10%;
overflow: auto;
}
<div>
<table>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
<th>G</th>
<th>H</th>
<th>I</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>6</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
</table>
</div>
EDIT: Your updated code would look like this:
<div align="center" style="height: 350px; overflow: auto; width: 100%;">
<asp:Table ID="tblReport" Font-Size="11px" runat="server" ViewStateMode="Enabled">
<asp:TableHeaderRow Height="30px" ForeColor="#FFFFFF" BackColor="#3b3b3b" Style="font-weight: bold; text-align: left !important; padding-left: 3px; color: #FFF; border-right: 1px solid #ddeaf7;" HorizontalAlign="Center">
</asp:TableHeaderRow>
</asp:Table>
</div>

HTMLAgility pack C# unclosed colgroup tag

An HTML string is posted to the server side, where it is validated using the HtmlAgilityPack. The HTML contains an unclosed colgroup tag.
After sanitizing, the closing colgroup tag does appear, but between the closing tbody and table tags:
BEFORE:
<table width="3265" class="mce-item-table" style="width: 2452pt; border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0">
<colgroup><col width="80" style="width: 60pt;">
<col width="245" style="width: 184pt;" span="13"> <!-- MISSING </colgroup> tag-->
<tbody><tr height="20" style="height: 15pt;">
<td width="80" height="20" style="width: 60pt; height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">31109173</span></td>
<td width="245" style="width: 184pt; font-family: Arial; font-size: 9pt;">31109173</td>
<td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 09,2017 9:54 AM</td>
<td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 08,2017 5:21 PM</td>
</tr>
<tr height="20" style="height: 15pt;">
<td height="20" style="height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">30933775</span></td>
<td style="font-family: Arial; font-size: 9pt;">30933775</td>
<td align="right" style="font-family: Arial; font-size: 9pt;">May 09,2017 9:50 AM</td>
<td align="right" style="font-family: Arial; font-size: 9pt;">Apr 28,2017 6:22 PM</td>
</tr>
</tbody></table>
AFTER:
<table width="3265" class="mce-item-table" style="width: 2452pt; border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0">
<colgroup><col width="80" style="width: 60pt;">
<col width="245" style="width: 184pt;" span="13">
<tbody><tr height="20" style="height: 15pt;">
<td width="80" height="20" style="width: 60pt; height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">31109173</span></td>
<td width="245" style="width: 184pt; font-family: Arial; font-size: 9pt;">31109173</td>
<td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 09,2017 9:54 AM</td>
<td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 08,2017 5:21 PM</td>
</tr>
<tr height="20" style="height: 15pt;">
<td height="20" style="height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">30933775</span></td>
<td style="font-family: Arial; font-size: 9pt;">30933775</td>
<td align="right" style="font-family: Arial; font-size: 9pt;">May 09,2017 9:50 AM</td>
<td align="right" style="font-family: Arial; font-size: 9pt;">Apr 28,2017 6:22 PM</td>
</tr>
</tbody></colgroup></table>
<!-- ^^ </colgroup> has appeared above-->
I tried setting the OptionFixNestedTags flag to true, but I still get the same result. I also tried setting various other HtmlAgilityPack options to true; that didn't work either:
var doc = new HtmlDocument();
doc.OptionFixNestedTags = true;
doc.OptionAutoCloseOnEnd = true;
There is a nice NuGet package that sanitizes HTML, HtmlSanitizer, and it handled the problem I faced.
Hope this helps.
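If you would rather stay with HtmlAgilityPack, one workaround is to close the colgroup yourself before handing the string to the parser. In HTML the colgroup end tag is optional, so browsers close the group implicitly; judging by the AFTER output, HtmlAgilityPack does not. The sketch below is a regex pre-processing step that assumes each colgroup contains only col tags and whitespace, as in the BEFORE sample; ColgroupFixer and CloseColgroups are hypothetical names, not HtmlAgilityPack APIs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColgroupFixer
{
    // Matches an opening <colgroup> plus every <col> that directly follows it.
    // The atomic group (?>...) stops the engine from backtracking to fewer <col>s,
    // so an already-closed colgroup is left alone instead of being closed too early.
    private static readonly Regex UnclosedColgroup = new Regex(
        @"(<colgroup\b[^>]*>(?>(?:\s*<col\b[^>]*>)*))(?!\s*</colgroup>)",
        RegexOptions.IgnoreCase);

    // Hypothetical pre-processing step: inserts the missing </colgroup> right after
    // the last <col>, so the parser never has to guess where the group ends.
    public static string CloseColgroups(string html)
    {
        return UnclosedColgroup.Replace(html, "$1</colgroup>");
    }
}
```

You would then pass ColgroupFixer.CloseColgroups(postedHtml) to HtmlDocument.LoadHtml (or to the sanitizer) instead of the raw string.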

How to save necessary information of HTML file to string variable

An HTML file is generated by the Android MobileBiz Pro invoice app. I'm trying to write software that prints the HTML-based invoice on a receipt printer.
I need to save the necessary information from the HTML file into string variables, as shown below. I tried using the IndexOf method, but it isn't working for me. How can I get this information using Visual C#?
string subtotal = "2,976.00";
string total = "2,976.00";
string payment = "2,760.00";
string balance = "216.00";
This is an example of the HTML code:
<tr><td align="right" colspan="3">Subtotal</td><td align="right">2,976.00</td></tr><tr><td align="right" colspan="3"><b>TOTAL</b></td><td align="right"><b>2,976.00</b></td></tr><tr><td align="right" colspan="3">Less Payment</td><td align="right">2,760.00</td></tr><tr class="total"><td align="right" colspan="3"><strong>Balance Due</strong></td><td align="right">216.00</td></tr>
This is the complete HTML of the file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<style>
body {
font-family:Verdana, Geneva, sans-serif;
font-size: 8pt;
padding: 0 50pt 0 50pt;
}
table td, table th, table.sales th, table td.footer-text {
font-size: 8pt;
}
h1 {
font-family:Verdana, Geneva, sans-serif;
padding-bottom:2px;
margin-bottom:2px;
color:chocolate;
text-transform:uppercase;
font-size: inherit;
font-size: 1.5em;
}
h2 {
font-family:Verdana, Geneva, sans-serif;
padding-bottom:0px;
margin-bottom:0px;
color:chocolate;
text-transform:uppercase;
font-size: 1.3em;
}
h3 {
font-family:Verdana, Geneva, sans-serif;
padding-bottom:2px;
margin-bottom:2px;
}
table.sales td {
padding: 4px 10px 4px 10px;
}
table.sales th {
padding: 5px 10px 5px 10px;
background-color:#CCC;
}
tr.saleline td {
border-bottom-color:chocolate;
border-bottom-width: 1pt;
border-bottom-style: solid;
vertical-align: top;
}
.signature {
display: none;
}
.horizontal-line {
border: 0;
height: 4pt;
color:chocolate;
background-color: chocolate;
}
.total {
font-weight:bold;
font-size:1.1em;
background-color:#CCC;
}
.block1 {
text-align:left;
vertical-align:bottom
}
.block2 {
text-align:right;
vertical-align:bottom
}
.block3 {
text-align:left;
vertical-align:top;
}
.block4 {
text-align:left;
vertical-align:top;
}
.block5 {
text-align:right;
vertical-align:bottom;
}
.block6 {
text-align:left;
vertical-align:top;
margin-top: 15px;
}
.block7 {
text-align:left;
vertical-align:top;
margin-top: 15px;
}
.block8 {
text-align:left;
vertical-align:bottom;
}
.block9 {
text-align:center;
vertical-align:bottom;
}
.block10 {
text-align:right;
vertical-align:bottom;
}
.block11 {
text-align:left;
padding: 25px 0 15px 0;
}
.extracols {
border-style:solid;
border-color:gray;
}
table.extracols {
border-top-width: 1pt;
border-right-width: 0;
border-bottom-width: 1pt;
border-left-width: 1pt;
border-collapse:collapse;
margin: 0 0 15pt 0;
}
table.extracols th {
padding: 5px 10px 5px 10px;
border-top-width: 0;
border-right-width: 1pt;
border-bottom-width: 0;
border-left-width: 0;
border-color:gray;
border-style:solid;
background-color:#CCC;
}
table.extracols td {
padding: 4px 10px 4px 10px;
border-top-width: 0;
border-right-width: 1pt;
border-bottom-width: 0;
border-left-width: 0;
border-color:gray;
border-style:solid;
background-color:#FFF;
}
#footer {
margin-top: 35px;
}
.footer-text {
font-size: inherit;
font-size: 0.97em
}
</style>
</head>
<body style="padding: 20 20 20 20">
<table width="100%">
<tr>
<td style="padding-bottom:20px"><table width="100%">
<tr>
<td style="text-align:left;"></td>
<td class="block2" align="right"><h3>Y.P.Brothers</h3>
No:55/B,<br/>Samagipura,<br/>Sewanagala.
<br/>077-6977139
<br/>mecduino@gmail.com
<br/>
</td>
</tr>
</table></td>
</tr>
<tr>
<td><hr class="horizontal-line"/></td>
</tr>
<tr>
<td><table width="100%">
<tr>
<td style="padding:10px 0 20px 0;"><table width="100%">
<tr>
<td width="33%" class="block3"><strong>Bill To</strong><br />
ANUSHA SURIYA<br/>
</td>
<td class="block4"><strong></strong><br />
</td>
<td class="block5" align="right"><h1>invoice #1</h1>
<b>Date</b>: Oct 9, 2015
<br/><b>Due Date</b>: Oct 9, 2015
</td>
</tr>
</table></td>
</tr>
<tr>
<td>
</td>
</tr>
<tr>
<td><table width="100%" class="sales">
<!-- Headers -->
<tr>
<th align="center">Qty</th> <th align="center">Item</th> <th align="right">Price</th> <th align="right">Amount</th>
</tr>
<!-- Rows -->
<tr class="saleline"> <td align="left">12</td> <td align="left">helaligth 35/=</td> <td align="right">35.00</td> <td align="right">420.00</td> </tr>
<tr class="saleline"> <td align="left">12</td> <td align="left">200p CR SR 195/=</td> <td align="right">195.00</td> <td align="right">2,340.00</td> </tr>
<tr class="saleline"> <td align="left">36</td> <td align="left">Sunlight 35g</td> <td align="right">6.00</td> <td align="right">216.00</td> </tr>
<!-- Totals -->
<tr><td align="right" colspan="3">Subtotal</td><td align="right">2,976.00</td></tr><tr><td align="right" colspan="3"><b>TOTAL</b></td><td align="right"><b>2,976.00</b></td></tr><tr><td align="right" colspan="3">Less Payment</td><td align="right">2,760.00</td></tr><tr class="total"><td align="right" colspan="3"><strong>Balance Due</strong></td><td align="right">216.00</td></tr>
</table></td>
</tr>
</table></td>
</tr>
<tr>
<td><table width="100%" style="margin-top:30px">
<tr>
<td width="50%" class="block6"><h2></h2>
</td>
<td width="50%" class="block7" align="right"><h2></h2>
</td>
</tr>
</table></td>
</tr>
<tr>
<td><table class="block11" width="100%">
<tr>
<td><table></table></td>
</tr>
</table></td>
</tr>
<tr>
<td></td>
</tr>
</table>
<div class="signature">
<table border="0" cellspacing="2" cellpadding="2" align="left">
<tr>
<td style="padding-bottom:30px"></td>
</tr>
<tr>
<td><b>Signed by:</b>
<br/><b>Date:</b>
<br/><b>Signature:</b><br/></td>
</tr>
</table>
</div>
<div id="footer">
<table width="100%" border="0" cellpadding="2">
<tr>
<td align="center"><span class="footer-text">Thank you for your business.</span></td>
</tr>
</table>
</div>
</body>
</html>
You need an HTML parser; try this one: http://htmlagilitypack.codeplex.com/
Load the page into an HtmlDocument:
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument htmlDocument = htmlWeb.Load("url");
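Get the table with the specified Id (the tables in the invoice above have no id attribute, so in practice you would need to select it another way, e.g. by its class="sales"):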
HtmlNode table = htmlDocument.DocumentNode.Descendants("table").SingleOrDefault(x => x.Id == "tableId");
Loop through the nodes to find the values:
foreach(HtmlNode child in table.ChildNodes)
{
if (child.NodeType != HtmlNodeType.Text)
{
Console.WriteLine(child.Name);
}
}
You can find more here: http://www.codeproject.com/Articles/691119/Html-Agility-Pack-Massive-information-extraction-f
You can use jQuery's parseHTML function and then loop through each element to get the values. Below is a working example (it can be refined to suit your needs):
$(document).ready(function () {
var str = '<tr><td align="right" colspan="3">Subtotal</td><td align="right">2,976.00</td></tr><tr><td align="right" colspan="3"><b>TOTAL</b></td><td align="right"><b>2,976.00</b></td></tr><tr><td align="right" colspan="3">Less Payment</td><td align="right">2,760.00</td></tr><tr class="total"><td align="right" colspan="3"><strong>Balance Due</strong></td><td align="right">216.00</td></tr>';
var html = $.parseHTML(str);
$.each(html, function (index, element) {
if ($(this).find("td:first").html() == "Subtotal")
console.log($(this).find("td:last").html());
else if ($(this).find("td:first b").html() == "TOTAL")
console.log($(this).find("td:last b").html());
else if ($(this).find("td:first").html() == "Less Payment")
console.log($(this).find("td:last").html());
else if ($(this).find("td:first strong").html() == "Balance Due")
console.log($(this).find("td:last").html());
});
});
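To stay in C# without a third-party parser, the built-in System.Xml.Linq classes can read these rows, but only because the sample markup happens to be well-formed (every tag is closed and every attribute is quoted); most real-world HTML is not, which is why an HTML parser is usually the safer choice. The sketch below assumes well-formed markup and also assumes that the label cell of each totals row is the one with colspan="3", as in the sample. InvoiceTotals and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InvoiceTotals
{
    // Hypothetical helper. Assumes well-formed (XHTML-like) markup, as in the sample.
    // Returns label -> value, e.g. "Subtotal" -> "2,976.00".
    public static Dictionary<string, string> Extract(string html)
    {
        // Wrap in a single root so a bare sequence of <tr> rows also parses as XML.
        XElement root = XElement.Parse("<root>" + html + "</root>");
        var totals = new Dictionary<string, string>();

        foreach (XElement row in root.Descendants("tr"))
        {
            List<XElement> cells = row.Elements("td").ToList();
            // Totals rows have a label cell spanning three columns plus one value cell;
            // header, item and layout rows are skipped.
            if (cells.Count != 2 || (string)cells[0].Attribute("colspan") != "3")
            {
                continue;
            }
            // Value concatenates all nested text, so <b> and <strong> wrappers are ignored.
            totals[cells[0].Value.Trim()] = cells[1].Value.Trim();
        }
        return totals;
    }
}
```

With the totals rows from the question, string subtotal = totals["Subtotal"]; and string balance = totals["Balance Due"]; give "2,976.00" and "216.00"; the other keys are "TOTAL" and "Less Payment". The same call may also work on the complete file, as long as it stays well-formed.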

CSS not displaying the same across different browsers

This is what my page should look like (this is how it displays in IE):
This is what it looks like in Firefox:
This is my code:
@using CustPortal.serviceclass
@model CustomerData
<br/>
<div class="leftdiv">
<fieldset>
<legend>Customer Info</legend>
@Html.Partial("CustomerInfo", Model)
</fieldset>
</div>
<div class="rightdiv">
<fieldset>
<legend>Balance</legend>
<div>
@Html.Partial("AccountBalance", Model)
</div>
</fieldset>
</div>
<div>
<table style="width: 100%" id="ThinLineTable">
<tr>
<th class="ThinLineTdLeft ThinLineTh" style="width: 15%">Date</th>
<th class="ThinLineTdLeft ThinLineTh" style="width: 15%">Refer#</th>
<th class="ThinLineTdLeft ThinLineTh" style="width: 30%">Description</th>
<th class="ThinLineTdRight ThinLineTh" style="width: 10%">Qty</th>
<th class="ThinLineTdRight ThinLineTh" style="width: 15%">Total</th>
<th class="ThinLineTdRight ThinLineTh" style="width: 15%">Balance</th>
</tr>
@foreach (var item in (IEnumerable<TransactionHistory>) ViewBag.TransactionHistory)
{
<tr>
<td class="ThinLineTdLeft">@item.TransactionDate</td>
<td class="ThinLineTdLeft">@item.ReferenceNumber</td>
<td class="ThinLineTdLeft">@item.Description</td>
<td class="ThinLineTdRight">@item.Quantity</td>
<td class="ThinLineTdRight">@item.Total</td>
<td class="ThinLineTdRight">@item.Balance</td>
</tr>
}
</table>
</div>
And my css:
.ThinLineTdRight {
padding: 5px;
border: solid 1px #d4d0d0;
text-align: right;
}
.ThinLineTdLeft {
padding: 5px;
border: solid 1px #d4d0d0;
text-align: left;
}
.ThinLineTh {
background-color: #e8eef4;
}
.rightdiv {
float: right;
width: 49%;
text-align: left;
}
.leftdiv {
float: left;
width: 49%;
text-align: left;
}
I'm not great with CSS; I'm still learning. Can anyone tell me what I'm doing wrong that makes it display differently in Firefox than in IE?
Thank you!
