Extract regions from a template and put them back - C#

I have a template in which I want to replace certain regions. In my example below, I want to extract the regions between the ... comments, manipulate them, then put them back into the template.
I do not need the logic to merge the fields; I only need to extract the regions so that I can apply my own logic and place the result back into the template.
Does anyone know of an elegant or simple way to extract these regions? I am also hoping to extract the url values along the way, if that is easy to do.
<table width="700" border="0" align="center" cellpadding="4" cellspacing="0">
<tr>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_57436_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_3379_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_104854_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_80977_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
</tr>
</table>

Maybe this project will be helpful: Html Agility Pack
What is exactly the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
Html Agility Pack now supports Linq to Objects (via a LINQ to Xml Like interface). Check out the new beta to play with this feature
Sample applications:
Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.
Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.
Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.
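For the slot markers in the question specifically, a full HTML parser may be more than is needed, because the start and finish comments are fixed strings. The following is a minimal sketch that swaps in a plain regular expression over those two markers instead of the Html Agility Pack. The class name SlotTemplate and its two methods are hypothetical names chosen for illustration, and the sketch assumes the slots are never nested and that the url attribute is always double-quoted as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SlotTemplate
{
    // One region: start comment with a url attribute, a lazily captured body,
    // then the finish comment. Singleline lets '.' match across line breaks.
    private static readonly Regex SlotPattern = new Regex(
        @"<!--DynamicSlotStart\s+url=""(?<url>[^""]*)""\s*-->(?<body>.*?)<!--DynamicSlotFinish-->",
        RegexOptions.Singleline);

    // Returns a (url, body) pair for every slot, in document order.
    public static List<KeyValuePair<string, string>> Extract(string template)
    {
        var slots = new List<KeyValuePair<string, string>>();
        foreach (Match m in SlotPattern.Matches(template))
        {
            slots.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["body"].Value));
        }
        return slots;
    }

    // Replaces each whole region (markers included) with transform(url, body).
    // Return the markers from transform as well if they must stay in the output.
    public static string ReplaceSlots(string template, Func<string, string, string> transform)
    {
        return SlotPattern.Replace(
            template,
            m => transform(m.Groups["url"].Value, m.Groups["body"].Value));
    }
}
```

With this sketch, SlotTemplate.ReplaceSlots(html, (url, body) => body.Replace("[url]", url)) would fill the [url] placeholder of every slot with that slot's own url and drop the marker comments. The lazy match stops at the first finish marker, so nested slots would need a real parser.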

Related

Create css applied pdf using c# [duplicate]

I am trying to convert HTML to PDF using iTextSharp. I found that iTextSharp does not support CSS well; in fact, I think HTMLWorker does not support it at all. To compound my problem, iTextSharp does not seem to support rowspan either.
This is what I am trying to generate: http://jsbin.com/jovugohuju/1/edit?html,output
<table border="1" width="700">
<tr>
<td colspan="5" align="center" bgcolor="lightblue">INVOICE</td>
</tr>
<tr>
<td colspan="2" rowspan="4" bgcolor="white"><b>AIRNET NETWORKS</b>
<br>asdadadadaada asd asd a ads adsadsadsadasd</td>
<td>INVOICE</td>
<td>DATE</td>
<td>aDATEsd</td>
</tr>
<tr>
<td>Order</td>
<td>XXXX</td>
<td>Ref XXXXXX</td>
</tr>
<tr>
<td>Delivery</td>
<td>XXXX</td>
<td>Ref XXXXXX</td>
</tr>
<tr>
<td>Due Date</td>
<td>XXXX</td>
<td>Ref XXXXXX</td>
</tr>
<tr>
<td colspan="2" rowspan="4" bgcolor="white">
<p><b>CUSTOMER NAME</b>
</p>asd asd adadaadadadada adadaadsasdad ada asd adad</td>
</tr>
<tr>
<td>Customer Care No:</td>
<td colspan="2">544646454,88877978975</td>
</tr>
<tr>
<td>Email Id</td>
<td colspan="2">airnet#gmail.com</td>
</tr>
<tr>
<td>Account Details</td>
<td colspan="2">5522245125545455 IFSC 323hasd</br>SBI India</td>
</tr>
</table>
<table border="1" width="700">
<tr>
<td bgcolor="lightblue" height="15">Srno</td>
<td bgcolor="lightblue">Particulars</td>
<td bgcolor="lightblue">Quantity</td>
<td bgcolor="lightblue">Rate/Month</td>
<td bgcolor="lightblue">Total Rupees</td>
<tr>
<td valign="top">1</td>
<td valign="top">1 MBPS Plan</td>
<td valign="top">1</td>
<td valign="top">600</td>
<td valign="top">692</td>
</tr>
</tr>
<tr>
<td height="300" valign="top">1</td>
<td valign="top">1 MBPS Plan</td>
<td valign="top">1</td>
<td valign="top">600</td>
<td valign="top">692</td>
</tr>
<tr>
<td colspan="3" rowspan="3" valign="top">asdasdasd</td>
<td colspan="1">Total</td>
<td colspan="1">692</td>
</tr>
<tr>
<td>Service Tax</td>
<td>692</td>
</tr>
<tr>
<td>Grand Total</td>
<td>692</td>
</tr>
</table>
C# CODE:
Document document = new Document();
document.SetPageSize(iTextSharp.text.PageSize.A4);
iTextSharp.text.pdf.draw.LineSeparator line1 = new iTextSharp.text.pdf.draw.LineSeparator(0f, 100f, iTextSharp.text.Color.BLACK, Element.ALIGN_LEFT, 1);
string NEWhtmlText="<table border='1' width='500' > <tr> <td bgcolor='lightblue' height='15' >Srno</td><td bgcolor='lightblue'>Particulars</td><td bgcolor='lightblue' >Quantity</td><td bgcolor='lightblue'>Rate/Month</td><td bgcolor='lightblue'>Total Rupees</td> </tr> <tr> <td valign='top' >1</td><td valign='top' >1 MBPS Plan</td><td valign='top'>1</td><td valign='top'>600</td><td valign='top'>692</td> </tr> <tr> <td height='300' valign='top' >1</td><td valign='top' >1 MBPS Plan</td><td valign='top'>1</td><td valign='top'>600</td><td valign='top'>692</td> </tr> <tr><td colspan='3' rowspan='3' valign='top'>asdasdasd</td><td colspan='1'>Total</td><td colspan='1'>692</td></tr> <tr><td>Service Tax</td><td>692</td></tr> <tr><td>Grand Total</td><td>692</td></tr> </table>";
PdfWriter.GetInstance(document, new FileStream(saveFileDialog1.FileName, FileMode.Create));
document.Open();
iTextSharp.text.html.simpleparser.HTMLWorker hw = new iTextSharp.text.html.simpleparser.HTMLWorker(document);
hw.Parse(new StringReader(NEWhtmlText));
document.Close();
OUTPUT(unwanted):
Please take a look at the following screen shot:
To the left, you see an HTML file rendered in a browser. To the right, you see that HTML file rendered to PDF using iText (the Java version). Note that the functionality of iTextSharp regarding HTML to PDF is identical to Java, hence you shouldn't post questions saying "does not work in iTextSharp" because that sounds as if iTextSharp can't achieve what you want to do (which is an incorrect allegation), whereas the actual problem is caused by some individual errors you made when writing your code. It is not friendly to blame a tool for your own errors ;-)
There are three reasons why your application doesn't work:
Your HTML doesn't make sense. I had to clean it up (change <br> into <br />, introduce the correct CSS, correct the column-count for some rows,...) and make it XHTML before it rendered correctly in a browser. You can find the HTML that was used in the screenshot here: table2_css.html
You are using HTMLWorker instead of XML Worker, and you are right: HTMLWorker has no support for CSS. Saying CSS doesn't work in iTextSharp is wrong. It doesn't work when you use HTMLWorker, but that's documented: the CSS you need works in XML Worker.
You are probably using an old version of iTextSharp, and you are right: CSS and table support wasn't as good in older versions of iTextSharp as it is in the most recent version.
See the XML Worker page on the official iText site for more info. Apart from iTextSharp, you also need to download XML Worker. The examples are written in Java, but you should have no problem converting them to C#. The example I used to make the PDF in the screen shot (html_table_4.pdf) can be found here: ParseHtmlTable4
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(HTML));
    // step 5
    document.close();
}
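Since the question is about C#, here is a rough sketch of how the Java example above might be ported. It assumes iTextSharp 5.x together with the separately downloaded XML Worker assembly, which exposes XMLWorkerHelper in the iTextSharp.tool.xml namespace. The class name HtmlToPdf and the two path parameters are hypothetical, and the input file is assumed to be well-formed XHTML, as the answer requires.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlToPdf
{
    // htmlPath: an XHTML file to read; pdfPath: the PDF file to create.
    public static void CreatePdf(string htmlPath, string pdfPath)
    {
        using (var output = new FileStream(pdfPath, FileMode.Create))
        using (var input = new FileStream(htmlPath, FileMode.Open, FileAccess.Read))
        {
            // step 1: the document, using A4 as in the question
            var document = new Document(PageSize.A4);
            // step 2: bind a writer to the output stream
            PdfWriter writer = PdfWriter.GetInstance(document, output);
            // step 3
            document.Open();
            // step 4: XML Worker (not HTMLWorker) parses the XHTML and its CSS
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, input);
            // step 5
            document.Close();
        }
    }
}
```

The only structural change from the question's code is replacing HTMLWorker.Parse with XMLWorkerHelper.ParseXHtml; the markup itself still has to be cleaned up first, as described in the first of the three reasons.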

How to set accessibility attributes within .NET's MenuItems, _without_ JavaScript

I am working on a sidenav that is built on .NET MenuItems like so:
<asp:MenuItem value="19" Text="Profile" Selectable="false"></asp:MenuItem>
<asp:MenuItem value="0" Text="Overview" ToolTip="Overview" Selected="true"></asp:MenuItem>
<asp:MenuItem value="2" Text="My Info & Email Subscriptions" ToolTip="My Info & Email Subscriptions"></asp:MenuItem>
In HTML, the output produces a series of nested tables around each MenuItem which looks like this:
<div id="_links" class="span-3">
<table id="FormUserControl__tabMenu" cellpadding="0" cellspacing="0" border="0" style="clear:left;">
<tbody>
<tr id="FormUserControl__tabMenun0">
<td>
<table cellpadding="0" cellspacing="0" border="0" width="100%">
<tbody>
<tr>
<td style="white-space:nowrap;width:100%;">
<a style="text-decoration:none;">
<div id="FormUserControl__tabMenu_ctl00__tabMenuItemPanel" class="myAccountHeading ">
Profile
</div>
</a>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr onmouseover="Menu_HoverStatic(this)" onmouseout="Menu_Unhover(this)" onkeyup="Menu_Key(this)" title="Overview" id="FormUserControl__tabMenun1">
<td>
<table cellpadding="0" cellspacing="0" border="0" width="100%">
<tbody>
<tr>
<td style="white-space:nowrap;width:100%;"><a href="javascript:__doPostBack('FormUserControl$_tabMenu','0')" style="text-decoration:none;">
<div id="FormUserControl__tabMenu_ctl01__tabMenuItemPanel" class="sideNav">
Overview
</div>
</td>
<tr>
<tbody>
</td>
</tr>
</tbody>
</table>
</div>
How can I add an accessibility role and aria-level to these innermost divs? The goal is to achieve accessibility compliance. For example:
<div role="heading" aria-level="[2]">Profile</div>
I have looked through the MSDN documentation and it looks like there isn't a way to add those attributes within the initial MenuItem declaration.
I also tried adding role and aria-level attributes within CSS, which I know is hacky, but I figured that since content can be set, it was worth trying. That doesn't work.
I could readily do this in JavaScript and I know how, but I really want to avoid that; it's a last resort.
Is there a way to change the MenuItem output to include role and aria-level? Or is there a way to have it output a header instead of a div nested within two tables?
Many thanks!

How to create a regexp for an HTML page

I have an HTML page with this code:
<table class="data">
<tr>
<td class="head" >Time</td>
<td class="head right" >Pref</td>
<td class="head" >Name</td>
<td class="head">Descr</td>
</tr>
<tr>
<td colspan="4" class="date">
2014.03.17
</td>
</tr>
<tr valign="top" class="dat">
<td>22:02</td>
<td class="right">
3/2014
</td>
<td>
<a href="/reports/id=34">
<b>Company Name</b>
</a>
</td>
<td>
<a href=/reports/view/id=34" target="_blank" class="th">
Description
</a>
</td>
</tr>
<tr valign="top" class="date">
<td>21:16</td>
<td class="right">
8/2014
</td>
<td>
<a href="/reports/id=324">
<b>Company Name2</b>
</a>
</td>
<td>
<a href="reports/view/=324" target="_blank" class="th">
Description
</a>
</td>
</tr>
................................
</table>
Can you help me create a regexp to extract data from the table? I need this data: 21:16,8/2014,Company Name2,Description
Thanks.
Do NOT try parsing HTML with regex. You might get fairly far, but it's very easy to get wrong and it doesn't work well. I learned this the hard way once. As others have mentioned in the comments, see:
https://stackoverflow.com/a/1732454/794380
You should try the Html Agility pack: http://htmlagilitypack.codeplex.com
Take a look here https://stackoverflow.com/a/19871589/307976
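As a rough illustration of the Html Agility Pack route, the sketch below reads the table from the question and joins the cell text of each data row with commas. The class name ReportTable and the method ExtractRows are hypothetical. The XPath assumes the table has class "data" and that every data row carries valign="top", as in the sample, so header and date rows are skipped.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ReportTable
{
    // Returns one comma-separated string per data row of the table.
    public static List<string> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='data']//tr[@valign='top']");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("./td");
            if (cells == null)
            {
                continue;
            }

            // InnerText includes the text of nested <a> and <b> elements;
            // DeEntitize decodes entities and Trim removes surrounding whitespace.
            var values = cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim());
            result.Add(string.Join(",", values));
        }

        return result;
    }
}
```

Selecting rows by attribute rather than by position keeps the sketch independent of how many header or date rows precede the data, which is one of the things a regex tends to get wrong.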

How to use CSS to align controls on one row

I use Visual Studio 2010 with .NET Framework 2.0.
I am designing a page in C#; call it A.aspx.
The page uses a table for layout,
like this:
<table> <tr>
<td>
Name
</td>
<td>
<asp:TextBox .../>
<div id="div1" style="vertical-align: bottom;">
<img.. />
</div>
<div><img ../></div>
<asp:Label .../>
<asp:Label .../>
</td>
</tr>
</table>
Now I want the contents of the second <td></td> to be laid out on one row. How do I set that up?
I tried <td style="float:left">, but it did not work; it always shows two rows.
Can somebody help me with this?
First, understand your layout. Here is an example with borders.
If you want to float elements, try a layout using div and CSS.
Table-based layouts are hard to work with and have largely fallen out of favor. You can still use tables for specific situations, but most of the time try to use a div layout and other elements to organize content.
HTML
<table class="borde">
<tr class="borde">
<td class="borde">
Name
</td>
<td class="borde">
fffff
<div class="divRojo"></div>
fffff
</td>
</tr>
</table>
CSS
.borde { border:1px solid black; }
.divRojo { border:1px solid red; }
With one more row and the div
<table class="borde">
<tr class="borde">
<td class="borde">
Name
</td>
<td class="borde">
fffff
<div class="divRojo"></div>
fffff
</td>
</tr>
<tr>
<td colspan="2">
<div>Content in one column in a new row</div>
</td>
</tr>
</table>
You want your second <td> in one row? So, create another <tr>?
<table>
<tbody>
<tr>
<td>
Name
</td>
</tr>
<tr>
<td>
<textarea></textarea>
</td>
</tr>
</tbody>
</table>
http://jsbin.com/uJenUVOx/1/edit
I have solved this: putting a nested <table> inside the second <td> fixes the problem.
