When I generate a report in XLS format using ASP.NET with C#, the file is initially about 32 MB. But when I open it in Excel and save it as XLS under a different name, the size drops to approximately 10 KB, and neither the data nor the formatting is lost: both files appear identical in Excel.
Why is that?
How can I generate a smaller report file in the first place?
The 32 MB file contains data like:
<tr height="20" style='height:15.00pt;'>
<td class="xl66" height="20" style='height:15.00pt;' x:str>PR</td>
<td class="xl72" x:str>00923</td>
<td class="xl66" align="right" x:num>2016</td>
<td class="xl66" align="right" x:num>25</td>
<td class="xl66" align="right" x:num>89</td>
<td class="xl66" align="right" x:num>89</td>
<td class="xl66" align="right" x:num>45</td>
<td class="xl66" align="right" x:num>52</td>
<td class="xl66" align="right" x:num>2316</td>
<td class="xl73" align="right" x:num="0.87">87%</td>
<td class="xl73" align="right" x:num="1.e-002">1%</td>
<td class="xl73" align="right" x:num="4.0000000000000001e-002">4%</td>
<td class="xl73" align="right" x:num="4.0000000000000001e-002">4%</td>
<td class="xl73" align="right" x:num="2.e-002">2%</td>
<td class="xl73" align="right" x:num="2.e-002">2%</td>
<td class="xl73" align="right" x:num="0.92000000000000004">92%</td>
</tr>
The 10 MB file contains some random functions, as in:
https://gist.github.com/anonymous/f979595cf169575aea3b94d9abc3b525
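You can try creating xlsx files instead; they should be smaller. Judging by the sample, what you generate is HTML markup saved with an .xls extension, so every cell repeats its tags and attributes as text, whereas xlsx is a ZIP-compressed package.
Or, if you are stuck with the coding, you can try a report generator such as Crystal Reports or FastReport.Net (it can export to xls and xlsx too).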
I am trying to convert HTML to PDF using iTextSharp. I found that iTextSharp does not support CSS well. In fact, I think HTMLWorker does not support it at all. To compound my problem, iTextSharp does not seem to support rowspan either.
This is what I am trying to generate: http://jsbin.com/jovugohuju/1/edit?html,output
<table border="1" width="700">
<tr>
<td colspan="5" align="center" bgcolor="lightblue">INVOICE</td>
</tr>
<tr>
<td colspan="2" rowspan="4" bgcolor="white"><b>AIRNET NETWORKS</b>
<br>asdadadadaada asd asd a ads adsadsadsadasd</td>
<td>INVOICE</td>
<td>DATE</td>
<td>aDATEsd</td>
</tr>
<tr>
<td>Order</td>
<td>XXXX</td>
<td>Ref XXXXXX</td>
</tr>
<tr>
<td>Delivery</td>
<td>XXXX</td>
<td>Ref XXXXXX</td>
</tr>
<tr>
<td>Due Date</td>
<td>XXXX</td>
<td>Ref XXXXXX</td>
</tr>
<tr>
<td colspan="2" rowspan="4" bgcolor="white">
<p><b>CUSTOMER NAME</b>
</p>asd asd adadaadadadada adadaadsasdad ada asd adad</td>
</tr>
<tr>
<td>Customer Care No:</td>
<td colspan="2">544646454,88877978975</td>
</tr>
<tr>
<td>Email Id</td>
<td colspan="2">airnet#gmail.com</td>
</tr>
<tr>
<td>Account Details</td>
<td colspan="2">5522245125545455 IFSC 323hasd</br>SBI India</td>
</tr>
</table>
<table border="1" width="700">
<tr>
<td bgcolor="lightblue" height="15">Srno</td>
<td bgcolor="lightblue">Particulars</td>
<td bgcolor="lightblue">Quantity</td>
<td bgcolor="lightblue">Rate/Month</td>
<td bgcolor="lightblue">Total Rupees</td>
<tr>
<td valign="top">1</td>
<td valign="top">1 MBPS Plan</td>
<td valign="top">1</td>
<td valign="top">600</td>
<td valign="top">692</td>
</tr>
</tr>
<tr>
<td height="300" valign="top">1</td>
<td valign="top">1 MBPS Plan</td>
<td valign="top">1</td>
<td valign="top">600</td>
<td valign="top">692</td>
</tr>
<tr>
<td colspan="3" rowspan="3" valign="top">asdasdasd</td>
<td colspan="1">Total</td>
<td colspan="1">692</td>
</tr>
<tr>
<td>Service Tax</td>
<td>692</td>
</tr>
<tr>
<td>Grand Total</td>
<td>692</td>
</tr>
</table>
C# CODE:
Document document = new Document();
document.SetPageSize(iTextSharp.text.PageSize.A4);
iTextSharp.text.pdf.draw.LineSeparator line1 = new iTextSharp.text.pdf.draw.LineSeparator(0f, 100f, iTextSharp.text.Color.BLACK, Element.ALIGN_LEFT, 1);
string NEWhtmlText="<table border='1' width='500' > <tr> <td bgcolor='lightblue' height='15' >Srno</td><td bgcolor='lightblue'>Particulars</td><td bgcolor='lightblue' >Quantity</td><td bgcolor='lightblue'>Rate/Month</td><td bgcolor='lightblue'>Total Rupees</td> </tr> <tr> <td valign='top' >1</td><td valign='top' >1 MBPS Plan</td><td valign='top'>1</td><td valign='top'>600</td><td valign='top'>692</td> </tr> <tr> <td height='300' valign='top' >1</td><td valign='top' >1 MBPS Plan</td><td valign='top'>1</td><td valign='top'>600</td><td valign='top'>692</td> </tr> <tr><td colspan='3' rowspan='3' valign='top'>asdasdasd</td><td colspan='1'>Total</td><td colspan='1'>692</td></tr> <tr><td>Service Tax</td><td>692</td></tr> <tr><td>Grand Total</td><td>692</td></tr> </table>";
PdfWriter.GetInstance(document, new FileStream(saveFileDialog1.FileName, FileMode.Create));
document.Open();
iTextSharp.text.html.simpleparser.HTMLWorker hw = new iTextSharp.text.html.simpleparser.HTMLWorker(document);
hw.Parse(new StringReader(NEWhtmlText));
document.Close();
OUTPUT(unwanted):
Please take a look at the following screen shot:
To the left, you see an HTML file rendered in a browser. To the right, you see that HTML file rendered to PDF using iText (the Java version). Note that the functionality of iTextSharp regarding HTML to PDF is identical to Java, hence you shouldn't post questions saying "does not work in iTextSharp" because that sounds as if iTextSharp can't achieve what you want to do (which is an incorrect allegation), whereas the actual problem is caused by some individual errors you made when writing your code. It is not friendly to blame a tool for your own errors ;-)
There are three reasons why your application doesn't work:
1. Your HTML doesn't make sense. I had to clean it up (change <br> into <br />, introduce the correct CSS, correct the column count for some rows,...) and make it XHTML before it rendered correctly in a browser. You can find the HTML that was used in the screenshot here: table2_css.html
2. You are using HTMLWorker instead of XML Worker, and you are right: HTMLWorker has no support for CSS. Saying CSS doesn't work in iTextSharp is wrong. It doesn't work when you use HTMLWorker, but that's documented: the CSS you need works in XML Worker.
3. You are probably using an old version of iTextSharp, and you are right: CSS and table support wasn't as good in older versions of iTextSharp as it is in the most recent version.
See the XML Worker page on the official iText site for more info. Apart from iTextSharp, you also need to download XML Worker. The examples are written in Java, but you should have no problem converting them to C#. The example I used to make the PDF in the screen shot (html_table_4.pdf) can be found here: ParseHtmlTable4
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(HTML));
    // step 5
    document.close();
}
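For readers working in C#, the same five steps might look roughly like the sketch below. It assumes iTextSharp 5.x together with the separate XML Worker package; the class name HtmlTableToPdf and the parameters htmlPath and pdfPath are made up for illustration, and the input file is assumed to be well-formed XHTML.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlTableToPdf
{
    // htmlPath and pdfPath are hypothetical parameter names, not part of the original example.
    public static void CreatePdf(string htmlPath, string pdfPath)
    {
        using (var output = new FileStream(pdfPath, FileMode.Create))
        using (var html = new StreamReader(htmlPath))
        {
            // step 1: create the document
            var document = new Document();
            // step 2: bind a writer to the output stream
            PdfWriter writer = PdfWriter.GetInstance(document, output);
            // step 3: open the document
            document.Open();
            // step 4: let XML Worker (not HTMLWorker) parse the XHTML and its CSS
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);
            // step 5: close the document, which completes the PDF
            document.Close();
        }
    }
}
```

The only real change from the code in the question is step 4: XMLWorkerHelper replaces HTMLWorker, and it needs the PdfWriter instance as well as the Document.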
I am writing a C# script with Selenium WebDriver to test a website and got stuck. I have multiple URLs on a web page, and I need to find one of them and click on it. The URL names are dynamic, so it is difficult for me to find the link using 'ID' or 'name'. The URL name can be anything, for example: the_rise_of_India.htm or the_rise_india.htm or the_riseIndia.htm etc. How can I filter the links, find the URL I want, and click on it?
Please help me with the above scenario.
My HTML code is:
<table xmlns:downloader="http://schemas.niku.com/java/com.niku.dms.web.ZipPageDownloader" border="0" cellspacing="1" cellpadding="2" class="tableGridList" width="100%">
<tbody>
<tr class="TableGridList"><td class="ColHeadNoSort" scope="col"><img src="ui/evolution1/images/IcoCheckAll.gif" border="0" alt="Check All" title="Check All"></td><td class="ColHeadNoSort" colspan="3" align="center" scope="col">Name</td><td class="ColHeadNoSort" align="center" scope="col"> </td><td class="ColHeadNoSort" scope="col">Size</td><td class="ColHeadNoSort" scope="col">Type</td><td class="ColHeadNoSort" scope="col">Status</td><td class="ColHeadNoSort" scope="col">Modified</td><td class="ColHeadNoSort" scope="col">Actions</td></tr>
<tr valign="top" class="rowOff" onmouseover="this.className='rowOn'" onmouseout="this.className='rowOff'"><td valign="middle" align="center" width="20" scope="row"> </td><td valign="middle" width="20" class="hierTee"><img src="ui/evolution1/images/Spacer.gif" height="16" width="16" border="0" alt=""></td><td valign="middle"><img src="ui/evolution1/images/fmFolderClosed.gif" alt="" title=""></td><td valign="middle" width="100%"><a class="tableLink" target="" href="app?action=dms.ProjectsfileManager&folderId=5697033&returnAction=dms.ProjectsFileManager&cancelAction=dms.ProjectsFileManager&actionItemId=&id=5103184&type=Projects&taskID=&fromPage=&rootFolderId=&">Great India's place</a></td><td valign="middle" align="left" nowrap="true" id="PPP"></td><td valign="middle" align="left" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="center" nowrap="true">9/26/15 8:27 PM</td><td valign="middle" align="right"><select name="folder5697033" onchange="optionGoTo( this.form.name,'folder5697033')" class="docMgrAction"><option value=""></option></select></td></tr>
<tr valign="top" class="rowOff" onmouseover="this.className='rowOn'" onmouseout="this.className='rowOff'"><td valign="middle" align="center" width="20" scope="row"> </td><td valign="middle" width="20" class="hierTee"><img src="ui/evolution1/images/Spacer.gif" height="16" width="16" border="0" alt=""></td><td valign="middle"><img src="ui/evolution1/images/fmFolderClosed.gif" alt="" title=""></td><td valign="middle" width="100%"><a class="tableLink" target="" href="app?action=dms.ProjectsfileManager&folderId=5687045&returnAction=dms.ProjectsFileManager&cancelAction=dms.ProjectsFileManager&actionItemId=&id=5103184&type=Projects&taskID=&fromPage=&rootFolderId=&">India's silver gold awards</a></td><td valign="middle" align="left" nowrap="true" id="PPP"></td><td valign="middle" align="left" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="center" nowrap="true">8/6/15 12:04 PM</td><td valign="middle" align="right"><select name="folder5687045" onchange="optionGoTo( this.form.name,'folder5687045')" class="docMgrAction"><option value=""></option></select></td></tr>
<tr valign="top" class="rowOff" onmouseover="this.className='rowOn'" onmouseout="this.className='rowOff'"><td valign="middle" align="center" width="20" scope="row"> </td><td valign="middle" width="20" class="hierTee"><img src="ui/evolution1/images/Spacer.gif" height="16" width="16" border="0" alt=""></td><td valign="middle"><img src="ui/evolution1/images/fmFolderClosed.gif" alt="" title=""></td><td valign="middle" width="100%"><a class="tableLink" target="" href="app?action=dms.ProjectsfileManager&folderId=5693965&returnAction=dms.ProjectsFileManager&cancelAction=dms.ProjectsFileManager&actionItemId=&id=5103184&type=Projects&taskID=&fromPage=&rootFolderId=&">India's Gold awards</a></td><td valign="middle" align="left" nowrap="true" id="PPP"></td><td valign="middle" align="left" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="center" nowrap="true">9/8/15 10:02 AM</td><td valign="middle" align="right"><select name="folder5693965" onchange="optionGoTo( this.form.name,'folder5693965')" class="docMgrAction"><option value=""></option></select></td></tr>
<tr valign="top" class="rowOff" onmouseover="this.className='rowOn'" onmouseout="this.className='rowOff'"><td valign="middle" align="center" width="20" scope="row"> </td><td valign="middle" width="20" class="hierTee"><img src="ui/evolution1/images/Spacer.gif" height="16" width="16" border="0" alt=""></td><td valign="middle"><img src="ui/evolution1/images/fmFolderClosed.gif" alt="" title=""></td><td valign="middle" width="100%"><a class="tableLink" target="" href="app?action=dms.ProjectsfileManager&folderId=5691948&returnAction=dms.ProjectsFileManager&cancelAction=dms.ProjectsFileManager&actionItemId=&id=5103184&type=Projects&taskID=&fromPage=&rootFolderId=&">Awards night - India</a></td><td valign="middle" align="left" nowrap="true" id="PPP"></td><td valign="middle" align="left" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="right" nowrap="true"></td><td valign="middle" align="center" nowrap="true">8/28/15 7:30 AM</td><td valign="middle" align="right"><select name="folder5691948" onchange="optionGoTo( this.form.name,'folder5691948')" class="docMgrAction"><option value=""></option></select></td></tr>
</tbody>
</table>
The URL names are dynamic and I need to search the string and click on it.
So, if I understand your question properly, you just need to get the list of all the displayed URLs and click on the one you want.
First, the easy way:
IWebElement element = driver.FindElement(By.XPath("//*[text()='the_right_url.htm']"));
If you want to verify that the URL is displayed on the page, note that in C# FindElement throws a NoSuchElementException instead of returning null, so check the result of FindElements:
var matches = driver.FindElements(By.XPath("//*[text()='the_right_url.htm']"));
if (matches.Count > 0)
    matches[0].Click();
else
    throw new Exception("Link not found"); // test fails
The second way: get the list of URLs that are displayed on the page and then click on the desired one by index number.
var ele = driver.FindElements(By.XPath("//li")); // let's say the URLs are under li tags
Now you can walk through the list ele with a for loop and find the index (let's call it count) at which your desired URL appears, and then click on it using
ele[count].Click();
If you have text that you know will always appear in the link you can find it by partial text
driver.FindElement(By.PartialLinkText("india"));
Or using contains
driver.FindElement(By.XPath("//a[contains(text(), 'india')]"));
Please note both options are case sensitive.
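If case sensitivity is a problem, one possible workaround is to lower-case the link text inside the XPath with translate(), which is available in XPath 1.0. The helper below is only a sketch: the class name LinkXPath and the method ContainsIgnoreCase are made up for illustration, it only handles the letters A to Z, and it rejects search text containing an apostrophe.

```csharp
using System;

public static class LinkXPath
{
    private const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // Builds an XPath that matches <a> elements whose text contains
    // the given words, ignoring ASCII letter case.
    public static string ContainsIgnoreCase(string needle)
    {
        if (needle.Contains("'"))
            throw new ArgumentException("Apostrophes are not supported in this sketch.", "needle");

        return "//a[contains(translate(normalize-space(.), '" + Upper + "', '" + Lower + "'), '"
               + needle.ToLowerInvariant() + "')]";
    }
}

// Possible usage with an existing IWebDriver instance named driver:
// driver.FindElement(By.XPath(LinkXPath.ContainsIgnoreCase("india"))).Click();
```

With the table from the question, this would match "Great India's place" as well as "Awards night - India", so FindElement returns the first of them; use FindElements if you need to choose among several matches.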
On a page there are some tags like this:
<tr class=" ev_modern">
<td align="left" valign="middle" title="1">1</td>
<td align="left" valign="middle" title="09:00:08" class="">09:00:08</td>
<td align="left" valign="middle">3000</td>
<td align="left" valign="middle" title="2539.00">2539.00</td>
</tr>
I am looking for the row number, time, number1 and number2 in each of them. I tried many ways but couldn't get them; one attempt was:
var elements = driver.FindElements(By.CssSelector("[class=' ev_modern']"));
but elements is empty.
How can I get this information with Selenium using C#?
Use the following CSS selector to get the td elements:
By.CssSelector("[class=' ev_modern'] td")
Then you can iterate through the list of elements.
Or, if you want the text of the first cell in the row, do the following:
string text = driver.FindElement(By.CssSelector("[class=' ev_modern'] td:nth-child(1)")).Text;
Hope this helps.
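To illustrate the iteration, here is a rough sketch that reads the four cells of every matching row. Two things in it are assumptions rather than part of the answer above: the class name RowReader is made up, and it uses the selector tr.ev_modern, which matches on the class token and so does not depend on the leading space in the attribute value.

```csharp
using System;
using OpenQA.Selenium;

public static class RowReader
{
    // Prints row number, time, number1 and number2 for each matching row.
    public static void PrintRows(IWebDriver driver)
    {
        foreach (IWebElement row in driver.FindElements(By.CssSelector("tr.ev_modern")))
        {
            var cells = row.FindElements(By.TagName("td"));
            if (cells.Count < 4)
                continue; // skip rows that do not have the expected four cells

            Console.WriteLine("{0} | {1} | {2} | {3}",
                cells[0].Text, cells[1].Text, cells[2].Text, cells[3].Text);
        }
    }
}
```

If the collection is still empty with this selector, the rows may be loaded after the page is ready or may live inside a frame, which would have to be handled before looking for them.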
I have a template in which I want to replace certain regions. In my example below, I want to extract the regions between the ... comments, manipulate them, and then put them back after the manipulation.
I do not need the logic to merge the fields, but I need to extract the regions so I can use my logic and place it back into the template.
Does anyone know of an elegant or simple way to extract these regions? I am also hoping to extract the url values in the process as well if it is easy to do along the way.
<table width="700" border="0" align="center" cellpadding="4" cellspacing="0">
<tr>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_57436_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_3379_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_104854_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
<td align="center" valign="top">
<!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_80977_10001"-->
<table>
<tbody>
<tr>
<td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
</tr>
<tr>
<td><span>[element='h1']</span></td>
</tr>
<tr>
<td><span><strong>[element='price']<br />
</strong></span><span>[element='was_price']</span></td>
</tr>
<tr>
<td><span><a title="[element='title']" href="[url]">Details</a></span></td>
</tr>
</tbody>
</table>
<!--DynamicSlotFinish-->
</td>
</tr>
</table>
Maybe this project will be helpful: Html Agility Pack
What is exactly the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
Html Agility Pack now supports Linq to Objects (via a LINQ to Xml Like interface). Check out the new beta to play with this feature
Sample applications:
Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.
Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.
Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.
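If the markers are always written exactly as in the template above, a plain regular expression may already be enough, without a full HTML parser. The sketch below uses System.Text.RegularExpressions rather than the Html Agility Pack; the class name DynamicSlots and its two methods are made up for illustration, and the pattern assumes the url attribute is double-quoted and that regions are never nested.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DynamicSlots
{
    // One region: the start comment with url="...", the inner markup, the finish comment.
    private static readonly Regex SlotPattern = new Regex(
        "<!--DynamicSlotStart\\s+url=\"(?<url>[^\"]*)\"\\s*-->(?<body>.*?)<!--DynamicSlotFinish-->",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns (url, inner markup) for every region, in document order.
    public static List<KeyValuePair<string, string>> Extract(string template)
    {
        var slots = new List<KeyValuePair<string, string>>();
        foreach (Match m in SlotPattern.Matches(template))
        {
            slots.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["body"].Value));
        }
        return slots;
    }

    // Replaces the inside of each region with transform(url, body); the comments are kept.
    public static string Replace(string template, Func<string, string, string> transform)
    {
        return SlotPattern.Replace(template, m =>
        {
            string url = m.Groups["url"].Value;
            string replaced = transform(url, m.Groups["body"].Value);
            return "<!--DynamicSlotStart url=\"" + url + "\"-->" + replaced + "<!--DynamicSlotFinish-->";
        });
    }
}
```

Extract gives you the url values along the way, and Replace lets you plug in your own merge logic as the transform callback. For templates where the comments vary more, a real parser such as the Html Agility Pack is the safer choice.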