How can I get text from HTML code like this:
<tr class="tabelaZbiorczaAltRow"><td nowrap="">
<a href='javascript:danePobierzPelnyRaport("890002604","DaneRaportPrawnaPubl", 0);'>890002604</a>
</td><td nowrap="">P</td>
<td>GAZA A.A.GĄSIEWICZ SPÓŁKA JAWNA</td>
<td nowrap="">DOLNOŚLĄSKIE</td>
<td nowrap="">kłodzki</td>
<td nowrap="">Duszniki-Zdrój</td>
<td nowrap="">57-340</td>
<td nowrap="">Duszniki-Zdrój</td>
<td nowrap="">ul. Willowa 1</td>
<td nowrap="">----------</td>
</tr>
I need the text of every td nowrap cell, and I have to be able to tell the cells apart.
This may help:
public static string StripHTML(string input)
{
return Regex.Replace(input, "<.*?>", String.Empty);
}
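Stripping every tag this way merges all cells into one string, so the individual values can no longer be told apart. A minimal sketch that keeps each td nowrap cell separate, assuming the markup is as regular as in the question (the names CellExtractor and GetNowrapCells are made up for illustration; for less regular pages an HTML parser such as HtmlAgilityPack is the safer choice):

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class CellExtractor
{
    // Returns the text of every <td nowrap...> cell, one list entry per cell.
    public static List<string> GetNowrapCells(string html)
    {
        var cells = new List<string>();
        // Lazy match from each opening <td nowrap...> to the nearest </td>.
        var matches = Regex.Matches(html, @"<td\s+nowrap[^>]*>(.*?)</td>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        foreach (Match m in matches)
        {
            // Drop nested tags (such as the <a> link), decode entities, trim.
            string text = Regex.Replace(m.Groups[1].Value, "<.*?>", string.Empty);
            cells.Add(WebUtility.HtmlDecode(text).Trim());
        }
        return cells;
    }
}
```

Each entry of the returned list corresponds to one cell, in document order, so cells[0] would be the number from the link and cells[1] the "P" column.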
Related
I am new to Web API. I am sending mail through an SMTP service.
I need to include images, that is, display each image inline with the mail contents.
The text content displays properly, but the image does not.
The image URL is: http://192.168.168.62:8087/Images/Product/My_Cart/axe_brand_1.jpg
When I inspect the image in the mailbox, it shows as follows:
<img data-imagetype="External" src="/actions/ei?u=http%3A%2F%2F192.168.168.62%3A8087%2FImages%2FProduct%2FMy_Cart%2Faxe_brand_1.jpg&d=2019-01-02T05%3A52%3A02.212Z" originalsrc="http://192.168.168.62:8087/Images/Product/My_Cart/axe_brand_1.jpg" data-connectorsauthtoken="1" data-imageproxyendpoint="/actions/ei" data-imageproxyid="">
I have tried many ways to solve this without success. Can anyone help?
Update:
C#:
foreach (DataRow Row in dt.Tables[0].Rows)
{
LinkedResource res = new LinkedResource(Row["ITEM_IMAGE"].ToString());
res.ContentId = Guid.NewGuid().ToString();
string htmlBody = @"<img src='cid:" + res.ContentId + @"'/>";
tableRows.AppendFormat(mailBodyTemplate, htmlBody, Row["ITEM_NAME"], Row["ITEM_UOM"], Row["QUANTITY"], Row["ITEM_PRICE_WITH_GST"], Row["TOTAL_AMOUNT_WITH_GST"]);
}
var mailBody = string.Format(PurchaseSummary, tableRows.ToString(), totalPrice);
Template :
<tr>
<td align="center"><img src="{0}" /></td>
<td>
<table style="width:100%;margin-left:15px">
<tr>
<td align="right" style="width:50%">Product Name :</td>
<td align="left" style="color:#32CD32;font-weight:bold;width:50%">{1}</td>
</tr>
<tr>
<td align="right" style="width:50%">Product UOM :</td>
<td align="left" style="color:#696969;font-weight:bold;width:50%">{2}</td>
</tr>
<tr>
<td align="right" style="width:50%">Quantity :</td>
<td align="left" style="color:#778899;font-weight:bold;width:50%">{3}</td>
</tr>
<tr>
<td align="right" style="width:50%">Unit Price :</td>
<td align="left" style="color:#483D8B;font-weight:bold;width:50%">{4}
</td>
</tr>
</table>
</td>
<td align="center" style="font-size:20px;font-weight:bold;color:#1E90FF">{5}</td>
</tr>
It still does not display the image.
Inspecting the image again gives the following:
<img data-imagetype="External" src="/actions/ei?u=http%3A%2F%2F192.168.168.62%3A8087%2FImages%2FProduct%2FMy_Cart%2Faxe_brand_1.jpg&d=2019-01-02T09%3A41%3A12.195Z" originalsrc="http://192.168.168.62:8087/Images/Product/My_Cart/axe_brand_1.jpg" data-connectorsauthtoken="1" data-imageproxyendpoint="/actions/ei" data-imageproxyid="">
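Two things in the update are worth checking. The template already wraps {0} in <img src="{0}" />, so passing a complete <img> tag as {0} nests one tag inside the attribute of another; the placeholder should receive only the cid: reference. And a LinkedResource is only sent if it is added to an AlternateView that is attached to the message; creating it and setting ContentId is not enough on its own. A plain URL on a private address such as 192.168.168.62 is also likely unreachable for a webmail image proxy. A minimal sketch under those assumptions (InlineImageMail and Build are illustrative names, and the image is passed in as a stream rather than read from ITEM_IMAGE):

```csharp
using System;
using System.IO;
using System.Net.Mail;
using System.Net.Mime;

public static class InlineImageMail
{
    // Builds a message whose HTML body shows the given JPEG inline
    // through a cid: reference to an embedded LinkedResource.
    public static MailMessage Build(string from, string to, Stream jpeg)
    {
        var image = new LinkedResource(jpeg, MediaTypeNames.Image.Jpeg);
        image.ContentId = Guid.NewGuid().ToString("N");

        // Only the cid: reference goes into src, not a nested <img> tag.
        string html = "<img src=\"cid:" + image.ContentId + "\" />";

        // The resource must belong to the view that carries the HTML.
        var view = AlternateView.CreateAlternateViewFromString(
            html, null, MediaTypeNames.Text.Html);
        view.LinkedResources.Add(image);

        var message = new MailMessage(from, to);
        message.AlternateViews.Add(view);
        return message;
    }
}
```

With several product rows, each row would get its own LinkedResource and ContentId, all added to the same AlternateView.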
I have the following HTML:
<tbody>
<tr>
<td class="metadata_name">Headquarters</td>
<td class="metadata_content">Princeton New Jersey, United States</td>
</tr>
<tr>
<td class="metadata_name">Industry</td>
<td class="metadata_content"><ul><li>Engineering Software</li><li>Software Development & Design</li><li>Software</li><li>Custom Software & Technical Consulting</li></ul></td>
</tr>
<tr>
<td class="metadata_name">Revenue</td>
<td class="metadata_content">$17.5 Million</td>
</tr>
<tr>
<td class="metadata_name">Employees</td>
<td class="metadata_content">201 to 500</td>
</tr>
<tr>
<td class="metadata_name">Links</td>
<td class="metadata_content"><ul><li>Company website</li></ul></td>
</tr>
</tbody>
I want to be able to load the metadata_content value (e.g. "$17.5 Million") into a variable where the metadata_name equals a given value (e.g. "Revenue").
I have tried combinations of code like this for a few hours:
orgHtml.DocumentNode.SelectNodes("//td[@class='metadata_name']")[0].InnerHtml;
But I can't get the right combination. If you know a SelectNodes syntax that gets me there, I would appreciate it.
It seems what you're looking for is this:
var found = orgHtml.DocumentNode.SelectSingleNode(
"//tr[td[@class = 'metadata_name'] = 'Revenue']/td[@class = 'metadata_content']");
if (found != null)
{
string html = found.InnerHtml;
// use html
}
Note that to get the text of an element, you should use found.InnerText, not found.InnerHtml, unless you specifically need its HTML content.
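The same predicate can be tried without HtmlAgilityPack on a well-formed fragment. The sketch below uses System.Xml.Linq instead, which is a swapped-in parser that only accepts valid XML (the unescaped & in the Industry row would have to be written as &amp;). MetadataLookup and Find are illustrative names, and name is assumed not to contain a single quote:

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

public static class MetadataLookup
{
    // Returns the text of the metadata_content cell whose sibling
    // metadata_name cell equals `name`, or null when no row matches.
    public static string Find(string xhtml, string name)
    {
        XDocument doc = XDocument.Parse(xhtml);
        // Pick the row by the text of its name cell, then step to the content cell.
        string xpath = "//tr[td[@class='metadata_name'] = '" + name +
                       "']/td[@class='metadata_content']";
        XElement cell = doc.XPathSelectElement(xpath);
        return cell == null ? null : cell.Value.Trim();
    }
}
```

The row is selected by comparing the string value of its metadata_name cell, so the lookup does not depend on the order of the rows.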
How can I read a <table> that is inside an onmouseover attribute with C# and HtmlAgilityPack?
Markup:
<a href="#" class="chan_live_not_free" onclick="return false;" onmouseover="return overlib('
<table>
<tr class=fieldRow>
<td class=posH_col width=40>
<strong>pos</strong>
</td>
<td class=rest_col width=90>
<strong>satellite</strong>
</td>
<td class=freqH_col width=50>
<strong>freq</strong>
</td>
<td class=rest_col width=90>
<strong>symbol</strong>
</td>
<td class=rest_col width=90>
<strong>encryption</strong>
</td>
</tr>
<tr>
<td class="pos_col">39.0°e</td>
<td class=rest_col>Hellas Sat 2</td>
<td class="freq_col">12.606 H</td>
<td class=rest_col>30000 - 2/3</td>
<td class=enc_not_live>MPEG-4 BulCrypt</td>
</tr>
</table>',CAPTION, 'Arena Sport 4 (serbia) – 19/10/14 - 11:30');" onmouseout="return nd();">
Arena Sport 4 (serbia)
</a>
I need to read the table inside the onmouseover attribute. How can I read it?
You could get the onmouseover attribute of the <a> tag with HTML Agility Pack and then use a regular expression to pull the <table> out of that string, something like the following code:
var html = @"<a href='#' class='chan_live_not_free' onclick='return false;' onmouseover='return overlib(
<table>
<tr class=fieldRow>
<td class=posH_col width=40>
<strong>pos</strong>
</td>
<td class=rest_col width=90>
<strong>satellite</strong>
.
.
.
<tr>
<td class=""pos_col"">39.0°e</td>
<td class=rest_col>Hellas Sat 2</td>
<td class=""freq_col"">12.606 H</td>
<td class=rest_col>30000 - 2/3</td>
<td class=enc_not_live>MPEG-4 BulCrypt</td>
</tr>
</table>,CAPTION, 'Arena Sport 4 (serbia) – 19/10/14 - 11:30');' onmouseout='return nd();'>
Arena Sport 4 (serbia)
</a>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
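// Read the raw onmouseover attribute of the <a> element.
var value = doc.DocumentNode.SelectSingleNode("//a[@class='chan_live_not_free']").Attributes["onmouseover"].Value;
// Take everything from <table> to </table>; [^)]* works here because the table contains no ')'.
var text = Regex.Matches(value, @"<table>([^)]*)</table>")[0].Value;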
I have an HTML table that I request from a webserver in C#. I am then displaying the page in my aspx webform. How can I add a prerequisite based on the course ID to the last column in the table without hard-coding the prerequisite? Example of the table design is below.
<tr bgcolor="#E1E1CC">
<td width="7%">003597</td>
<td width="5%">01</td>
<td width="1%">OPT</td>
<td width="8%">MT H </td>
<td width="16%">2:00 pm - 2:50 pm </td>
<td width="17%">08/26/13 - 12/12/13</td>
<td width="8%">
<a href="http://www.mnsu.edu/registrar/building.html"target = _blank>
<b>TR C124</b>
</a>
</td>
<td width="19%">Staff</td>
<td width="4%">22</td>
<td width="4%">6</td>
<td width="4%"><font color="#000000">Open</font></td>
<td width="7%">
<a href=Notes.asp?SpclNote=20143+003597+IT+100 target = _blank>
<b>Notes</b>
</a>
</td>
</tr>
<tr bgcolor="#E1E1CC">
<td colspan="3"> </td>
<td width="8%">M </td>
<td width="16%">10:00 am - 11:50 am</td>
<td width="17%">08/26/13 - 12/09/13</td>
<td width="8%">
<a href="http://www.mnsu.edu/registrar/building.html"target = _blank>
<b>WH 0119 </b>
</a>
</td>
<td width="19%">Staff</td>
<td colspan="4"> </td>
</tr>
If you are getting the page html as a string you could just insert your html into it. Something like:
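private string SetValue(string pageHtml, string id, string textToInsert)
{
// Find the course ID first, then the next occurrence of the marker after it.
// "{tag}" is a placeholder for whatever markup marks the insertion point.
int splitIndex = pageHtml.IndexOf(id);
if (splitIndex < 0) return pageHtml; // ID not found: leave the page unchanged
splitIndex = pageHtml.IndexOf("{tag}", splitIndex);
if (splitIndex < 0) return pageHtml; // marker not found after the ID
// Insert the new text immediately before the marker and return the result.
return pageHtml.Substring(0, splitIndex) + textToInsert + pageHtml.Substring(splitIndex);
}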
I need to grab a timesheet from a website. I want to store this timesheet in a DataTable in my C# application.
The structure of the data table looks like this:
| Day | Time  | Status |
| 1   | 7:00  | IN     |
| 1   | 9:45  | OUT    |
| 1   | 10:15 | IN     |
| 1   | 15:45 | OUT    |
| 1   | 8:45  | TOTAL  |
| 2   | ..    | ..     |
My C# code for the DataTable:
DataTable table = new DataTable("Worksheet");
table.Columns.Add("Day");
table.Columns.Add("Time");
table.Columns.Add("Status");
I tried different variants, and I always end up with the data mixed up.
For testing purposes I made a new WinForms form with a textbox (for the file path) and a button (to start the process).
Then I want HtmlAgilityPack to get all the data. One example:
public string[] GREYsource;
public Form1()
{
InitializeComponent();
}
private void btnSubmit_Click(object sender, EventArgs e)
{
var doc = new HtmlAgilityPack.HtmlDocument();
var fileName = txtPath.Text; // I downloaded the HTML-File
doc.Load(fileName);
string strGREYInner;
foreach (HtmlNode td in doc.DocumentNode.SelectNodes("//tr[@class=\"tblDataGreyNH\"]"))
{
strGREYInner = td.InnerText.Trim();
string shorted = strGREYInner.Replace("\t", "");
string shorted2 = shorted.Replace("\n\n\n\n", "\n\n\n");
string shorted3 = shorted2.Replace("\n\n\n", "\n\n");
string shorted4 = shorted3.Replace("\n\n", "\n");
GREYsource = shorted4.Split(new Char[] { '\n', });
}
foreach (string str in GREYsource)
{
...
}
}
Problem: the result contains a lot of tabs (\t) and newlines (\n) that I need to trim.
Problem: I don't think this is a good way to do it, and it would only grab the total times.
It can surely be done better.
This is just one example I tried (my other attempts produced nothing but junk).
I have attached the HTML structure below:
Overview(picture):
A bit more in depth:
<html>
<head>
</head>
<style type="text/css">
</style>
<body id="body" onload="handleMenuOverlapLogo();onload_column_expand();;firstElementFocus();">
<.. some (java)scripts> /* can be ignored, not needed */
<.. some other divs> /* can be ignored, not needed */
<div id="rowContent"> /* This <div> contains the content i need */
<div id="titleTab"> /* Title is not necessary */
</div>
<div id="rowContentInner"> /* Here the content starts */
<table class="tblList">
<tbody>
<tr> /* not necessary */
<tr class="tblHeader"> /* not necessary */
<tr class="tblHeader"> /* not necessary */
<tr class="tblDataWhiteNH"> /* IN : */
<td class="tblHeader" style="font-weight: bold; text-align: right"> In </td>
<td nowrap=""> /* "tblDataWhiteNH" always contains 7 "td nowrap" */
<td nowrap="">
<td nowrap=""> /* Example: if it contains a value */
<table width="100%" border="0" align="center">
<tbody>
<tr>
<td width="25%" align="left"> </td>
<td nowrap="" width="50%" align="center"> 7:53 </td> /* value = 7:53 (THIS!) */
<td width="25%" align="right"> </td>
</tr>
</tbody>
</table>
</td>
<td nowrap="">
<td nowrap=""> /* Example: if it contains no value */
<table width="100%" border="0" align="center">
<tbody>
<tr>
<td width="25%" align="left"> </td>
<td nowrap="" width="50%" align="center"> /* no value = 0:00 (THIS!) */
<td width="25%" align="right"> </td>
</tr>
</tbody>
</table>
</td>
<td nowrap="">
<td nowrap="">
<tr class="tblDataWhiteNH"> /* OUT : */
<td class="tblHeader" style="font-weight: bold; text-align: right"> Out </td>
<td nowrap=""> /* "tblDataWhiteNH" always contains 7 "td nowrap" */
<td nowrap="">
<td nowrap=""> /* Example: if it contains a value */
<table width="100%" border="0" align="center">
<tbody>
<tr>
<td width="25%" align="left"> </td>
<td nowrap="" width="50%" align="center"> 7:53 </td> /* value = 7:53 (THIS!) */
<td width="25%" align="right"> </td>
</tr>
</tbody>
</table>
</td>
<td nowrap="">
<td nowrap=""> /* Example: if it contains no value */
<table width="100%" border="0" align="center">
<tbody>
<tr>
<td width="25%" align="left"> </td>
<td nowrap="" width="50%" align="center"> /* no value = 0:00 (THIS!) */
<td width="25%" align="right"> </td>
</tr>
</tbody>
</table>
</td>
<td nowrap="">
<td nowrap="">
<tr class="tblDataGreyNH"> /* IN : */
<tr class="tblDataGreyNH"> /* OUT : */
... /* "tblDataGreyNH" is built the same way as "tblDataWhiteNH". */
... /* Sometimes there can be more "tblDataWhiteNH" and "tblDataGreyNH" rows. */
... /* Usually there are just the "tblDataWhiteNH" rows (IN/OUT). */
<tr class="tblHeader"> /* not necessary */
/* It continues, for example, with "tblDataWhite" if the last row above the header was a "tblDataGrey" */
/* and vice versa ("grey" if there was a "white" before). */
<tr class="tblDataWhiteNH"> /* Worked : */
<td class="tblHeader" style="font-weight: bold; text-align: right"> Total Time </td>
<td> 07:47 </td> /* value = 7:47 (THIS!) */
<td> 04:48 </td>
<td> 00:00 </td> /* no value = 0:00 (THIS!) */
<td> 00:00 </td>
<td> 07:42 </td>
<td> 00:00 </td>
<td> 00:00 </td>
</tr>
<tr class="tblDataGreyNH"> /* Total : */
<td class="tblHeader" style="font-weight: bold; text-align: right"> Regular Time </td>
<td> 07:47 </td> /* value = 7:47 (THIS!) */
<td> 04:48 </td>
<td> </td> /* no value = 0:00 (THIS!) */
<td> </td>
<td> 07:42 </td>
<td> </td>
<td> </td>
</tr>
<tr class="tblHeader"> /* not necessary */
<tr valign="top"> /* not necessary */
</tbody>
</table>
</div>
</div>
</body>
</html>
A copy of the original HTML: http://time.wnb.dk/123/
I hope someone can help me get this to work.
Let me explain it with a picture: https://www.abload.de/img/eeeqnuwu.png
The picture shows the website and, below it, a table showing how the result should look.
Declaring the DataTable isn't the problem.
The main problem is that I can't get HtmlAgilityPack to return the right results, and when it does return something, it is mostly wrong.
Some of the SelectNodes expressions I tried produced garbled output after a while. So far I have not been able to get all the data from the table on the website, only some values, and those are often wrong.
So I'm looking for someone who could take a look at this and help me find the right SelectNodes expressions.
I'm not sure I fully understand what you want to do, but here is some sample code that should help you get started. I strongly suggest you read up on XPath to understand it.
HtmlDocument doc = new HtmlDocument();
doc.Load(yourFile);
// get all TR with a specific class name, starting from root (/), and recursively (//)
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//tr[@class='tblDataGreyNH' or @class='tblDataWhiteNH']"))
{
// get all TD below the current node with a specific class name
HtmlNode inOrOut = node.SelectSingleNode("td[@class='tblHeader']");
if (inOrOut != null)
{
string io = inOrOut.InnerText.Trim();
Console.WriteLine(io.ToUpper());
if (io.Contains("Time"))
{
// normalize-space() trims and collapses whitespace (\r, \n, etc.)
// text() selects the node's text children
foreach (HtmlNode td in node.SelectNodes("td[normalize-space(@class)='' and normalize-space(text())!='' and normalize-space(text())!='00:00']"))
{
Console.WriteLine("value:" + td.InnerText.Trim());
}
}
}
// gets all TD below the current node that define the NOWRAP attribute
HtmlNodeCollection tdNoWraps = node.SelectNodes("td[@nowrap]");
if (tdNoWraps != null)
{
foreach (HtmlNode tdNoWrap in tdNoWraps)
{
string value = tdNoWrap.InnerText.Trim();
if (value == string.Empty)
continue;
Console.WriteLine("value:" + value);
}
}
}
It will output this from your sample page:
IN
value:7:47
value:7:46
value:7:45
value:7:51
OUT
value:15:35
value:15:33
value:12:38
value:8:59
IN
value:12:38
value:8:59
OUT
value:15:35
TOTAL TIME
value:07:48
value:07:47
value:07:50
value:01:08
REGULAR TIME
value:07:48
value:07:47
value:07:50
value:01:08
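To get from values like these to the Day/Time/Status table described in the question, the position of each cell has to be kept, because the column index is what identifies the day. The loop above skips empty cells, so that information is lost. A rough sketch of the table-filling step, assuming the cell texts of one row have already been collected in column order (for example from td.InnerText.Trim()), that the first column is day 1, and that an empty cell means 0:00. TimesheetTable, Create, and AddRow are illustrative names:

```csharp
using System.Collections.Generic;
using System.Data;

public static class TimesheetTable
{
    // Creates the empty table with the three columns from the question.
    public static DataTable Create()
    {
        var table = new DataTable("Worksheet");
        table.Columns.Add("Day", typeof(int));
        table.Columns.Add("Time", typeof(string));
        table.Columns.Add("Status", typeof(string));
        return table;
    }

    // Adds one table row per cell; the cell's 1-based position is the day.
    public static void AddRow(DataTable table, string status, IList<string> cells)
    {
        for (int i = 0; i < cells.Count; i++)
        {
            // Empty cells are kept and recorded as 0:00 so the day index stays correct.
            string time = string.IsNullOrWhiteSpace(cells[i]) ? "0:00" : cells[i].Trim();
            table.Rows.Add(i + 1, time, status);
        }
    }
}
```

AddRow would be called once per IN, OUT, or total row of the page; sorting the table's DefaultView by "Day" afterwards groups the entries per day.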