Remove Portion of HTML in EWS EmailMessage Body - c#

I would like to remove a portion of the HTML content of an EWS EmailMessage's Body.Text value before replying via ResponseMessage in C#.
The content to remove is shown below; it contains clickable and non-clickable HTML buttons.
As far as I can see, we can't declare our own custom tags, so I am unable to use string.Replace by locating custom HTML tags.
Are there any workarounds for my task, such as placing the content below in a placeholder?
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="100%" style="width:100.0%">
<tbody>
<tr>
<td style="padding:0in 0in 0in 0in">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Times New Roman",serif"> <o:p></o:p></span></p>
</td>
<td width="30%" style="width:30.0%;background:#17202A;padding:4.5pt 4.5pt 4.5pt 4.5pt;display:inline-block">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="font-size:11.5pt;font-family:"Helvetica",sans-serif;color:white">ACTION: ADD
<o:p></o:p></span></b></p>
</td>
<td style="padding:0in 0in 0in 0in">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Times New Roman",serif"> <o:p></o:p></span></p>
</td>
<td width="30%" style="width:30.0%;background:#17202A;padding:4.5pt 4.5pt 4.5pt 4.5pt;display:inline-block">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="font-size:11.5pt;font-family:"Helvetica",sans-serif;color:white">ACTION: MINUS
<o:p></o:p></span></b></p>
</td>
<td style="padding:0in 0in 0in 0in">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Times New Roman",serif"> <o:p></o:p></span></p>
</td>
<td width="20%" style="width:20.0%;background:#3A69C2;padding:4.5pt 4.5pt 4.5pt 4.5pt;display:inline-block">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="font-size:11.5pt;font-family:"Helvetica",sans-serif;color:white"><a href="mailto:test.com?subject=[MULTIPLY];body=ACTION:%20MULTIPLY"><span style="color:white;background:#3A69C2;text-decoration:none">MULTIPLY
</span></a><o:p></o:p></span></b></p>
</td>
<td style="padding:0in 0in 0in 0in">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Times New Roman",serif"> <o:p></o:p></span></p>
</td>
<td width="20%" style="width:20.0%;background:#3A69C2;padding:4.5pt 4.5pt 4.5pt 4.5pt;display:inline-block">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="font-size:11.5pt;font-family:"Helvetica",sans-serif;color:white"><a href="mailto:test.com?subject=[DIVIDE];body=ACTION:%20DIVIDE"><span style="color:white;background:#3A69C2;text-decoration:none">DIVIDE
</span></a><o:p></o:p></span></b></p>
</td>
<td style="padding:0in 0in 0in 0in">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Times New Roman",serif"> <o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>

Could you use the id attribute of the HTML tags (https://www.w3schools.com/tags/att_id.asp) to identify the tags that you want to remove?

Assuming you want to remove the table containing the Multiply and Divide buttons, and that those links will not be found anywhere else in the document, we can use them in combination with XPath to find the table and remove it. You could possibly use the built-in XmlDocument class, but as this is HTML I'd recommend using HtmlAgilityPack (available as a NuGet package) to parse it.
You'll end up with something like:
// Create an HtmlAgilityPack document
HtmlDocument doc = new HtmlDocument();
// Load the email body
doc.LoadHtml(EmailMessage.Body.Text);
// Select the nearest ancestor table of the link we're interested in
HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@href='mailto:test.com?subject=[DIVIDE];body=ACTION:%20DIVIDE']/ancestor::table[1]");
// SelectSingleNode returns null when nothing matches, so only remove the table if it was found
if (node != null)
{
    node.Remove();
}
// Get the new email body
string newBody = doc.DocumentNode.InnerHtml;
You may need to tweak a little to get you there but hopefully this is a good start.

Related

No text returned for IWebElement.Text

I have some xpath which correctly returns me the nodes I am after but I cannot retrieve the text between the tags in Selenium.
The below correctly returns the 4 nodes I am after:
var subMenuItems = driver.FindElements(By.XPath("//div[@id='OpenandApply123']//a"));
However when I do the below I get nothing for text returned when I'm expecting "abc123":
string item1 = subMenuItems[0].Text;
How do I get the texts for abc123, def123 etc, returned?
Full html below, apologies for formatting:
<div class="subMenu" id="OpenandApply123" role="menu" aria-hidden="false" style="" xpath="1">
<table cellpadding="0" cellspacing="0" width="150" role="presentation">
<tbody><tr role="presentation">
<td role="presentation" class="">
<a id="abc.feature_link" href="abc/abc" class="menuItem" target="_top" style="background-position: 4px 2px;">
abc123
</a>
</td>
</tr>
<tr role="presentation">
<td role="presentation" class="">
<a id="def.feature_link" href="abc/Feature/def" class="menuItem" target="_top" style="background-position: 4px 2px;">
def123
</a>
</td>
</tr>
<tr role="presentation">
<td role="presentation">
<a id="ghi.feature_link" href="abc/Feature/ghi" class="menuItem" target="_top" style="">
ghi123
</a>
</td>
</tr>
<tr role="presentation">
<td role="presentation">
<a id="klm.feature_link" href="abc/klm" class="menuItem" target="_top" style="">
klm123
</a>
</td>
</tr>
</tbody></table>
</div>
I would suggest using WebDriverWait to wait for VisibilityOfAllElementsLocatedBy() with the following XPath before reading the text.
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until(SeleniumExtras.WaitHelpers.ExpectedConditions.VisibilityOfAllElementsLocatedBy(By.XPath("//td[@role='presentation']//a")));
var subMenuItems = driver.FindElements(By.XPath("//td[@role='presentation']//a"));
string item1 = subMenuItems[0].Text;
Or, since Text only returns text that is currently visible, read the textContent attribute instead:
string item1 = subMenuItems[0].GetAttribute("textContent");

Add tbody XML Element to table Element in XDocument

I want to add a <tbody> element to <table> elements in an XDocument if it is missing.
<table class="newtable" id="item_559_Table1" cellpadding="0" cellspacing="0" data-its-style="width:11.4624em; border-spacing:0;">
<colgroup data-its-style="width:11.4624em; " />
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="">My dad cooks up a pot of chicken soup, and</p>
</td>
</tr>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="font-weight:normal; ">This cold means I can’t taste a thing today!</p>
</td>
</tr>
</table>
Output should look like
<table class="newtable" id="item_559_Table1" cellpadding="0" cellspacing="0" data-its-style="width:11.4624em; border-spacing:0;">
<colgroup data-its-style="width:11.4624em; " />
<tbody>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="">My dad cooks up a pot of chicken soup, and</p>
</td>
</tr>
<tr>
<td data-its-style="padding:0.2292em; vertical-align:top; ">
<p data-its-style="font-weight:normal; ">This cold means I can’t taste a thing today!</p>
</td>
</tr>
</tbody>
</table>
Note: I am not looking for an XSLT solution.
One way to do it would be to grab the children of <table>, then add them back the way you want them. This assumes the <table> is the root element of the document.
var doc = XDocument.Load("file.xml");
var colgroup = doc.Root.Elements("colgroup");
var tr = doc.Root.Elements("tr");
// Add tr to tbody
var tbody = new XElement("tbody", tr);
// Replace the children of table with colgroup and tbody
doc.Root.ReplaceNodes(colgroup, tbody);
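
If the tables are not the root element, or some of them already contain a <tbody>, a slightly more general variant may help. The following is only a minimal sketch under a few assumptions: the elements are in no XML namespace, every <tr> to wrap is a direct child of its <table>, and the names TbodyFixer and EnsureTbody are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TbodyFixer
{
    // Wraps the direct <tr> children of every <table> in a new <tbody>,
    // skipping tables that already have one. (Hypothetical helper name.)
    public static void EnsureTbody(XDocument doc)
    {
        // ToList() snapshots the tables so the tree can be modified while looping.
        foreach (var table in doc.Descendants("table").ToList())
        {
            if (table.Element("tbody") != null) continue;

            var rows = table.Elements("tr").ToList();
            if (rows.Count == 0) continue;

            var tbody = new XElement("tbody");
            // Insert the empty tbody where the first row currently sits,
            // so elements such as <colgroup> stay in front of it.
            rows[0].AddBeforeSelf(tbody);
            foreach (var row in rows)
            {
                row.Remove();    // detach from <table>
                tbody.Add(row);  // re-attach under <tbody>
            }
        }
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<table><colgroup /><tr><td>a</td></tr><tr><td>b</td></tr></table>");
        EnsureTbody(doc);
        Console.WriteLine(doc);
    }
}
```

Detaching each row before adding it to the new element moves the existing nodes instead of cloning them, so any references you already hold to those rows stay valid.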

Failing to retrieve the third td nodes in an html list

I am trying to get the text "Very Good Country views" and "Good" using HTMLAgilityPack.
<div class="property-details-section">
<h5><span id="content_lblFurtherDetails">Further Details</span></h5>
<ul id="features">
<li style="display:block;">
<table border="0" cellpadding="0" cellspacing="0" width="500">
<tr>
<td style="width: 15px;">
<img src="../images/bullet.png" alt="bullet" />
</td>
<td style="width: 185px;">Views</td>
<td style="width: 300px;">Very Good Country views</td>
</tr>
</table>
</li>
</ul>
<li style="display:block;">
<table border="0" cellpadding="0" cellspacing="0" width="500">
<tr>
<td style="width: 15px;">
<img src="../images/bullet.png" alt="bullet" />
</td>
<td style="width: 185px;">Finish</td>
<td style="width: 300px;">Good</td>
<tr>
</table>
</li>
</div>
I have tried the following for "Very Good Country views" with no success:
HtmlNode text =
doc.DocumentNode.SelectSingleNode("//ul[@id='features']/li/table/tr/td[3]");
I am trying to get the text "Very Good Country views" and "Good"
You have to select two elements, so use SelectNodes instead of SelectSingleNode if you want to get both results at once.
var result = doc.DocumentNode.SelectNodes("//ul[@id='features']/li/*//td[last()]")
    .Select(td => td.InnerText)
    .ToList();
I think the problem with your XPath is that you need parentheses around the expression, so that the index applies to the whole node set:
var text = doc.DocumentNode
    .SelectSingleNode("(//ul[@id='features']/li/table/tr/td)[3]");
You can also try using LINQ:
var td = doc.DocumentNode.Descendants("ul")
    .First(x => x.GetAttributeValue("id", "") == "features")
    .Descendants("td")
    .Skip(2)
    .First();
var text = td.InnerText;
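
To see how td[3] and (...)[3] differ, here is a small sketch. It swaps HtmlAgilityPack for the XPath support built into System.Xml.Linq so it runs without extra packages, and it uses a simplified, well-formed stand-in for the markup with both <li> elements placed inside the <ul>. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathIndexDemo
{
    // Hypothetical, simplified stand-in for the markup in the question.
    public const string Sample =
        "<ul id='features'>" +
        "<li><table><tr><td>img</td><td>Views</td><td>Very Good Country views</td></tr></table></li>" +
        "<li><table><tr><td>img</td><td>Finish</td><td>Good</td></tr></table></li>" +
        "</ul>";

    // td[3] is evaluated per parent <tr>: the third cell of every row.
    public static List<string> ThirdCellPerRow(XDocument doc) =>
        doc.XPathSelectElements("//ul[@id='features']/li/table/tr/td[3]")
           .Select(td => td.Value)
           .ToList();

    // (...)[3] indexes the combined node set: the third cell overall.
    public static List<string> ThirdCellOverall(XDocument doc) =>
        doc.XPathSelectElements("(//ul[@id='features']/li/table/tr/td)[3]")
           .Select(td => td.Value)
           .ToList();

    public static void Main()
    {
        var doc = XDocument.Parse(Sample);
        Console.WriteLine(string.Join(" | ", ThirdCellPerRow(doc)));   // Very Good Country views | Good
        Console.WriteLine(string.Join(" | ", ThirdCellOverall(doc)));  // Very Good Country views
    }
}
```

With this sample the per-row form returns both values, while the parenthesized form returns only the first, so which one you want depends on whether you need every row or a single cell.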

html agility how to process table in a hyperlink

I am working to get some information from a html table which has many rows like this. The given row is like one piece of info in a table cell. I need to get link, artist name, artist type from this table.
<a href="http://somesite/music/view_album.php?albumid=6468" style="color:#000;" sl-processed="1">
<table width="100%" border="0" bgcolor="#FFFFFF">
<tbody><tr>
<td colspan="2" align="left" valign="top" style="color:#900;">album title</td>
</tr>
<tr> <td width="31%" align="left" valign="top"> <img src="./albums_files/No_cover.png" width="90" height="80" border="0">
</td>
<td width="69%" align="left" valign="top">
<a class="leftcat" href="http://somelink/toartiset" sl-processed="1"> <strong>Rizwan-Muazzam</strong>
</a>
<br>
(<a class="leftcat" href="http://linktoartisttype/" sl-processed="1">
Some Artist Type </a>) <br>
<span class="leftcat">
Rated +: 0<br>
Rated -: 0 </span>
</td>
</tr>
<tr> <td valign="top" align="center" colspan="2">
</td> </tr>
</tbody></table>
</a>
I have done this
HtmlDocument doc = new HtmlDocument();
doc = new HtmlWeb().Load(albumUrl);
var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
this gives me all the links which I need, now I want to get all the child information under the hyperlink.
Help will be appreciated.
Regards
Parminder
I would suggest using a loop to go through all the rows, then selecting the links in each row and extracting the info from them:
var rows = doc.DocumentNode.SelectNodes("//tr");
foreach (var row in rows)
{
    // SelectNodes returns null when nothing matches, so skip rows
    // (such as the album title row) that do not contain both links
    var links = row.SelectNodes(".//a");
    if (links == null || links.Count < 2) continue;
    // First link: the artist; second link: the artist type
    var artistLink = links[0].GetAttributeValue("href", "");
    var artistName = links[0].InnerText.Trim();
    var artistTypeLink = links[1].GetAttributeValue("href", "");
    var artistTypeName = links[1].InnerText.Trim();
    // Store the results...
}

Save HTML content of some specific tags using HtmlAgilityPack

How can I use HtmlAgilityPack to extract the HTML content of a specific tag?
I provide the following HTML code:
<td class="text_11" width="80%" valign="top">
<span class="text_11">Producer:</span>
<a class="link_11b" href="/Asus_producer.htm">Asus</a>
<br>
<p>
<table class="text_11" border="0" style="width: 100%;">
<tbody>
</table>
<p> </p>
</td>
What I need to do is extract the HTML code (as it appears here) that is contained in <td class="text_11" width="80%" valign="top">, even if there are more tags nested inside this <td>.
I need to save this text (which is HTML) into a string:
`
<span class="text_11">Producer:</span>
<a class="link_11b" href="/Asus_producer.htm">Asus</a>
<br>
<p>
<table class="text_11" border="0" style="width: 100%;">
<tbody>
</table>
<p> </p>
`
Thanks.
var xpath = "//td[@class='text_11' and @width='80%' and @valign='top']";
var td = doc.DocumentNode.SelectSingleNode(xpath);
// SelectSingleNode returns null when nothing matches, so guard before reading InnerHtml
string html = td != null ? td.InnerHtml : null;
