Need an XPath expression to locate an element based on a sibling - C#

I've got this code repeated in a div tag and want to write an XPath expression to find the dsd link so that I can click on it, based on the text in the h4 tag. Changing the HTML isn't an option.
<div>
<h4>Test Block</h4>
<br/>
<div>
Option 1
Option 2
</div>
</div>
At the moment I'm trying something like the following, where name is the text of the h4 tag:
var findSubmitButton = Driver.FindElement(By.XPath("//div/h4[contains(text(), '" + name + "')]"));
var submitButton = findSubmitButton.FindElement(By.XPath("../div/a[contains(@href,'dsd')]"));
submitButton.Click();
But I'm unable to get this to work. Any suggestions would be gratefully received.

I do not see an issue with your xpaths. The HTML you supplied is invalid due to your placeholders, but your xpaths appear to work with this:
void Main()
{
var xml = @"
<div>
<h4>Test Block</h4>
<br/>
<div>
Option 1
Option 2
</div>
</div>";
var xmldoc = new XmlDocument();
xmldoc.LoadXml(xml);
var node = xmldoc.DocumentElement.SelectSingleNode("//div/h4[contains(text(),'Test Block')]");
node = node.SelectSingleNode("../div/a[contains(@href,'dsd')]");
Console.WriteLine(node.InnerText);
}

I don't have a working machine so I can't test this, but since any feedback is welcome: I'm pretty sure XPath lets you select an individual child element by position. If you know for sure that this HTML will always be the same, you could do:
../div[1] // the first div child of the parent (XPath positions start at 1, not 0)

You could use //div[h4[contains(., 'Test Block')]]//a[contains(@href, 'dsd')]. Also something like //div[h4[contains(., 'Test Block')]]//a[contains(., 'Option 1')] should work.

Why not use the following-sibling axis?
var findSubmitButton = Driver.FindElement(By.XPath("//div/h4[contains(text(), '" + name + "')]"));
var submitButton = findSubmitButton.FindElement(By.XPath("following-sibling::div/a[contains(@href,'dsd')]"));
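The three approaches suggested in the answers above (stepping up with `..`, the following-sibling axis, and a single expression with a predicate on the container) can be compared side by side. The sketch below uses System.Xml rather than Selenium so it runs without a browser. The sample markup, the href values, and the names `SiblingXPathDemo` and `HrefOf` are invented for illustration, since the question's real links are not shown.

```csharp
using System;
using System.Xml;

public static class SiblingXPathDemo
{
    // Hypothetical markup: two blocks with the same shape, so the h4 text must pick the right one.
    public const string Sample = @"
<root>
  <div>
    <h4>Other Block</h4>
    <br/>
    <div>
      <a href='/other/dsd'>Option 1</a>
      <a href='/other/abc'>Option 2</a>
    </div>
  </div>
  <div>
    <h4>Test Block</h4>
    <br/>
    <div>
      <a href='/test/dsd'>Option 1</a>
      <a href='/test/abc'>Option 2</a>
    </div>
  </div>
</root>";

    // Returns the href of the first node matched by xpath, or null if nothing matches.
    public static string HrefOf(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node?.Attributes["href"]?.Value;
    }

    public static void Main()
    {
        // 1. Find the h4, step up to its parent, then down into the sibling div.
        Console.WriteLine(HrefOf("//div/h4[contains(text(),'Test Block')]/../div/a[contains(@href,'dsd')]"));
        // 2. Find the h4, then move sideways with the following-sibling axis.
        Console.WriteLine(HrefOf("//h4[contains(.,'Test Block')]/following-sibling::div/a[contains(@href,'dsd')]"));
        // 3. Select the container div by its h4 child, then search inside it.
        Console.WriteLine(HrefOf("//div[h4[contains(.,'Test Block')]]//a[contains(@href,'dsd')]"));
    }
}
```

With this sample all three expressions resolve to the link inside the "Test Block" container. With Selenium, any of them could be passed to By.XPath as a single locator, assuming the real page has the same shape.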

Related

Remove whole div with specific class name

Is it possible to remove the whole div with a specific class name? For example;
<body>
<div class="head">...</div>
<div class="container">...</div>
<div class="foot">...</div>
</body>
I would like to remove the div with the "container" class.
A C# code example would be very useful, thank you.
The proper way (I suppose) to do this is via built in Gecko DOM classes and methods.
So, in your case something like:
var containers = yourDocument.GetElementsByClassName("container");
// This returns an IEnumerable of elements with this class. If you will only ever have one, you can do it like this:
var yourContainer = containers.FirstOrDefault();
yourContainer.Parent.RemoveChild(yourContainer);
Obviously, you can also do loops etc.
If you want to parse HTML in C#, the best way is to use Html Agility Pack:
https://htmlagilitypack.codeplex.com/
HtmlDocument document = new HtmlDocument();
document.Load(@"C:\yourfile.html");
// SelectNodes returns null when nothing matches, so check before iterating.
var nodesToRemove = document.DocumentNode.SelectNodes("//div[@class='container']");
if (nodesToRemove != null)
    foreach (var node in nodesToRemove.ToList()) // ToList() (System.Linq) copies the matches so removal is safe
        node.Remove();
Well, with the help of regex, you can remove your desired div
var data = "<body>\n<div class=\"head\">...</div>\n" +
"<div class=\"container\">...</div>\n" +
"<div class=\"foot\">...</div>\n</body>";
var rxStr = "<div[^<]+class=([\"'])container\\1.*</div>";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var nStr = rx.Replace (data, "");
Console.WriteLine (nStr);
This will reduce your string to (apart from an empty line left where the div was):
<body>
<div class="head">...</div>
<div class="foot">...</div>
</body>

HtmlAgilityPack Get all links inside a DIV

I want to be able to get 2 links from inside a div.
Currently I can select one, but when there is more than one it doesn't seem to work.
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='myclass']");
if (node != null)
{
foreach (HtmlNode type in node.SelectNodes("//a@href"))
{
recipe.type += type.InnerText;
}
}
else
recipe.type = "Error fetching type.";
Trying to get it from this piece of HTML:
<div class="myclass">
<h3>Not Relevant Header</h3>
This text,
and this text
</div>
Any help is appreciated, Thanks in advance.
var div = doc.DocumentNode.SelectSingleNode("//div[@class='myclass']");
if(div!=null)
{
var links = div.Descendants("a")
.Select(a => a.InnerText)
.ToList();
}
Use this XPath:
//div[@class = 'myclass']//a
It grabs all descendant a elements in div with class = 'myclass'.
Note that //a@href is not valid XPath; an attribute is selected with a path step, as in //a/@href.
Use:
//div[contains(concat(' ', @class, ' '), ' myclass ')]//a
This selects any a element that is a descendant of any div whose class attribute contains a classname of "myclass".
The classname may be single, or the attribute may also contain other classnames. In this case the classname may be the starting one, or the last one or may be surrounded by other classnames -- the above XPath expression correctly selects the wanted nodes in all of these different cases.
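The difference between the exact match, a plain contains(), and the space-padded form can be checked with a small sketch. It uses System.Xml instead of Html Agility Pack so it has no package dependency; the sample markup and the names `ClassTokenDemo` and `Count` are made up for illustration.

```csharp
using System;
using System.Xml;

public static class ClassTokenDemo
{
    // Hypothetical markup: a single class, a multi-class attribute, and a look-alike class name.
    public const string Sample = @"
<body>
  <div class='myclass'><a href='#1'>one</a></div>
  <div class='box myclass wide'><a href='#2'>two</a></div>
  <div class='myclass2'><a href='#3'>three</a></div>
</body>";

    // Counts the nodes that xpath selects in Sample.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Exact comparison: misses the div whose class attribute lists several names.
        Console.WriteLine(Count("//div[@class='myclass']//a"));                                 // 1
        // Plain substring test: also matches the unrelated 'myclass2'.
        Console.WriteLine(Count("//div[contains(@class,'myclass')]//a"));                       // 3
        // Space-padded test: matches 'myclass' only as a whole class name.
        Console.WriteLine(Count("//div[contains(concat(' ', @class, ' '), ' myclass ')]//a"));  // 2
    }
}
```

Padding both the attribute value and the search term with spaces is what turns a substring test into a whole-token test.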

Removing DIV from a text file if it contains a certain classname

I am currently working with an XML document which has RSS feeds inside. And I wanted to parse it so that if a div tag with a class name "feedflare" is found, the code would remove the whole DIV.
I could not find an example of doing this as the search for it is polluted with "HTML editor errors" and other irrelevant data.
Would anyone here be kind enough to share methods in reaching my goal?
I must state that I DO NOT want to use HtmlAgilityPack if I can avoid it.
This is my process:
Load XML, parse through elements and pick out, Title, Description, Link.
Then save all this as HTML (with tags being added programmatically to build a web page) and then, when all of the tags are added, I want to parse the resulting "HTML text" and remove the annoying DIV tag.
Let's assume "string HTML = textBox1.text" where textBox1 is where the resulting HTML is pasted, after parsing the main XML document.
How would I then loop through the contents of textBox1.text and remove ONLY the div tag called "feedflare" (see below)?
<div class="feedflare">
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:yIl2AUoC8zA">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=yIl2AUoC8zA" border="0"></img></a>
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:H0mrP-F8Qgo">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=H0mrP-F8Qgo" border="0"></img></a>
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk" border="0"></img></a>
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU" border="0"></img></a>
</div>
Thank you in advance.
Using this xml library, do:
XElement root = XElement.Load(file); // or .Parse(string);
XElement div = root.XPathElement("//div[@class={0}]", "feedflare");
div.Remove();
root.Save(file); // or string = root.ToString();
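The XPathElement helper above comes from the library that answer refers to. If only the framework's built-in LINQ to XML is available, a similar removal might look like the sketch below. It assumes the generated markup is well-formed XML (XElement.Parse throws otherwise), and the names `FeedflareRemover` and `RemoveDivsByClass` and the sample input are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FeedflareRemover
{
    // Parses the markup as XML and removes every descendant div whose class attribute equals className.
    public static string RemoveDivsByClass(string xml, string className)
    {
        XElement root = XElement.Parse(xml);
        root.Descendants("div")
            .Where(d => (string)d.Attribute("class") == className)
            .Remove(); // the Remove extension snapshots the sequence before deleting
        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Shortened stand-in for the feed markup in the question.
        string xml = "<body><p>Story text</p>" +
                     "<div class=\"feedflare\"><a href=\"x\"><img src=\"y\" border=\"0\"></img></a></div>" +
                     "</body>";
        Console.WriteLine(RemoveDivsByClass(xml, "feedflare"));
    }
}
```

Casting the attribute to string yields null when the attribute is missing, so divs without a class attribute are skipped rather than causing an exception.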
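Try this:
System.Xml.XmlDocument d = new System.Xml.XmlDocument();
d.LoadXml(Your_XML_as_String);
// Select only the feedflare divs and copy them to a list first (Cast/ToList need System.Linq):
// removing nodes while enumerating a live XmlNodeList is unsafe.
var divs = d.SelectNodes("//div[@class='feedflare']").Cast<System.Xml.XmlNode>().ToList();
foreach (System.Xml.XmlNode n in divs)
    n.ParentNode.RemoveChild(n); // a node must be removed from its own parent, not from the document
Then use d.OuterXml to retrieve the new XML.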
My solution in JavaScript is:
function unrichText(texto) {
var n = texto.indexOf("\">"); //Finding end of "<div class="ExternalClass...">
var sub = texto.substring(0, n+2); //Adding first char and last two (">)
var tmp = texto.replace(sub, ""); //Removing it
tmp = replaceAll(tmp, "</div>", ""); //Removing last "div"
tmp = replaceAll(tmp, "<p>", ""); //Removing other stuff
tmp = replaceAll(tmp, "</p>", "");
tmp = replaceAll(tmp, " ", "");
return (tmp);
}
function replaceAll(str, find, replace) {
return str.replace(new RegExp(find, 'g'), replace);
}

Using Html Agility Pack, Selecting the current element in a loop (XPATH)

I'm trying to do something simple, but somehow it doesn't work for me. Here's my code:
var items = html.DocumentNode.SelectNodes("//div[@class='itembox']");
foreach (HtmlNode e in items)
{
    int x = items.Count; // equals 10
    HtmlNode node = e;
    var test = e.SelectNodes("//a[@class='head']"); // I need this to return the
                                                    // anchor of the current itembox
                                                    // but instead it returns the
                                                    // anchor of each itembox element
    int y = test.Count; // also equals 10!! supposed to be only 1
}
my html page looks like this:
....
<div class="itembox">
<a Class="head" href="one.com">One</a>
</div>
<div class="itembox">
<a Class="head" href="two.com">Two</a>
</div>
<!-- 10 itembox elements-->
....
Is my XPath expression wrong? Am I missing something?
Use
var test = e.SelectNodes(".//a[@class='head']");
instead. Your current code ( //a[]) searches all a elements starting from the root node. If you prefix it with a dot instead (.//a[]) only the descendants of the current node will be considered. Since it is a direct child in your case you could of course also do:
var test = e.SelectNodes("a[@class='head']");
As always, see the XPath spec for details.
var test = e.SelectNodes("//a[@class='head']");
This is an absolute expression, but you need a relative XPath expression -- to be evaluated off e.
Therefore use:
var test = e.SelectNodes("a[@class='head']");
Do note: Avoid using the XPath // pseudo-operator as much as possible, because such use may result in significant inefficiencies (slowdown).
In this particular XML document the a elements are just children of div -- not at indefinite depth off div.
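The effect of the leading // can be reproduced with a small sketch. It uses System.Xml instead of Html Agility Pack so it needs no extra package; the XPath evaluation rule shown is the same. The three-item sample and the names `RelativeXPathDemo` and `CountFromFirstItem` are invented for illustration.

```csharp
using System;
using System.Xml;

public static class RelativeXPathDemo
{
    // Hypothetical page with three itembox divs, each holding one anchor.
    public const string Sample = @"
<body>
  <div class='itembox'><a class='head' href='one.com'>One</a></div>
  <div class='itembox'><a class='head' href='two.com'>Two</a></div>
  <div class='itembox'><a class='head' href='three.com'>Three</a></div>
</body>";

    // Evaluates xpath with the first itembox div as the context node and returns the match count.
    public static int CountFromFirstItem(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode first = doc.SelectSingleNode("//div[@class='itembox']");
        return first.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Absolute: starts at the document root and ignores the context node.
        Console.WriteLine(CountFromFirstItem("//a[@class='head']"));   // 3
        // Relative: descendants of the context node only.
        Console.WriteLine(CountFromFirstItem(".//a[@class='head']"));  // 1
        // Relative: direct children of the context node only.
        Console.WriteLine(CountFromFirstItem("a[@class='head']"));     // 1
    }
}
```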

How to change output HTML tag <b></b> to <strong></strong>?

How to replace the <b></b> tag with the <strong></strong> tag in a specific div?
ex:
<div id="aaa">hello<b>wow</b>!</div>
using javascript to replace with
<div id="aaa">hello<strong>wow</strong>!</div>
please help! thanks in advance.
What I'm trying to do is change the output HTML code <b></b> to <strong></strong> in order to pass W3C validation. Can I do that?
Or is there any solution that can use ASP.NET+C# to do that?
Here you go:
var root, elems;
root = document.getElementById( 'test' );
elems = root.getElementsByTagName( 'b' );
toArray( elems ).forEach( function ( elem ) {
var newElem = document.createElement( 'strong' );
newElem.textContent = elem.textContent;
elem.parentNode.replaceChild( newElem, elem );
});
where toArray is your preferred array-like to array converter function. I use this one:
function toArray( arrayLike ) { return [].slice.call( arrayLike ); }
Live demo: http://jsfiddle.net/mJSyH/3/
Note: this code doesn't work in IE8.
You can grab all <b> elements under a certain element, move all child nodes to a new <strong> element, and then replace the <b> with the <strong>.
<div id="aaa">hello<b>wow</b><b>2</b><b>3</b>!</div>
<script>
var container = document.getElementById("aaa")
var find = container.getElementsByTagName("b");
var bold, strong;
while (bold = find[0]) {
strong = document.createElement("strong");
while (bold.firstChild) {
strong.appendChild(bold.firstChild);
}
bold.parentNode.replaceChild(strong, bold);
}
</script>
The reason you can set bold = find[0] every time is that as the <b> elements are removed from the document, they are also removed from the NodeList find.
See the latest version at http://jsbin.com/eqikaj/13/edit.
Using jQuery you can find all b tags within your parent div container and then replace each of them with strong, copying the inner HTML of the source tag:
$('#aaa b').each(function() {
$(this).replaceWith($('<strong>' + $(this).html() + '</strong>'));
});
If you use jQuery, you can simply go like this:
$('b').replaceWith(function() {
return $('<strong>').html($(this).html());
});
Just download or include the jQuery library somehow, and you can use the snippet.
http://docs.jquery.com/Downloading_jQuery
A solution using regular expressions:
var e = document.getElementById("aaa");
e.innerHTML = e.innerHTML.replace(/<b[^>]*>(.*?)<\/b>/ig, '<strong>$1</strong>');
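It may not be faster than the versions above, but the performance difference is very small (irrelevant in real applications).
Use whichever you think best.
Note: forEach can also be borrowed without a toArray helper, as below, but elems is a live collection, so elements are skipped if the callback removes them from the document; copying to an array first avoids that:
Array.prototype.forEach.call(elems, function (elem) { ... });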
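For the ASP.NET/C# side that the question asks about, the same regex idea could be applied on the server before the markup is sent. This is only a sketch: it assumes the HTML is available as a string, and the names `BoldToStrong` and `ToStrong` are hypothetical. Like the JavaScript regex above, it is a textual substitution and not an HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoldToStrong
{
    // Matches <b>, <b ...attributes> and </b>, but not <br>, <body> or <blockquote>:
    // after the 'b' the pattern requires either whitespace or the closing '>'.
    private static readonly Regex BoldTag =
        new Regex(@"<(/?)b(\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Rewrites b tags as strong tags, keeping the optional slash ($1) and any attributes ($2).
    public static string ToStrong(string html)
    {
        return BoldTag.Replace(html, "<$1strong$2>");
    }

    public static void Main()
    {
        Console.WriteLine(ToStrong("<div id=\"aaa\">hello<b>wow</b>!</div>"));
    }
}
```

Rewriting only the tag names leaves nested markup inside the b element untouched, which is the same reason the DOM answers above move child nodes rather than copying text.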
