Is it possible to remove the whole div with a specific class name? For example:
<body>
<div class="head">...</div>
<div class="container">...</div>
<div class="foot">...</div>
</body>
I would like to remove the div with the "container" class.
A C# code example would be very useful, thank you.
The proper way (I suppose) to do this is via the built-in Gecko DOM classes and methods.
So in your case, something like:
var containers = yourDocument.GetElementsByClassName("container");
// This returns an IEnumerable of elements with this class. If you will only ever have one, you can do it like this:
var yourContainer = containers.FirstOrDefault(); // requires using System.Linq;
if (yourContainer != null)
    yourContainer.Parent.RemoveChild(yourContainer);
You can also loop over all of the returned elements to remove each of them.
If you want to parse HTML in C#, the best way is to use the HTML Agility Pack:
https://htmlagilitypack.codeplex.com/
HtmlDocument document = new HtmlDocument();
document.Load(@"C:\yourfile.html");
// SelectNodes returns null when nothing matches, hence the null check
var nodesToRemove = document.DocumentNode.SelectNodes("//div[@class='container']")?.ToList();
if (nodesToRemove != null)
    foreach (var node in nodesToRemove)
        node.Remove();
With the help of a regular expression, you can remove the desired div:
var data = "<body>\n<div class=\"head\">...</div>\n" +
"<div class=\"container\">...</div>\n" +
"<div class=\"foot\">...</div>\n</body>";
var rxStr = "<div[^<]+class=([\"'])container\\1.*</div>";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var nStr = rx.Replace (data, "");
Console.WriteLine (nStr);
This reduces your string to the following (an empty line remains where the removed div was):
<body>
<div class="head">...</div>
<div class="foot">...</div>
</body>
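A caveat about the pattern above: .* is greedy, and because . does not match newlines by default, it runs to the last </div> on the same line. The sample works because each div sits on its own line, but if the markup arrives as a single line the foot div is removed too. The sketch below illustrates this in JavaScript, whose syntax matches .NET for these particular patterns; the lazy variant assumes the container holds no nested div, since a regex cannot balance nested tags:

```javascript
// Single-line version of the sample markup from the question.
const html = '<body><div class="head">...</div><div class="container">...</div><div class="foot">...</div></body>';

// Greedy: ".*" runs to the LAST "</div>" on the line, so the foot div is removed as well.
const greedy = /<div[^<]+class=(["'])container\1.*<\/div>/i;

// Lazy: "[\s\S]*?" stops at the FIRST "</div>" after the opening tag.
// Assumption: the container holds no nested <div>.
const lazy = /<div[^<]+class=(["'])container\1[\s\S]*?<\/div>/i;

const greedyResult = html.replace(greedy, '');
const lazyResult = html.replace(lazy, '');

console.log(greedyResult); // <body><div class="head">...</div></body>
console.log(lazyResult);   // <body><div class="head">...</div><div class="foot">...</div></body>
```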
I've been trying for hours to use a regex to extract the text inside some HTML tags:
<div class="ewok-rater-header-section">
<ul class="header">
<li><h1>meow</h1></li>
<li><h1>meow2</h1></li>
<li><h1>Time = <span class="work-weight">9.0 minutes</span></h1></li>
</ul>
</div>
I can get meow with:
var regexpost = new System.Text.RegularExpressions.Regex(@"<h1(.*?)>(.*?)</h1>");
var mpost = regexpost.Match(reqpost);
string lechat = mpost.Groups[2].Value;
but not the others.
I'd like to put meow in one textbox, meow2 in a second textbox and 9.0 (minutes) in a third one.
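Match returns only the first hit, which is why only meow comes back. Iterating over every match (Regex.Matches in .NET) and stripping the tags nested inside each heading gets all three values. A sketch of that idea in JavaScript using matchAll; it is reasonable for a fixed snippet like this one, but an HTML parser remains the more robust option for arbitrary pages:

```javascript
const html = `<div class="ewok-rater-header-section">
<ul class="header">
<li><h1>meow</h1></li>
<li><h1>meow2</h1></li>
<li><h1>Time = <span class="work-weight">9.0 minutes</span></h1></li>
</ul>
</div>`;

// Lazy group so each match stops at its own </h1>; matchAll requires the "g" flag.
const headingRx = /<h1[^>]*>([\s\S]*?)<\/h1>/gi;

// Strip nested tags (such as the <span>) and surrounding whitespace from each heading.
const headings = [...html.matchAll(headingRx)].map(m => m[1].replace(/<[^>]+>/g, '').trim());

// Pull the number out of the "Time = 9.0 minutes" heading.
const minutes = headings[2].match(/[\d.]+/)[0];

console.log(headings); // [ 'meow', 'meow2', 'Time = 9.0 minutes' ]
console.log(minutes);  // 9.0
```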
In these situations an HTML parser can help a lot, and it is also much more precise and robust than a regex.
HTML Agility Pack
Example
var html = @"<div class=""ewok-rater-header-section"">
<li><h1>meow</h1></li>
<li><h1>meow2</h1></li>
<li><h1>Time = <span class=""work-weight"">9.0 minutes</span></h1></li>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// you can search for the heading
foreach (var node in doc.DocumentNode.SelectNodes("//li//h1"))
{
Console.WriteLine("Found heading : " + node.InnerText);
}
// or you can be more specific
var someSpan = doc.DocumentNode
.SelectNodes("//span[@class='work-weight']")
.FirstOrDefault();
Console.WriteLine("Found span : " + someSpan.InnerText);
Output
Found heading : meow
Found heading : meow2
Found heading : Time = 9.0 minutes
Found span : 9.0 minutes
It's for parsing an HTTP response. Isn't it slow to use an HTML parser to build a document?
I have HTML that I download via my WebRequest client, and out of the entire HTML I only want to parse this part:
<span class="sku">
<span class="fb">SKU :</span>118880101
</span>
I'm using the HTML Agility Pack to retrieve this value: 118880101
And I've written something like this:
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
return htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']").ElementAt(0).InnerText;
This returns the following value from the HTML:
SKU :118880101
Literally like this, spaces included. How can I fix this logic with the HTML Agility Pack so that I extract only the 118880101 value?
Can someone help me out?
Edit: something like this would do the job:
Substring(skuRaw.LastIndexOf(':') + 1);
which takes everything after the ':' sign in the string I receive. But I'm not sure whether it's safe to do it like this.
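LastIndexOf plus Substring is plain string slicing rather than a regex, and it is safe as long as the number always follows the last colon. The same logic, sketched in JavaScript as a hypothetical helper skuFromRaw that also trims the surrounding whitespace and keeps the whole string when there is no colon (an assumed fallback):

```javascript
// Hypothetical helper: returns whatever follows the last ':' with surrounding whitespace removed.
function skuFromRaw(skuRaw) {
  const colon = skuRaw.lastIndexOf(':');
  // lastIndexOf returns -1 when there is no colon, so slice(0) keeps the whole string.
  return skuRaw.slice(colon + 1).trim();
}

console.log(skuFromRaw('\n  SKU :118880101\n')); // 118880101
```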
Try this:
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var innerText = htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']")
    .ElementAt(0).InnerText;
// Remove every non-digit character, leaving only 118880101
return System.Text.RegularExpressions.Regex.Replace(innerText, @"\D", "");
If you want to use only the HTML Agility Pack, try this:
var child = htmlDoc.DocumentNode.SelectSingleNode("//span[@class='fb']");
if (child != null)
{
    var parent = child.ParentNode;
    // Removing the "SKU :" label leaves only the number in the parent's text
    parent.RemoveChild(child);
    var innerText = parent.InnerText.Trim();
}
I've got this code repeated in a div tag and want to write an XPath expression to find the dsd link so that I can click on it, based on the text in the h4 tag. Changing the HTML isn't an option.
<div>
<h4>Test Block</h4>
<br/>
<div>
Option 1
Option 2
</div>
</div>
At the moment I'm trying something like this, where name is the text of the h4 tag:
var findSubmitButton = Driver.FindElement(By.XPath("//div/h4[contains(text(), '" + name + "')]"));
var submitButton = findSubmitButton.FindElement(By.XPath("../div/a[contains(@href,'dsd')]"));
submitButton.Click();
But I'm unable to get this to work. Any suggestions would be gratefully received.
I do not see an issue with your XPaths. The HTML you supplied is invalid because of your placeholders, but your XPaths appear to work with this:
void Main()
{
var xml = @"
<div>
<h4>Test Block</h4>
<br/>
<div>
Option 1
Option 2
</div>
</div>";
var xmldoc = new XmlDocument();
xmldoc.LoadXml(xml);
var node = xmldoc.DocumentElement.SelectSingleNode("//div/h4[contains(text(),'Test Block')]");
node = node.SelectSingleNode("../div/a[contains(@href,'dsd')]");
Console.WriteLine(node.InnerText);
}
I don't have a machine to test this on, but since you said any feedback would be welcome: I'm fairly sure XPath lets you pick individual elements out of a child by position. If you know for sure that this HTML will always be the same, you could use:
../div/*[1]   (the first element of the child div; XPath positions start at 1, not 0)
You could use //div[h4[contains(., 'Test Block')]]//a[contains(@href, 'dsd')]. Something like //div[h4[contains(., 'Test Block')]]//a[contains(., 'Option 1')] should also work.
Why don't you use the following-sibling axis?
var findSubmitButton = Driver.FindElement(By.XPath("//div/h4[contains(text(), '" + name + "')]"));
var submitButton = findSubmitButton.FindElement(By.XPath("following-sibling::div/a[contains(@href,'dsd')]"));
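One more fragility in the question's code: name is concatenated straight into a single-quoted XPath literal, so a heading such as Bob's Block breaks the expression. XPath 1.0 has no escape sequences, so the usual workaround is to switch quote style, or to fall back to concat() when the value contains both kinds of quote. A sketch in JavaScript with a hypothetical helper named xpathLiteral:

```javascript
// Hypothetical helper: turns any string into a valid XPath 1.0 string literal.
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // Both quote kinds present: split on apostrophes and rejoin them as "'" pieces.
  const parts = value.split("'").map(p => `'${p}'`);
  const joined = parts.join(`, "'", `);
  return `concat(${joined})`;
}

const name = "Bob's Block";
const xpath = `//div[h4[contains(., ${xpathLiteral(name)})]]//a[contains(@href, 'dsd')]`;
console.log(xpath);
```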
I am currently working with an XML document that contains RSS feeds. I want to parse it so that, whenever a div tag with the class name "feedflare" is found, the code removes the whole div.
I could not find an example of doing this as the search for it is polluted with "HTML editor errors" and other irrelevant data.
Would anyone here be kind enough to share a way to reach my goal?
I must state that I DO NOT want to use HtmlAgilityPack if I can avoid it.
This is my process:
Load XML, parse through elements and pick out, Title, Description, Link.
Then I save all of this as HTML (adding tags programmatically to build a web page), and once all the tags are added, I want to parse the resulting HTML text and remove the annoying div tag.
Let's assume "string HTML = textBox1.text" where textBox1 is where the resulting HTML is pasted, after parsing the main XML document.
How would I then loop through the contents of textBox1.text and remove ONLY the div with the class "feedflare" (see below)?
<div class="feedflare">
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:yIl2AUoC8zA">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=yIl2AUoC8zA" border="0"></img></a>
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:H0mrP-F8Qgo">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=H0mrP-F8Qgo" border="0"></img></a>
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk" border="0"></img></a>
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU" border="0"></img></a>
</div>
Thank you in advance.
Using this xml library, do:
XElement root = XElement.Load(file); // or .Parse(string);
XElement div = root.XPathElement("//div[@class={0}]", "feedflare");
div.Remove();
root.Save(file); // or string = root.ToString();
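Try this:
System.Xml.XmlDocument d = new System.Xml.XmlDocument();
d.LoadXml(Your_XML_as_String);
// Select only the feedflare divs and copy them to a list first,
// so removing nodes cannot disturb the iteration (requires using System.Linq;)
var divs = d.SelectNodes("//div[@class='feedflare']").Cast<System.Xml.XmlNode>().ToList();
foreach (System.Xml.XmlNode n in divs)
    n.ParentNode.RemoveChild(n); // a div is a child of its parent element, not of the document
Then use d.OuterXml to retrieve the new XML.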
My solution in JavaScript is:
function unrichText(texto) {
var n = texto.indexOf("\">"); //Finding end of "<div class="ExternalClass...">
var sub = texto.substring(0, n+2); //Adding first char and last two (">)
var tmp = texto.replace(sub, ""); //Removing it
tmp = replaceAll(tmp, "</div>", ""); //Removing last "div"
tmp = replaceAll(tmp, "<p>", ""); //Removing other stuff
tmp = replaceAll(tmp, "</p>", "");
tmp = replaceAll(tmp, " ", "");
return (tmp);
}
function replaceAll(str, find, replace) {
return str.replace(new RegExp(find, 'g'), replace);
}
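replaceAll above passes find straight to new RegExp, so any regex metacharacter in it (., *, ( and so on) is treated as a pattern rather than as literal text; it happens to work for the strings used here. A sketch that escapes the input first; escapeRegExp is a hypothetical helper, and in engines with ES2021 support, str.replaceAll(find, replace) performs the same literal replacement natively:

```javascript
// Hypothetical helper: escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same signature as the answer's replaceAll, but `find` is now matched literally.
function replaceAll(str, find, replace) {
  return str.replace(new RegExp(escapeRegExp(find), 'g'), replace);
}

// Without escaping, "." would match every character and yield "-----".
console.log(replaceAll('a.b.c', '.', '-')); // a-b-c
```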
How can I replace <b></b> tags with <strong></strong> tags inside a specific div?
For example:
<div id="aaa">hello<b>wow</b>!</div>
should be turned, using JavaScript, into
<div id="aaa">hello<strong>wow</strong>!</div>
Please help! Thanks in advance.
What I'm trying to do is change <b></b> in the output HTML to <strong></strong> in order to pass W3C validation. Can I do that?
Or is there a solution that uses ASP.NET + C# to do that?
Here you go:
var root, elems;
root = document.getElementById( 'aaa' );
elems = root.getElementsByTagName( 'b' );
toArray( elems ).forEach( function ( elem ) {
var newElem = document.createElement( 'strong' );
newElem.textContent = elem.textContent;
elem.parentNode.replaceChild( newElem, elem );
});
where toArray is your preferred function for converting an array-like object into an array. I use this one:
function toArray( arrayLike ) { return [].slice.call( arrayLike ); }
Live demo: http://jsfiddle.net/mJSyH/3/
Note: this code doesn't work in IE8.
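In ES2015+ engines, Array.from performs the same array-like conversion without a helper (it is not available in any version of Internet Explorer, so a similar browser caveat applies). A small sketch, using a plain array-like object as a stand-in for the collection returned by getElementsByTagName:

```javascript
// The answer's converter: borrows Array.prototype.slice for any array-like object.
function toArray(arrayLike) { return [].slice.call(arrayLike); }

// A plain array-like object stands in for a NodeList/HTMLCollection here.
const arrayLike = { length: 2, 0: 'first', 1: 'second' };

console.log(toArray(arrayLike));    // [ 'first', 'second' ]
console.log(Array.from(arrayLike)); // [ 'first', 'second' ]
```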
You can grab all <b> elements under a certain element, move all child nodes to a new <strong> element, and then replace the <b> with the <strong>.
<div id="aaa">hello<b>wow</b><b>2</b><b>3</b>!</div>
<script>
var container = document.getElementById("aaa");
var find = container.getElementsByTagName("b");
var bold, strong;
while (bold = find[0]) {
strong = document.createElement("strong");
while (bold.firstChild) {
strong.appendChild(bold.firstChild);
}
bold.parentNode.replaceChild(strong, bold);
}
</script>
The reason you can set bold = find[0] every time is that getElementsByTagName returns a live collection: as the <b> elements are removed from the document, they are also removed from find.
See the latest version at http://jsbin.com/eqikaj/13/edit.
Using jQuery, you can find all b tags inside your parent div and replace each of them with a strong element containing the source tag's inner HTML:
$('#aaa b').each(function() {
    $(this).replaceWith($('<strong>' + $(this).html() + '</strong>'));
});
If you use jQuery, you can simply do this (scoped to the #aaa div; use $('b') to convert every b on the page):
$('#aaa b').replaceWith(function() {
return $('<strong>').html($(this).html());
});
Just download or include the jQuery library, and you can use the snippet.
http://docs.jquery.com/Downloading_jQuery
A solution using regular expressions:
var e = document.getElementById("aaa");
e.innerHTML = e.innerHTML.replace(/<b[^>]*>(.*?)<\/b>/ig, '<strong>$1</strong>');
It is perhaps not faster than the versions above, but the performance difference is very small (irrelevant in most real applications).
Use whichever you think is best.
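One pitfall in the regular expression above: <b[^>]*> also matches <br>, <body> and <button>, so hello<br>world<b>wow</b>! gets rewritten starting from the <br>. A sketch of a tighter pattern that requires whitespace or > right after the b, keeps any attributes and lets the content span lines; boldToStrong is a hypothetical helper that works on a string such as innerHTML, and like any regex approach it does not handle nested <b> tags:

```javascript
// Hypothetical helper: rewrites <b>...</b> as <strong>...</strong> in an HTML string.
// "<b(\s[^>]*)?>" accepts <b> or <b attr="...">, but not <br>, <body> or <button>.
// [\s\S]*? lets the bold text span several lines; nested <b> tags are not supported.
function boldToStrong(html) {
  return html.replace(/<b(\s[^>]*)?>([\s\S]*?)<\/b>/gi, '<strong$1>$2</strong>');
}

const input = 'hello<br>world<b>wow</b>!';

// The original pattern starts matching at <br> and swallows it.
console.log(input.replace(/<b[^>]*>(.*?)<\/b>/ig, '<strong>$1</strong>')); // hello<strong>world<b>wow</strong>!
console.log(boldToStrong(input)); // hello<br>world<strong>wow</strong>!
```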
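Note: you don't need a helper like toArray; you can borrow forEach directly (the Array.forEach generic was a non-standard, Firefox-only extension that has since been removed):
Array.prototype.forEach.call(elems, function (elem) { ... });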