I have:
<h2 class="entry-title" itemprop="headline">
<a href="http://www.printesaurbana.ro/2015/10/idei-despre-un-start-bun-in-blogging.html">Idei despre un start bun în blogging</a>
</h2>
I want to get the href using the class name. I tried:
if (webBrowser1.Document != null)
{
var links = webBrowser1.Document.GetElementsByTagName("a");
foreach (HtmlElement link in links)
{
if (link.GetAttribute("class") == "entry-title")
{
MessageBox.Show("Here");
}
}
}
But it didn't work. How can I solve this?
You should use link.GetAttribute("className"). Besides, it is the h2 tag in your HTML document that has the entry-title class, not the a tag. Corrected code:
if (webBrowser1.Document != null)
{
var links = webBrowser1.Document.GetElementsByTagName("h2");
foreach (HtmlElement link in links)
{
if (link.GetAttribute("className") == "entry-title")
{
MessageBox.Show("Here");
}
}
}
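Since the goal was the href itself, the corrected loop can be extended to read it from the anchor nested inside the matching h2. This is only a sketch: the class and method names (EntryTitleLinks, GetEntryTitleHrefs) are made up for illustration, and the comparison assumes the h2 carries exactly one class, because className returns the whole class string.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

public static class EntryTitleLinks
{
    // Collects the href of every <a> nested in an <h2 class="entry-title">.
    public static List<string> GetEntryTitleHrefs(HtmlDocument document)
    {
        var hrefs = new List<string>();
        if (document == null)
        {
            return hrefs;
        }

        foreach (HtmlElement heading in document.GetElementsByTagName("h2"))
        {
            // The WebBrowser DOM exposes the class attribute as "className".
            if (heading.GetAttribute("className") != "entry-title")
            {
                continue;
            }

            foreach (HtmlElement anchor in heading.GetElementsByTagName("a"))
            {
                string href = anchor.GetAttribute("href");
                if (!string.IsNullOrEmpty(href))
                {
                    hrefs.Add(href);
                }
            }
        }

        return hrefs;
    }
}
```

It could be called as EntryTitleLinks.GetEntryTitleHrefs(webBrowser1.Document) once the page has finished loading.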
Related
I am coding in C# using WindowsForms.
I am trying to iterate through a list of IDs, each of which changes the URL of a WebBrowser control.
ClientID is a List<int>: List<int> ClientID = new List<int>();
It is filled with around 10-15 different numbers.
The foreach runs too quickly for the WebBrowser control, so ErrorDiv is always null: the page has not finished loading when I check for the div with the specified class. If this class does exist, the foreach has to continue with the next Client.
foreach (int Client in ClientID)
{
if(webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
webBrowser1.Navigate(URLconsult + "/" + Client);
}
var ErrorDiv = webBrowser1.Document
.GetElementsByTagName("div")
.Cast<HtmlElement>()
.FirstOrDefault(m => m.GetAttribute("className") == "incompleteConsultNotification");
Console.WriteLine(webBrowser1.Document.GetElementsByTagName("div").Cast<HtmlElement>().FirstOrDefault(m => m.GetAttribute("className") == "incompleteConsultNotification"));
//if (ErrorDiv == null)
//{
// Console.WriteLine("Normal");
//}
//else
//{
// Error.Add(Client);
// continue;
//}
}
The HTML div I try to target:
<div class="incompleteConsultNotification">
This form is incomplete. Please add the following:
<ul>
<li> option 1</li>
<li> option 2</li>
</ul>
</div>
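One possible way around the timing problem described above is to stop looping and let the control drive the sequence: navigate to one client, wait for DocumentCompleted, inspect the page, then move to the next client. The sketch below assumes the div is present in the HTML as served (not injected later by script) and that the top-level URL reported by the event matches webBrowser1.Url. ConsultChecker and its members are hypothetical names, not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

// Hypothetical helper: visits one client page at a time and records the
// clients whose page contains the "incompleteConsultNotification" div.
public sealed class ConsultChecker
{
    private readonly WebBrowser browser;
    private readonly string baseUrl;
    private readonly Queue<int> pending;
    private int currentClient;

    public List<int> Errors { get; } = new List<int>();

    public ConsultChecker(WebBrowser browser, string baseUrl, IEnumerable<int> clientIds)
    {
        this.browser = browser;
        this.baseUrl = baseUrl;
        pending = new Queue<int>(clientIds);
    }

    public void Start()
    {
        browser.DocumentCompleted += OnDocumentCompleted;
        NavigateNext();
    }

    private void NavigateNext()
    {
        if (pending.Count == 0)
        {
            // Nothing left to visit: stop listening.
            browser.DocumentCompleted -= OnDocumentCompleted;
            return;
        }

        currentClient = pending.Dequeue();
        browser.Navigate(baseUrl + "/" + currentClient);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event also fires for frames; only react to the top-level page.
        if (browser.Document == null || e.Url != browser.Url)
        {
            return;
        }

        HtmlElement errorDiv = browser.Document
            .GetElementsByTagName("div")
            .Cast<HtmlElement>()
            .FirstOrDefault(d => d.GetAttribute("className") == "incompleteConsultNotification");

        if (errorDiv != null)
        {
            Errors.Add(currentClient);
        }

        NavigateNext();
    }
}
```

With the names from the question, this might be started with new ConsultChecker(webBrowser1, URLconsult, ClientID).Start(); the Errors list is only complete after the last page has loaded.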
I need help because I am not really used to working with HTML. I display a web document from my code; it reads an HTML file containing some images.
Every time, just before the image tag, I see two tags that produce some wrong characters. An example would be better:
<p ><br clear=all> </span>
<img border=0 width=265 height=105 id="Picture 84856"
src="Test_HTML/image272.jpg"></p>
The rendering is partially correct: it shows the images, but also a lot of wrong ÂÂÂÂÂÂÂÂÂ characters.
So I decided to try to cut the tags.
I don't know how to do this. Perhaps I am completely wrong, but I think it is a good start, isn't it?
My attempt to remove these tags from an HTML node is:
public void ShowTag(string tag)
{
string innerHtml= "//div[@id='"+tag+ "']";
string inner = "//p";
string brToRemove = "//br";
string spanToRemove = "//span";
var nodes = document.DocumentNode.SelectSingleNode(innerHtml);
bool br_deleted = false;
foreach (HtmlNode nd in nodes.SelectNodes(inner))
{
foreach (HtmlNode child in nd.ChildNodes)
{
if (child.Name == "br")
{
int a = 0;
a++;
child.ParentNode.RemoveChild(child);
br_deleted = true;
}
if(child.Name=="span")
{
int b = 0;
b++;
if (br_deleted == true)
{
//nd.ParentNode.RemoveChild(child);
child.Remove();
br_deleted = false;
}
}
}
}
}
But I cannot remove the child. Do you have any idea?
I found where the problem came from: when selecting the right node, I needed to add the headers so that the encoding could be identified.
string innerHtml = "//div[@id='" + tag + "']";
string inner = "//p";
webbrowser.Navigate("about:blank");
LoadDocument();
HtmlNode nodes = document.DocumentNode.SelectSingleNode(innerHtml);
HtmlNode head = document.DocumentNode.SelectSingleNode("/html/head");
head.AppendChild(nodes);
webbrowser.NavigateToString(head.InnerHtml);
Let's say this is my HTML code:
<a class="" data-tracking-id="0_Motorola"
href="/motorola?otracker=nmenu_sub_electronics_0_Motorola">
Motorola
</a>
I used C# code to find the href value like this:
var tags = htmlDoc.DocumentNode.SelectNodes(
"//div[@class='top-menu unit']//ul//li//div[@id='submenu_electronics']//a");
if (tags != null)
{
foreach (var t in tags)
{
var name = t.InnerText.Trim();
var url =t.Attributes["href"].Value;
}
}
I am getting url='/motorola', but I need url='/motorola?otracker=nmenu_sub_electronics_0_Motorola'.
The text after ? and & is not included. Please clarify where I went wrong.
I have used HtmlAgilityPack in the past, and I used it like this:
var url = t.GetAttributeValue("href","");
You can try that and see if it works.
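As a quick check of that suggestion, the sketch below loads the anchor from the question into HtmlAgilityPack and reads href with GetAttributeValue. The HrefReader class and FirstHref method are illustrative names only. If the value still stops at the ?, it may be worth inspecting the HTML that was actually downloaded, since it could differ from what the browser shows.

```csharp
using HtmlAgilityPack;

public static class HrefReader
{
    // Returns the href of the first <a href="..."> in the markup, or null if none exists.
    public static string FirstHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//a[@href]");

        // The second argument is the fallback used when the attribute is missing.
        return anchor == null ? null : anchor.GetAttributeValue("href", "");
    }
}
```

For the Motorola snippet above, the call would be HrefReader.FirstHref(html), where html holds that markup as a string.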
I am using HTML Agility Pack and searching for div with class="fileHeader" that has "RelayClinical Patient Education with Animations Install zip" in a child h4 element. Once found, I want to capture the "href" attribute inside the anchor tag of that particular block. How can I get it?
HTML Source
<div class="fileHeader" id="fileHeader_7311111">
<h4 class="collapsed">RelayClinical Patient Education with Animations Install zip</h4>
<div class="defaultMethod">
<a class="buttonGrey" href="https://mckc-esd.subscribenet.com/cgi-bin/download?rid=2511740931&rp=DTM20130905162949MzcyODIwNjM0" title="Clicking this link will open a new window." rel="noreferrer">
HTTPS Download
</a>
</div>
</div>
Code
HtmlNodeCollection fileHeaderNodes = bodyNode.SelectNodes("//div[@class='fileHeader']//h4");
foreach (HtmlNode fileHeader in fileHeaderNodes)
{
if (fileHeader.InnerText.Trim() == "RelayClinical Patient Education with Animations Install zip")
{
foreach (HtmlNode link in fileHeader.SelectNodes("//a[@href]"))
{
// extract the link and put in dataUrl var
if ((link.InnerText.Trim() == "HTTPS Download") && isFound == true)
{
count++;
// select all a tags (html anchor tags) that have a href attribute
HtmlAttribute att = link.Attributes["href"];
dataUrl = att.Value;
}
}
}
}
Rather than selecting the h4 element, select the a element directly. Then you can grab the href attribute.
var h4Text = "RelayClinical Patient Education with Animations Install zip";
var xpath = String.Format(
"//div[@class='fileHeader' and h4='{0}']/div[@class='defaultMethod']/a",
h4Text
);
var anchor = doc.DocumentNode.SelectSingleNode(xpath);
if (anchor != null)
{
var attr = anchor.GetAttributeValue("href", null);
// do stuff with attr
}
I'm having difficulty trying to remove a div with a particular ID, and its children, using the HTML Agility Pack. I am sure I'm just missing a config option, but it's Friday and I'm struggling.
The simplified HTML runs:
<html><head></head><body><div id='wrapper'><div id='functionBar'><div id='search'></div></div></div></body></html>
This is as far as I have got. The error thrown by the agility pack shows it cannot find a div structure:
<div id='functionBar'></div>
Here's the code so far (taken from Stackoverflow....)
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
// There are various options, set as needed
//htmlDoc.OptionFixNestedTags = true;
// filePath is a path to a file containing the html
htmlDoc.LoadHtml(Html);
string output = string.Empty;
// ParseErrors is an ArrayList containing any errors from the Load statement
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count > 0)
{
// Handle any parse errors as required
}
else
{
if (htmlDoc.DocumentNode != null)
{
HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
if (bodyNode != null)
{
HtmlAgilityPack.HtmlNode functionBarNode = bodyNode.SelectSingleNode("//div[@id='functionBar']");
bodyNode.RemoveChild(functionBarNode,false);
output = bodyNode.InnerHtml;
}
}
}
bodyNode.RemoveChild(functionBarNode,false);
But functionBarNode is not a child of bodyNode.
How about functionBarNode.ParentNode.RemoveChild(functionBarNode, false)? (And forget the bit about finding bodyNode.)
You can simply call:
var documentNode = document.DocumentNode;
var functionBarNode = documentNode.SelectSingleNode("//div[@id='functionBar']");
functionBarNode.Remove();
It is much simpler, and does the same as:
functionBarNode.ParentNode.RemoveChild(functionBarNode, false);
This will work for multiple nodes:
HtmlDocument d = this.Download(string.Format(validatorUrl, Url));
foreach (var toGo in QuerySelectorAll(d.DocumentNode, "p[class=helpwanted]").ToList())
{
toGo.Remove();
}
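The snippet above relies on a QuerySelectorAll helper that is not part of HtmlAgilityPack itself. A rough equivalent using only the library's XPath support could look like the sketch below. NodeRemoval and RemoveAll are hypothetical names. Note the null check, since SelectNodes returns null by default when nothing matches, and the ToList() copy, which avoids removing nodes from a collection while it is being enumerated.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class NodeRemoval
{
    // Removes every node matching the XPath expression and returns the remaining markup.
    public static string RemoveAll(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection matches = doc.DocumentNode.SelectNodes(xpath);
        if (matches != null)
        {
            // Iterate over a copy so removal does not disturb the enumeration.
            foreach (HtmlNode node in matches.ToList())
            {
                node.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

For example, NodeRemoval.RemoveAll(html, "//p[@class='helpwanted']") would drop every matching paragraph together with its children.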