HTML Agility Pack with C#

C# code:
var node = new HtmlWeb();
var doc = node.Load("http://ask.fm/");
HtmlNode ournode = doc.DocumentNode.SelectSingleNode("//div[@id='heads']");
textBox1.Text = ournode.InnerHtml;
HTML code:
<div id="heads">
<img alt="" class="head" id="face_30132803" src="http://img3.ask.fm/assets2/103/548/655/872/thumb_tiny/IMG_20150513_192250.jpg" />
<img alt="" class="head" id="face_56578735" src="http://img1.ask.fm/assets2/091/364/883/712/thumb_tiny/11094711_919135961470973_149663457_njpg720960png1280963.png" />
I want to see the following in the text box:
/sudenur3434
/leylaulucay

I have added a line to your code. Because the div itself has no href attribute, the XPath now selects the first a element inside it:
var node = new HtmlWeb();
var doc = node.Load("http://ask.fm/");
HtmlNode ournode = doc.DocumentNode.SelectSingleNode("//div[@id='heads']/a");
var val = ournode.Attributes["href"].Value;
textBox1.Text = val;
This gets the href attribute of that link. Use the same approach to read the href value of the other nodes, then add them to your text box.

Since a text box is usually used for a single line, this example simply writes all links to the Output window of Visual Studio.
If you use, for example, a ListBox instead of a text box, you can replace Debug.Print with ListBox1.Items.Add(href.Value).
This gives you the href URLs of all a children of the div with id="heads" (it needs using System.Linq and using System.Diagnostics):
var site = new HtmlWeb();
var htmldoc = site.Load("http://ask.fm/");
var headDiv = htmldoc.DocumentNode.SelectSingleNode("//div[@id='heads']");
if (headDiv != null)
{
    // SelectNodes returns null when nothing matches, so guard before looping.
    var anchors = headDiv.SelectNodes("a");
    if (anchors != null)
    {
        foreach (HtmlNode aNode in anchors)
        {
            var href = aNode.Attributes.AttributesWithName("href").FirstOrDefault();
            if (href != null)
                Debug.Print(href.Value);
        }
    }
}
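If the links should still end up in a text box, the collected values can be joined with line breaks and assigned in one go. The following is a minimal sketch under a few assumptions: the helper name HeadLinks.Extract is hypothetical, the inline markup is a shortened stand-in for the real page, and a multiline TextBox is assumed on the WinForms side.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HeadLinks
{
    // Hypothetical helper: returns the href of every <a> directly under div#heads.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//div[@id='heads']/a[@href]");
        if (anchors == null)
            return result;

        foreach (HtmlNode a in anchors)
            result.Add(a.GetAttributeValue("href", ""));
        return result;
    }

    public static void Main()
    {
        // Sample markup for illustration only; the real page may differ.
        const string html =
            "<div id=\"heads\">" +
            "<a href=\"/sudenur3434\"><img class=\"head\" /></a>" +
            "<a href=\"/leylaulucay\"><img class=\"head\" /></a>" +
            "</div>";

        // In a WinForms app this would be:
        // textBox1.Text = string.Join(Environment.NewLine, Extract(html));
        Console.WriteLine(string.Join(Environment.NewLine, Extract(html)));
    }
}
```

Returning a list instead of writing to the control directly keeps the parsing separate from the UI, so the same helper can feed a TextBox, a ListBox, or the debug output.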

<div id="heads">
<img alt="" class="head" id="face_30132803" src="http://img3.ask.fm/assets2/103/548/655/872/thumb_tiny/IMG_20150513_192250.jpg" />
<a href="/leylaulucay" data-rlt-aid="welcome_head"><img alt="" class="head" id="face_5
How can I parse this with HTML Agility Pack and show the result in the text box?

Related

Get custom HTML tag using HTML Agility Pack in C#

Given the following HTML, how can I use HTML Agility Pack to get the value 18.2099?
I had a look at "How get a custom tag with html agility pack?", but that did not work.
<div class="last-price">
<div class="last u-up">
<bdo class="last-price-value js-streamable-element" dir="ltr" data-s="100">18.2099</bdo>
</div>
</div>
Current code:
var htmlText = await response.Content.ReadAsStringAsync(cancellationToken);
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlText);
var htmlTestNodes = htmlDoc.DocumentNode.SelectNodes("//dbo"); // <-- This returns null
var htmlNode = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='last u-up']/dbo[@class='last-price-value']");
if (decimal.TryParse(htmlNode.InnerText, out var value))
{
    return value;
}
This might work. Note that the element is bdo, not dbo as in your XPath:
var htmlNode = document.DocumentNode.SelectSingleNode("/html/body/div/div/bdo");
if (decimal.TryParse(htmlNode.InnerText, out var value))
{
    Console.WriteLine(value);
}
Or, matching on the full class attribute:
var htmlNode = document.DocumentNode.SelectSingleNode("//bdo[@class='last-price-value js-streamable-element']");
if (decimal.TryParse(htmlNode.InnerText, out var value))
{
    Console.WriteLine(value);
}
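Both variants above dereference htmlNode without checking it, and decimal.TryParse without a culture depends on the machine's regional settings. A hedged sketch that guards against both follows; the LastPrice.Parse helper name is hypothetical, and matching with contains(@class, ...) is an assumption that trades exactness for tolerance, since it is a substring match and would also accept class names that merely contain that text.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

public static class LastPrice
{
    // Hypothetical helper: returns the price in the <bdo> element, or null if absent or unparsable.
    public static decimal? Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Substring match on the class attribute, so extra classes do not break the lookup.
        var node = doc.DocumentNode.SelectSingleNode(
            "//bdo[contains(@class, 'last-price-value')]");
        if (node == null)
            return null;

        // Invariant culture so "18.2099" parses the same way on every machine.
        return decimal.TryParse(node.InnerText.Trim(), NumberStyles.Number,
                                CultureInfo.InvariantCulture, out var value)
            ? value
            : (decimal?)null;
    }

    public static void Main()
    {
        // Markup copied from the question.
        const string html =
            "<div class=\"last-price\"><div class=\"last u-up\">" +
            "<bdo class=\"last-price-value js-streamable-element\" dir=\"ltr\" data-s=\"100\">18.2099</bdo>" +
            "</div></div>";

        decimal? price = Parse(html);
        Console.WriteLine(price.HasValue
            ? price.Value.ToString(CultureInfo.InvariantCulture)
            : "not found");
    }
}
```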

Error "Object reference not set to an instance of an object"

This code works on one website, but on some other sites it fails with the error message above, and I do not know how to fix it.
var document = webBrowser1.Document;
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)document.DomDocument;
var htmlString = documentAsIHtmlDocument3.documentElement.innerHTML;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);
// Use the nodes to extract the text
HtmlNodeCollection texts = doc.DocumentNode.SelectNodes("//div[@id='footer']/p");
string kq = "";
// Loop over the nodes to collect the result
foreach (var item in texts)
{
    kq += item.InnerText + Environment.NewLine;
}
richTextBox1.Text = kq;
HTML code:
<div id="divTop" >
<div id="text-conent" style="width: 500px; float: right;"></div>
<div id="grid" style="margin-removed 505px; height: 700px;"></div>
</div>
On the pages where this succeeds, a div with the id footer exists. On the pages where it fails, no such div exists, so SelectNodes returns null and the foreach throws.
You may need to make the search expression passed to doc.DocumentNode.SelectNodes more forgiving.
Alternatively, create a few more search strings to try if the original one fails:
if (texts == null)
{
    texts = doc.DocumentNode.SelectNodes("some other search string");
}
and so on.
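The null check can also be folded into a small helper so that a missing element yields an empty string instead of an exception. This is only a sketch: the name FooterText.JoinInnerText is hypothetical, and the inline markup stands in for the page loaded through the WebBrowser control.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class FooterText
{
    // Hypothetical helper: joins the inner text of all matches, or returns "" if there are none.
    public static string JoinInnerText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the XPath matches nothing.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return string.Empty;

        return string.Join(Environment.NewLine, nodes.Select(n => n.InnerText.Trim()));
    }

    public static void Main()
    {
        // Sample markup for illustration only.
        const string withFooter = "<div id=\"footer\"><p>one</p><p>two</p></div>";
        const string withoutFooter = "<div id=\"divTop\"><div id=\"grid\"></div></div>";

        Console.WriteLine(JoinInnerText(withFooter, "//div[@id='footer']/p"));
        Console.WriteLine(JoinInnerText(withoutFooter, "//div[@id='footer']/p").Length);
    }
}
```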

XPath for <a></a> tag

Let's say this is my HTML code:
<a class="" data-tracking-id="0_Motorola"
href="/motorola?otracker=nmenu_sub_electronics_0_Motorola">
Motorola
</a>
I used C# code like this to find the href value:
var tags = htmlDoc.DocumentNode.SelectNodes(
    "//div[@class='top-menu unit']" +
    "//ul//li//div[@id='submenu_electronics']//a");
if (tags != null)
{
    foreach (var t in tags)
    {
        var name = t.InnerText.Trim();
        var url = t.Attributes["href"].Value;
    }
}
I am getting url='/motorola', but I need url=/motorola?otracker=nmenu_sub_electronics_0_Motorola.
The text after ? and & is not included. Please clarify where I went wrong.
I have used HtmlAgilityPack in the past, and I read attributes like this:
var url = t.GetAttributeValue("href", "");
You can try that and see if it works.

Selecting the next element using HTML Agility Pack

I am using HTML Agility Pack and searching for div with class="fileHeader" that has "RelayClinical Patient Education with Animations Install zip" in a child h4 element. Once found, I want to capture the "href" attribute inside the anchor tag of that particular block. How can I get it?
HTML Source
<div class="fileHeader" id="fileHeader_7311111">
<h4 class="collapsed">RelayClinical Patient Education with Animations Install zip</h4>
<div class="defaultMethod">
<a class="buttonGrey" href="https://mckc-esd.subscribenet.com/cgi-bin/download?rid=2511740931&rp=DTM20130905162949MzcyODIwNjM0" title="Clicking this link will open a new window." rel="noreferrer">
HTTPS Download
</a>
</div>
</div>
Code
HtmlNodeCollection fileHeaderNodes = bodyNode.SelectNodes("//div[@class='fileHeader']//h4");
foreach (HtmlNode fileHeader in fileHeaderNodes)
{
    if (fileHeader.InnerText.Trim() == "RelayClinical Patient Education with Animations Install zip")
    {
        foreach (HtmlNode link in fileHeader.SelectNodes("//a[@href]"))
        {
            // extract the link and put in dataUrl var
            if ((link.InnerText.Trim() == "HTTPS Download") && isFound == true)
            {
                count++;
                // select all a tags (html anchor tags) that have a href attribute
                HtmlAttribute att = link.Attributes["href"];
                dataUrl = att.Value;
            }
        }
    }
}
Rather than selecting the h4 element, select the a element directly. Then you can grab the href attribute.
var h4Text = "RelayClinical Patient Education with Animations Install zip";
var xpath = String.Format(
    "//div[@class='fileHeader' and h4='{0}']/div[@class='defaultMethod']/a",
    h4Text
);
var anchor = doc.DocumentNode.SelectSingleNode(xpath);
if (anchor != null)
{
    var attr = anchor.GetAttributeValue("href", null);
    // do stuff with attr
}

How to get content via XPath

On the web page I have
<meta name="description" content="Learn about 94.100.179.159" />
How can I get exactly the text "Learn about 94.100.179.159" via XPath or HtmlAgilityPack?
I've tried
HtmlWeb hwObject = new HtmlWeb();
HtmlDocument htmldocObject = hwObject.Load("http://whois.domaintools.com/94.100.179.159");
foreach (HtmlNode link in htmldocObject.DocumentNode.SelectNodes("//meta"))
{
string s = link.InnerText;
Console.WriteLine(s);
}
Console.ReadLine();
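but that does not give me what I want. How can I solve that?
//meta[@name = 'description']/@content
is the XPath for the attribute you specified.
When link is the meta HtmlNode, read the attribute from it:
string s = link.GetAttributeValue("content", "");
This should return the attribute content.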
Meta tags don't have any inner text, they have attributes.
Try this:
HtmlWeb hwObject = new HtmlWeb();
HtmlDocument htmldocObject = hwObject.Load("http://whois.domaintools.com/94.100.179.159");
foreach (HtmlNode link in htmldocObject.DocumentNode.SelectNodes("//meta"))
{
    Console.WriteLine("-META-");
    var attribDump = link.Attributes.Select(a => a.Name + " : " + a.Value);
    foreach (var x in attribDump)
    {
        Console.WriteLine(x);
    }
}
Select the nodes as follows:
SelectNodes("//*[local-name()='meta']")
Then, for each HtmlNode that has a content attribute:
Console.WriteLine(link.Attributes["content"].Value);

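Putting the answers above together, the description can be read by selecting the single meta element by name and asking for its content attribute with a default. This is a sketch rather than a definitive solution: the MetaDescription.Read helper name is hypothetical, and the inline markup replaces the live page so that the example does not depend on the network.

```csharp
using System;
using HtmlAgilityPack;

public static class MetaDescription
{
    // Hypothetical helper: returns the content of <meta name="description">, or null if missing.
    public static string Read(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the element itself, then read the attribute from the node.
        HtmlNode meta = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        if (meta == null)
            return null;

        // The second argument is the default returned when the attribute is absent.
        return meta.GetAttributeValue("content", null);
    }

    public static void Main()
    {
        // Markup based on the question; the real page contains much more.
        const string html =
            "<html><head>" +
            "<meta name=\"description\" content=\"Learn about 94.100.179.159\" />" +
            "</head><body></body></html>";

        Console.WriteLine(Read(html) ?? "no description");
    }
}
```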