Problem getting an element value from a dynamic URL address - C#

I want to scrape an element value from a URL in C#, but my program does not work.
I have tried many code samples, but I could not get the element value.
using HtmlAgilityPack;
var web = new HtmlWeb();
var doc = web.Load("http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=5516102131364383");
var element = doc.DocumentNode.SelectSingleNode("//div[contains(@class,'ltr inline')]");
MessageBox.Show(element.InnerText);

Related

How do I access the title attribute with xpath?

So I am trying to access the title attribute in the colors section.
So if you were to hover over any of the small images to the right side of the product you will see that it says the Color name.
I've managed to navigate to there but I can't for the life of me figure out how to get the title attribute. Currently it's printing out nodes but I want to access the title attribute.
How do I properly access the title attribute and print out the color that corresponds to each picture?
This is the test link I am using (AliExpress)
Console.WriteLine("Product URL: ");
//Declare the URL
string url = Console.ReadLine();
// HtmlWeb - A Utility class to get HTML document from http
var web = new HtmlWeb();
//Load() Method download the specified HTML document from an Internet resource.
var doc = web.Load(url);
var nodes = doc.DocumentNode.SelectNodes("//li[@class = 'item-sku-image']");
foreach (var node in nodes)
{
//var colors = node.Attributes["/a[title]"].Value;
Console.WriteLine(node);
}
Console.ReadLine();
You can try iterating over the following XPath: //li[@class = 'item-sku-image']/a/img/@title. Alternatively, replace //li[@class = 'item-sku-image'] with //li[@class = 'item-sku-image']/a/img and then check the attributes of the resulting nodes.
It should yield a series of strings which contain the title you are after.
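As a minimal sketch of the second suggestion, the snippet below selects the img elements and reads their title attribute with GetAttributeValue. The markup, the class name TitleAttributeDemo, and the method name ReadTitles are made up for illustration; the real product page may be structured differently or may be filled in by JavaScript, in which case the nodes will not be present in the downloaded HTML.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class TitleAttributeDemo
{
    // Hypothetical helper: returns the title of every img under li.item-sku-image.
    public static List<string> ReadTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var titles = new List<string>();
        var images = doc.DocumentNode.SelectNodes("//li[@class='item-sku-image']/a/img");
        if (images == null)
        {
            return titles; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode img in images)
        {
            // The second argument is the fallback used when the attribute is missing.
            titles.Add(img.GetAttributeValue("title", "(no title)"));
        }
        return titles;
    }

    static void Main()
    {
        // Assumed sample markup standing in for the product page.
        const string html =
            "<ul>" +
            "<li class='item-sku-image'><a><img title='Red' src='r.jpg'></a></li>" +
            "<li class='item-sku-image'><a><img title='Blue' src='b.jpg'></a></li>" +
            "</ul>";

        foreach (string title in ReadTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

Reading the attribute from the element node avoids relying on how the library handles an XPath that ends in /@title.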

get value from web page using Html Agility Pack

I am trying to get the value of the "Pool Hashrate" using the HTML Agility Pack. As soon as execution reaches my string hash line, I get "Object reference not set to an instance of an object." Can somebody tell me what I am doing wrong?
string url = http://p2pool.org/ltcstats.php?address
protected void Page_Load(string address)
{
string url = address;
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
string hash = doc.DocumentNode.SelectNodes("/html/body/div/center/div/table/tbody/tr[1]")[0].InnerText;
}
Assuming you're trying to access that url, of course it should fail. That url doesn't return a full document, but just a fragment of html. There is no html tag, there is no body tag, just the div. Your xpath query returns nothing and thus the null reference exception. You need to query the right thing.
When I access that url, it returns this:
<div>
<center>
<div style="margin-right: 20px;">
<h3>Personal LTC Stats</h3>
<table class='zebra-striped'>
<tr><td>Pool Hashrate: </td><td>66.896 Mh/s</td></tr>
<tr><td>Your Hashrate: </td><td>0 Mh/s</td></tr>
<tr><td>Estimated Payout: </td><td> LTC</td></tr>
</table>
</div>
</center>
</div>
Given this, if you wanted to get the Pool Hashrate, you'd use a query more like this:
/div/center/div/table/tr[1]/td[2]
In the end you need to do this:
var url = "http://p2pool.org/ltcstats.php?address";
var web = new HtmlWeb();
var doc = web.Load(url);
var xpath = "/div/center/div/table/tr[1]/td[2]";
var poolHashrate = doc.DocumentNode.SelectSingleNode(xpath);
if (poolHashrate != null)
{
var hash = poolHashrate.InnerText;
// do stuff with hash
}
The problem is that the XPath is not finding the specified node. You can give the table or the tr an id in order to have a shorter XPath.
Also, based on your code, I assume that you're looking for a single node only, so you may want to use something like this:
doc.DocumentNode.SelectSingleNode("xpath");
Another good option is to use Fizzler.

Trying to scrape a webpage with Agility on C#

I'm using C# and the HTML Agility Pack to scrape a website, but the result I'm getting differs from what I see in Firebug. I suppose this is because the site uses some Ajax.
// The HtmlWeb class is a utility class to get the HTML over HTTP
HtmlWeb htmlWeb = new HtmlWeb();
// Creates an HtmlDocument object from an URL
HtmlAgilityPack.HtmlDocument document = htmlWeb.Load("http://www.saxobank.com/market-insight/saxotools/forex-open-positions");
// Targets a specific node
HtmlNode someNode = document.GetElementbyId("href");
// If there is no node with that Id, someNode will be null
richTextBox1.Text = document.DocumentNode.OuterHtml;
Could someone advise on how to do this correctly? What I'm getting back is just plain HTML. What I'm looking for is a
div id= "ctl00_MainContent_PositionRatios1_canvasContainer"
Any ideas?
If you have the ID, then use it - href is the name of an attribute, not the id.
HtmlNode someNode =
document.GetElementbyId("ctl00_MainContent_PositionRatios1_canvasContainer");
This will be the div with that id.
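You can then select an a child node of that div and read its href attribute. Note the leading dot: a bare //a would search the whole document rather than the descendants of someNode.
var href = someNode.SelectNodes(".//a")[0].Attributes["href"].Value;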
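When querying from a node returned by GetElementbyId, an XPath that begins with // is evaluated against the whole document, while one that begins with .// stays inside that node. The sketch below illustrates the scoped form on made-up markup; the class name ScopedXPathDemo, the method FirstHrefInside, and the id "box" are assumptions for the example, not part of the page discussed here.

```csharp
using System;
using HtmlAgilityPack;

static class ScopedXPathDemo
{
    // Hypothetical helper: href of the first link inside the element with the given id,
    // or null when the element or the link is missing.
    public static string FirstHrefInside(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode container = doc.GetElementbyId(id);
        if (container == null)
        {
            return null;
        }

        // ".//" restricts the search to descendants of container.
        HtmlNode link = container.SelectSingleNode(".//a[@href]");
        return link?.GetAttributeValue("href", null);
    }

    static void Main()
    {
        // Assumed sample markup: one link outside the div, one inside it.
        const string html =
            "<a href='/outside'>x</a><div id='box'><a href='/inside'>y</a></div>";

        Console.WriteLine(FirstHrefInside(html, "box"));
    }
}
```

If the div is generated by JavaScript after the page loads, GetElementbyId will return null, which is why the helper checks for that case before querying.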

I'm trying to get all the links from a website and put them in a List, but sometimes I'm getting strange links. Why?

This is the code to get the links:
private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
{
List<string> mainLinks = new List<string>();
var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
if (linkNodes != null)
{
foreach (HtmlNode link in linkNodes)
{
var href = link.Attributes["href"].Value;
mainLinks.Add(href);
}
}
return mainLinks;
}
Sometimes the links I'm getting start with "/", for example:
"/videos?feature=mh"
Or
"//www.youtube.com/my_videos_upload"
I'm not sure whether a bare "/" denotes a proper site, or whether links starting with "/videos?..." or "//www.youtube..." do.
I need to collect only the links that start with http or https; maybe links starting with just www also count as a proper site. The question is: what should I treat as a proper site address and link, and what not?
I'm sure my getLinks function is not written the way it should be.
This is how I'm adding the links to the List:
private List<string> test(string url, int levels , DoWorkEventArgs eve)
{
HtmlAgilityPack.HtmlDocument doc;
HtmlWeb hw = new HtmlWeb();
List<string> webSites;// = new List<string>();
List<string> csFiles = new List<string>();
try
{
doc = hw.Load(url);
webSites = getLinks(doc);
webSites is a List.
After a few runs I see entries in the List such as "/" or, as above, "/videos..." or "//www...".
I'm not sure I understood your question, but
/Videos means it is accessing the Videos folder from the root of the host you are accessing,
for example:
www.somesite.com/Videos
There are absolute and relative URLs, so you are getting different flavors from different links. You need to convert them to absolute URLs as appropriate (the Uri class will mostly handle this for you).
foo/bar.txt - relative URL from the same path as the current page
../foo/bar.txt - relative path from one folder above the current one
/foo/bar.txt - server-relative path from the root: same server, path starting from the root
//www.sample.com/foo/bar.txt - absolute URL with the same scheme (http/https) as the current page
http://www.sample.com/foo/bar.txt - complete absolute URL
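The following is a rough sketch of how the Uri class could be used to turn each of these forms into an absolute URL, given the address of the page the links came from. The class name LinkResolver, the method ToAbsolute, and the sample page URL are made up for the example; filtering to http and https is one possible reading of "proper site", not the only one.

```csharp
using System;
using System.Collections.Generic;

static class LinkResolver
{
    // Hypothetical helper: resolves each href against the page URL
    // and keeps only results whose scheme is http or https.
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            // TryCreate handles relative, server-relative, scheme-relative,
            // and already absolute forms.
            if (Uri.TryCreate(baseUri, href, out Uri absolute) &&
                (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                result.Add(absolute.AbsoluteUri);
            }
        }
        return result;
    }

    static void Main()
    {
        // Assumed page address; in the question this would be the url passed to hw.Load.
        const string page = "https://www.youtube.com/user/example";
        var hrefs = new[] { "/", "/videos?feature=mh", "//www.youtube.com/my_videos_upload" };

        foreach (string link in ToAbsolute(page, hrefs))
        {
            Console.WriteLine(link);
        }
    }
}
```

With this approach the values returned by getLinks would not need to be classified by hand: anything that cannot be resolved to an http or https address is simply dropped.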
It looks like you are using a library that can parse HTML tags.
As I understand it,
var href = link.Attributes["href"].Value;
does nothing but read the value of the "href" attribute.
So if the website's source code uses links like href="/news",
it will grab the relative links and save them to your list as well.
Just view the target website's source code and check it against your results.

C# Html Agility Pack ( SelectSingleNode )

I'm trying to parse this field, but can't get it to work. Current attempt:
var name = doc.DocumentNode.SelectSingleNode("//*[@id='my_name']").InnerHtml;
<h1 class="bla" id="my_name">namehere</h1>
Error: Object reference not set to an instance of an object.
Appreciate any help.
@John - I can assure you that the HTML is correctly loaded. I am trying to read my Facebook name for learning purposes. Here is a screenshot from the Firebug plugin. The version I am using is 1.4.0.
http://i54.tinypic.com/kn3wo.jpg
I guess the problem is that profile_name is a child node or something, that's why I'm not able to read it?
The reason your code doesn't work is because there is JavaScript on the page that is actually writing out the <h1 id='profile_name'> tag, so if you're requesting the page from a User Agent (or via AJAX) that doesn't execute JavaScript then you won't find the element.
I was able to get my own name using the following selector:
string name =
doc.DocumentNode.SelectSingleNode("//a[@id='navAccountName']").InnerText;
Try this:
var name = doc.DocumentNode.SelectSingleNode("//@id='my_name'").InnerHtml;
string name = doc.DocumentNode.SelectSingleNode("//h1[@id='my_name']").InnerText;
public async Task<List<string>> GetAllTagLinkContent(string content)
{
string html = string.Format("<html><head></head><body>{0}</body></html>", content);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var nodes = doc.DocumentNode.SelectNodes("//*[@id='my_name']");
// SelectNodes returns null when nothing matches, so guard before projecting.
return nodes == null ? new List<string>() : nodes.Select(r => r.InnerText).ToList();
}
It also works with ("//a[@href]"); you can try it as shown above. Hope this helps.
