How to get div without class with Html Agility Pack - c#

I am trying to get 'Galatasaray.' from the following HTML. I don't know how to select a div that has no class.
<div class="dRib1">
<h2>Bilgi</h2>
</div>
<div>Galatasaray.</div>

HtmlNodeCollection aciklamaCek = dokuman.DocumentNode.SelectNodes("//div[@class='dRib1']");
textBox8.Text = aciklamaCek[0].NextSibling.NextSibling.InnerText.Trim();
I solved my problem with this code.
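The two NextSibling calls are needed because the first sibling after the dRib1 div is the whitespace text node between the tags. A minimal sketch of an alternative, assuming the markup from the question: the XPath following-sibling axis goes straight to the next div element. The class and method names are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class SiblingDivExample
{
    // Returns the trimmed text of the first <div> that follows
    // <div class="dRib1">, or null if no such element exists.
    public static string TextAfterHeaderDiv(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // following-sibling::div[1] skips text nodes and picks the next div element
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//div[@class='dRib1']/following-sibling::div[1]");

        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        string html =
            "<div class=\"dRib1\">\n<h2>Bilgi</h2>\n</div>\n<div>Galatasaray.</div>";
        Console.WriteLine(TextAfterHeaderDiv(html));
    }
}
```

Because SelectSingleNode returns null when nothing matches, the null check avoids the exception that indexing an empty result would cause.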

Related

Problems with using .InnerHtml in ASP.NET C#

I have a div in the aspx page to which I gave the attributes runat="server" and id="contenuto".
When the page loads I need to create some cards with data taken from a database. But I get an error when I try to add HTML to it from the C# code-behind using contenuto.InnerHtml.
aspx page
<div class="page-wrapper">
<div class="page-content" runat="server" id="contenuto">
</div>
</div>
code behind
contenuto.InnerHtml += "<div class='row'>";
it gives me this error: "Could not get the internal content of content because it is not in literal format."
As long as "contenuto" does not contain any server controls, you can build the markup first and assign it once, like below:
var sb = new StringBuilder();
sb.Append("Text or tags you require");
contenuto.InnerHtml = sb.ToString();
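For the cards mentioned in the question, the same idea can be sketched as a helper that builds the whole markup string before it is assigned. This is only an illustration: the CSS class names, the (title, body) shape of the data, and the CardMarkup name are assumptions, not taken from the original page.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class CardMarkup
{
    // Builds one row of cards as a single HTML string.
    // Key = card title, Value = card body (assumed data shape).
    public static string BuildCards(IEnumerable<KeyValuePair<string, string>> cards)
    {
        var sb = new StringBuilder();
        sb.Append("<div class='row'>");
        foreach (var card in cards)
        {
            // Encode database values so they cannot inject markup
            sb.Append("<div class='card'><h3>")
              .Append(WebUtility.HtmlEncode(card.Key))
              .Append("</h3><p>")
              .Append(WebUtility.HtmlEncode(card.Value))
              .Append("</p></div>");
        }
        sb.Append("</div>");
        return sb.ToString();
    }

    public static void Main()
    {
        var cards = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("First", "1 < 2"),
            new KeyValuePair<string, string>("Second", "plain text")
        };
        // In the code-behind this would be: contenuto.InnerHtml = BuildCards(cards);
        Console.WriteLine(BuildCards(cards));
    }
}
```

Assigning the finished string once avoids reading InnerHtml back, which is what += does on every call.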

How to scrape a variable data from a source code?

I'm trying to scrape a link from the source code of a website that varies with every source code.
For example:
<div align="center">
<a href="http://www10.site.com/d/the rest of the link">
<span class="button_upload green">
The next time I get the source code the http://www10 changes to any http://www + number like http://www65.
How can I scrape the exact link with the new changed number?
Edit:
Here's how I use a regular expression:
MatchCollection m1 = Regex.Matches(textBox6.Text, "(href=\"http://www10)(?<td_inner>.*?)(\">)", RegexOptions.Singleline);
You mentioned in the comments that you use regular expressions for parsing the HTML document. That is the hardest way to do this (and generally not recommended). Try using an HTML parser like http://html-agility-pack.net
For HTML Agility Pack: you install it via NuGet, and here is an example (adapted from the one on their website):
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so guard before iterating
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink is a placeholder for your own logic
    }
}
doc.Save("file.htm");
It can also load string contents, not just files. You use XPath (or CSS selectors through an add-on such as Fizzler) to navigate inside the document and select what you want.
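Applied to the link in the question, a minimal sketch could look like the following. It assumes the anchor can be recognised by the span with the button_upload class inside it, so the changing www number never has to be matched. The sample HTML, the closing tags added to it, and the method name are illustrative.

```csharp
using System;
using HtmlAgilityPack;

public static class UploadLinkExample
{
    // Returns the href of the <a> that wraps the "button_upload" span,
    // or null when the page contains no such link.
    public static string FindUploadLink(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match on structure, not on the host name, so www10 vs www65 is irrelevant
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode(
            "//a[@href][span[contains(@class, 'button_upload')]]");

        return anchor == null ? null : anchor.GetAttributeValue("href", string.Empty);
    }

    public static void Main()
    {
        string html =
            "<div align=\"center\">" +
            "<a href=\"http://www65.site.com/d/abc\">" +
            "<span class=\"button_upload green\">Download</span></a></div>";
        Console.WriteLine(FindUploadLink(html));
    }
}
```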
How about a JS function like this, run when the page loads:
// jQuery is required!
var updateLinkUrl = function (num) {
$.each($('.button_upload.green'), function (pos, el) {
var orig = $(el).parent().prop("href");
var newurl = orig.replace("www10", "www" + num);
$(el).parent().prop("href", newurl);
});
};
$(document).ready(function () { updateLinkUrl(65); });

Reading a textstring from a webpage

Currently I'm trying to read out a text from a Website via a c# program.
To be exact the Track and the Dj from www.hardbase.fm.
This is what the page source looks like:
<div id="Moderator">
<div id="Moderator_special">
<div style="width:158px; float:left; margin:8px"></div>
<div id="onAir" style="width:420px;overflow:hidden;">
<strong>
<a href="/member/46069" target="_top">
<span style="color:#4AA6E5">BIOCORE</span>
</a>
<span style="color:#26628B"> mit "This Is BIOCORE" (Hardstyle)</span>
</strong>
</div>
</div>
</div>
The text I want to read out is "BIOCORE" and "mit "This Is BIOCORE" (Hardstyle)"
(the text seen when running the snippet).
I have tried the following:
System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData("http://www.hardbase.fm/");
string webData = System.Text.Encoding.UTF8.GetString(raw);
int first = webData.IndexOf("#4AA6E5\">") + "#4AA6E5\">".Length;
int last = webData.LastIndexOf("</span></a><span style=\"color:#26628B\">");
string hb_dj = webData.Substring(first, last - first);
But this doesn't always work, because sometimes the source code of the page changes a bit, for example the color. Then the search no longer finds anything.
So the question is: Is there a better method to do this?
You should try the HTML Agility Pack
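HtmlWeb page = new HtmlWeb();
HtmlDocument document = page.Load("http://www.hardbase.fm/");
// Select by id, so color changes in the style attributes no longer matter
HtmlNode onAir = document.DocumentNode.SelectSingleNode("//*[@id='onAir']");
// The spans are nested inside <strong>, so search all descendants
HtmlNodeCollection spans = onAir.SelectNodes(".//span");
var span1 = spans[0].InnerText.Trim(); // the DJ
var span2 = spans[1].InnerText.Trim(); // the track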

get value from web page using Html Agility Pack

I am trying to get the value of the "Pool Hashrate" using the HTML Agility Pack. Right when I hit my string hash, I get Object reference not set to an instance of an object. Can somebody tell me what I am doing wrong?
string url = "http://p2pool.org/ltcstats.php?address";
protected void Page_Load(string address)
{
string url = address;
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
string hash = doc.DocumentNode.SelectNodes("/html/body/div/center/div/table/tbody/tr[1]")[0].InnerText;
}
Assuming you're trying to access that url, of course it should fail. That url doesn't return a full document, but just a fragment of html. There is no html tag, there is no body tag, just the div. Your xpath query returns nothing and thus the null reference exception. You need to query the right thing.
When I access that url, it returns this:
<div>
<center>
<div style="margin-right: 20px;">
<h3>Personal LTC Stats</h3>
<table class='zebra-striped'>
<tr><td>Pool Hashrate: </td><td>66.896 Mh/s</td></tr>
<tr><td>Your Hashrate: </td><td>0 Mh/s</td></tr>
<tr><td>Estimated Payout: </td><td> LTC</td></tr>
</table>
</div>
</center>
</div>
Given this, if you wanted to get the Pool Hashrate, you'd use a query more like this:
/div/center/div/table/tr[1]/td[2]
In the end you need to do this:
var url = "http://p2pool.org/ltcstats.php?address";
var web = new HtmlWeb();
var doc = web.Load(url);
var xpath = "/div/center/div/table/tr[1]/td[2]";
var poolHashrate = doc.DocumentNode.SelectSingleNode(xpath);
if (poolHashrate != null)
{
var hash = poolHashrate.InnerText;
// do stuff with hash
}
The problem is that the XPath is not finding the specified node. You can give the table or the tr an id in order to have a shorter XPath.
Also, based on your code I assume that you're looking for a single node only, so you may want to use something like this:
doc.DocumentNode.SelectSingleNode("xpath");
Another good option is using Fizzler.
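When the markup cannot be changed to add an id, another way to shorten the XPath is to anchor on the label text instead of the full path. The sketch below assumes the fragment shown above and looks up the cell that follows a given label. ReadStat is a hypothetical helper name, and the label is assumed not to contain an apostrophe.

```csharp
using System;
using HtmlAgilityPack;

public static class StatsExample
{
    // Finds the <td> whose text starts with the label and returns the
    // text of the next <td> in the same row, or null if it is missing.
    public static string ReadStat(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The label is spliced into the XPath, so it must not contain '
        string xpath = "//td[starts-with(normalize-space(.), '" + label +
                       "')]/following-sibling::td[1]";
        HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);

        return cell == null ? null : cell.InnerText.Trim();
    }

    public static void Main()
    {
        string html =
            "<div><center><div><table class='zebra-striped'>" +
            "<tr><td>Pool Hashrate: </td><td>66.896 Mh/s</td></tr>" +
            "<tr><td>Your Hashrate: </td><td>0 Mh/s</td></tr>" +
            "</table></div></center></div>";
        Console.WriteLine(ReadStat(html, "Pool Hashrate"));
    }
}
```

This keeps working if the wrapper divs around the table change, at the cost of depending on the label wording.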

How can I add "class" attributes to HTML elements?

I have the following HTML. I want to add class="last" attributes to the final li elements in each list. How can I do this?
<div class="gpbscol">
<ul class="listl">
<li>ACCESSORIES</li>
<li>AMPLIFIERS</li>
<li>ANALOG AUDIO PROCESSING</li>
<li>MICROPHONE PREAMPLIFIERS</li>
<li>MICROPHONES</li>
<li>SPEAKERS/MONITORS</li>
<li>STUDIO</li>
<li>DIGITAL AUDIO PROCESSING</li>
<li>CONSOLES, MIXERS</li>
<li>DAWS/PERIPHERALS</li>
</ul>
</div>
<div class="audio">
<ul class="listl">
<li>DAWS/PERIPHERALS</li>
<li>LOUDSPEAKERS — FOH</li>
<li>RECORDERS/PLAYERS</li>
<li>HEADPHONES</li>
<li>MICROPHONES - WIRELESS CONVERTERS</li>
<li>NETWORK AUDIO / CONTROL / SNAKES</li>
<li>COMPUTER AUDIO INTERFACES</li>
<li>INTERCONNECTS</li>
<li>LOUDSPEAKERS — STAGE MONITORS</li>
<li>ACOUSTIC TREATMENT</li>
<li>MI PRODUCTS</li>
</ul>
</div>
So the final element might be
<li class="last">MI PRODUCTS</li>
It would be a styling issue. If client-side code were not an option, I would go for CSS styling. You may consider this:
ul.listl li:last-child { }
This would be easier to do with jQuery:
$(function(){
// :last-child matches the final li of each list; :last would match only one element overall
$("ul.listl li:last-child").addClass("last");
});
And that's all, folks :)
I agree with @JohnHartsock, but if you want to do it your way you can use a variety of libraries that help you query HTML elements (DOM).
Fizzler
Sharp-Query
HTML Agility Pack
string[] g=@"<div class=""gpbscol"">
<ul class=""listl"">
<li>ACCESSORIES</li>
<li>AMPLIFIERS</li>
<li>ANALOG AUDIO PROCESSING</li>
<li>MICROPHONE PREAMPLIFIERS</li>
<li>MICROPHONES</li>
<li>SPEAKERS/MONITORS</li>
<li>STUDIO</li>
<li>DIGITAL AUDIO PROCESSING</li>
<li>CONSOLES, MIXERS</li>
<li>DAWS/PERIPHERALS</li>
</ul>
</div>
<div class=""audio"">
<ul class=""listl"">
<li>DAWS/PERIPHERALS</li>
<li>LOUDSPEAKERS — FOH</li>
<li>RECORDERS/PLAYERS</li>
<li>HEADPHONES</li>
<li>MICROPHONES - WIRELESS CONVERTERS</li>
<li>NETWORK AUDIO / CONTROL / SNAKES</li>
<li>COMPUTER AUDIO INTERFACES</li>
<li>INTERCONNECTS</li>
<li>LOUDSPEAKERS — STAGE MONITORS</li>
<li>ACOUSTIC TREATMENT</li>
<li>MI PRODUCTS</li>
</ul>
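</div>".Split(new string[] { "<li>" }, StringSplitOptions.None);
// Split removes the "<li>" delimiters, so restore them when joining and give
// only the final one the class. This marks the last li of the whole string, not of each list.
string newString = string.Join("<li>", g, 0, g.Length - 1) + "<li class='last'>" + g[g.Length - 1];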
The easiest of all is using HtmlAgilityPack:
TextWriter text = new StringWriter();
string set = "[html here]";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(set);
// //li[last()] matches the last li inside each parent list
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//li[last()]"))
{
    node.SetAttributeValue("class", "last");
}
doc.Save(text);
return text.ToString();
Anyway, thanks all for helping me.
