C# Html parsing - c#

I'm trying to parse HTML in my C# project, without success. I'm using the HtmlAgilityPack library; I can get some of the HTML body text, but not all of it.
I need to grab the div with the ID latestPriceSection from https://www.monero.how/widget and extract the USD value from it.
My function (doesn't work):
public void getXMRRate()
{
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument document = web.Load("https://www.monero.how/widget");
    HtmlNode[] nodes = document.DocumentNode.SelectNodes("//a").Where(x => x.InnerHtml.Contains("latestPriceSection")).ToArray();
    foreach (HtmlNode item in nodes)
    {
        Console.WriteLine(item.InnerHtml);
    }
}

Your function doesn't work because the widget is populated by a script, so the div is empty when the page is first loaded. HAP only parses the HTML it downloads and does not run JavaScript, so you can't use it to scrape this information. Find a web service that can give you the information you need.
Alternatively, you can use Selenium to get the HTML after the page's script has run. Or use the WebBrowser class, but that requires a Windows Forms application whose form contains the WebBrowser control.

You need to retrieve the JSON data from https://www.monero.how/widgetLive.json, because the widget loads its values from this resource via an Ajax request.
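A minimal sketch of that approach, assuming a .NET version that ships HttpClient and System.Text.Json. The layout of widgetLive.json is not shown in this thread, so the property name passed in (for example "usd") is a placeholder; inspect the actual response before relying on it. XmrRate, ReadNumber and GetRateAsync are names made up for this example.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class XmrRate
{
    private static readonly HttpClient Http = new HttpClient();

    // Returns the numeric value of a top-level JSON property, or null if the
    // property is missing or not numeric. The property name is an assumption
    // supplied by the caller, not something taken from the real feed.
    public static decimal? ReadNumber(string json, string propertyName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object) return null;
        if (!root.TryGetProperty(propertyName, out JsonElement value)) return null;

        if (value.ValueKind == JsonValueKind.Number && value.TryGetDecimal(out decimal number))
            return number;

        // Some feeds send numbers as strings, e.g. "123.45".
        if (value.ValueKind == JsonValueKind.String &&
            decimal.TryParse(value.GetString(), NumberStyles.Float,
                             CultureInfo.InvariantCulture, out decimal parsed))
            return parsed;

        return null;
    }

    // Downloads the JSON resource named in the answer and reads one property from it.
    public static async Task<decimal?> GetRateAsync(string propertyName)
    {
        string json = await Http.GetStringAsync("https://www.monero.how/widgetLive.json");
        return ReadNumber(json, propertyName);
    }
}
```

A call would look like decimal? usd = await XmrRate.GetRateAsync("usd"); where "usd" is only a guess at the property name. If the value is nested inside another object, ReadNumber would need to walk down to it first.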

Related

How to extract JSON embedded on a HTML page using C#

The JSON I wish to use is embedded in an HTML page. Inside a script tag on the page there is a statement:
<script>
jsonRAW = {... heaps of JSON... }
Is there a parser to extract this from HTML? I have looked at Json.NET, but it requires its input to be reasonably well-formed JSON.
You can try HTML Agility Pack, which can be installed as a NuGet package.
After installing it, see the tutorial on how to use HTML Agility Pack.
The tutorial has more information, but in code it works like this:
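using System.Linq;
using HtmlAgilityPack;

var urlLink = "http://www.google.com/jsonPage"; // 1. URL of the page that contains the JSON.
var web = new HtmlWeb(); // 2. Create the HTML Agility Pack web loader.
var doc = web.Load(urlLink); // 3. Download and parse the page.
if (doc.ParseErrors != null && doc.ParseErrors.Any()) {
    // 4. Check for parse errors and deal with them here.
}
var scriptNode = doc.DocumentNode.SelectSingleNode("//script"); // 5. Access the DOM, e.g. the first script element (null if there is none).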
There are other things in between but this should get you started.
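Once the script text is in hand (for example from the InnerText of the script node), the object literal still has to be cut out of the surrounding JavaScript. The following is a rough sketch of one way to do that by matching braces. It assumes the literal is strict JSON with double-quoted strings; single-quoted strings, comments or unquoted keys would break it. EmbeddedJson and ExtractObjectLiteral are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.Json;

public static class EmbeddedJson
{
    // Returns the first {...} block that follows variableName in scriptText,
    // or null if the name or a balanced closing brace is not found.
    public static string? ExtractObjectLiteral(string scriptText, string variableName)
    {
        int nameAt = scriptText.IndexOf(variableName, StringComparison.Ordinal);
        if (nameAt < 0) return null;

        int start = scriptText.IndexOf('{', nameAt + variableName.Length);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false;
        bool escaped = false;

        for (int i = start; i < scriptText.Length; i++)
        {
            char c = scriptText[i];

            if (inString)
            {
                // Braces inside string values must not affect the depth count.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }

            if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return scriptText.Substring(start, i - start + 1);
        }

        return null; // Unbalanced braces.
    }
}
```

The returned string can then be handed to a JSON parser, for example JsonDocument.Parse(EmbeddedJson.ExtractObjectLiteral(scriptText, "jsonRAW")) after a null check.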

I can't get the content of a web page without html codes in C#

I want to get the text of a web page in a Windows Forms application. I am using:
WebClient client = new WebClient();
string downloadString = client.DownloadString(link);
However, this gives me the HTML source of the web page.
Here is the question:
Can I get a specific part of a website, for example a part that has the class name "ask-page new-topbar"? I want to get every part that has the class name "ask-page new-topbar".
No, you can't request only part of a page: when you send a request to a URL, you get the whole document back.
What you can do is use the Html Agility Pack to dig through the HTML and give you the contents of the nodes you are interested in.
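As a rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is installed: the helper below parses an HTML string and returns the text of every element whose class attribute is exactly the given value. ClassTextExtractor and GetTextByClass are names invented for this example, and the XPath is built by simple interpolation, so it assumes the class value contains no single quote.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ClassTextExtractor
{
    // Returns the trimmed text of every element whose class attribute equals
    // classValue exactly (same class names, same order, same spacing).
    public static List<string> GetTextByClass(string html, string classValue)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes($"//*[@class='{classValue}']");
        if (nodes == null) return texts; // SelectNodes returns null when nothing matches.

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops the tags; DeEntitize turns entities such as &amp; back into characters.
            texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return texts;
    }
}
```

With the code from the question, this would be called as ClassTextExtractor.GetTextByClass(downloadString, "ask-page new-topbar"). If the elements may carry additional classes, the XPath would need a contains() test instead of an exact comparison.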

HTMLAgilitypack read html page info with ajax calls

I am using HtmlAgilityPack to read specific HTML elements from a specific URL.
The problem I am facing is that the contents of one of the HTML tags are filled in by AJAX requests. How can I read them?
<div id="priceInfo"></div>
The code I use to read the URL is:
HtmlWeb _htmlWeb = new HtmlWeb();
HtmlAgilityPack.HtmlDocument _webDoc = _htmlWeb.Load(webUrl);
// HtmlNodeCollection _priceNode = Gets the node with id priceInfo
The contents of this div are filled in by an AJAX request, and I want to read them after the div has been filled. How can I do that?
HtmlAgilityPack is meant to be used on the server side. From what you describe, you are trying to read a value that only exists on the client side, not on the server side.
You should look into using jQuery/JavaScript to read it once the AJAX call is done:
$.ajax({ /* ... */ })
    .done(function (result) {
        // Handle the returned result...
        alert($("#yourHtmlId").val()); // Show the value of one of your HTML elements.
    });
http://api.jquery.com/jQuery.ajax/

Selenium C# Dynamic Meta Tags

I'm using Selenium for C# to serve fully rendered JavaScript applications to Google spiders and to users with JavaScript disabled. I am using ASP.NET MVC to serve the pages from my controller. I need to be able to generate dynamic meta tags before the content is served to the caller. For example, the following pseudocode:
var pageSource = driver.PageSource; // This is where i get my page content
var meta = driver.findElement(By.tagname("meta.description")).getAttribute("content");
meta.content = "My New Meta Tag Value Here";
return driver.PageSource; // return the page source with edited meta tags to the client
I know how to get the page source to the caller; I am already doing this. But I can't seem to find the right selector to edit the meta tags before I push the content back to the requester. How would I accomplish this?
Selenium doesn't have a feature specifically for this. But technically, you can change meta tags with JavaScript, so you can use Selenium's IJavaScriptExecutor in C#.
If the page is using jQuery, here's one way to do it:
// New content to swap in
String newContent = "My New Meta Tag Value Here";
// jQuery call that does the swapping
String changeMetasScript = "$('meta[name=author]').attr('content', arguments[0]);";
// Execute it with the JavaScript executor
IJavaScriptExecutor js = driver as IJavaScriptExecutor;
js.ExecuteScript(changeMetasScript, newContent);
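If the page does not load jQuery, the same idea should work with the plain DOM API. The following is only a sketch, assuming the Selenium WebDriver package; MetaTagEditor and SetMetaContent are hypothetical names, and the meta name is passed in rather than hard-coded, so "description" or "author" can both be used.

```csharp
using OpenQA.Selenium;

public static class MetaTagEditor
{
    // Plain-DOM script: arguments[0] is the meta name, arguments[1] the new content.
    // It returns true if a matching meta element was found and updated.
    public const string SetMetaScript =
        "var m = document.querySelector('meta[name=\"' + arguments[0] + '\"]');" +
        "if (m) { m.setAttribute('content', arguments[1]); }" +
        "return m !== null;";

    public static bool SetMetaContent(IWebDriver driver, string name, string content)
    {
        var js = (IJavaScriptExecutor)driver;
        object result = js.ExecuteScript(SetMetaScript, name, content);
        return result is bool found && found;
    }
}
```

Whether driver.PageSource reflects a change made after load can depend on the browser driver. If it does not, one option is to read the live markup with js.ExecuteScript("return document.documentElement.outerHTML;") instead.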

Add querystring to all anchor links in HTML body

In C#, given a string which contains HTML, what is the best way to automatically add the query string data test=1 to the end of every hyperlink? It should only modify the URL inside the href attribute of anchor links (e.g. not do it for image URLs etc.).
An example would be:
Input
Visit http://www.test.com today
and see what deals we have.
Output
Visit http://www.test.com today
and see what deals we have.
This seems a bit tricky, and I'm not sure where the best place to start would be. Any help appreciated!
HTML Agility Pack is a very good library for parsing HTML.
Sample for getting all the text in an HTML document:
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("Your path (local or web)");
var result = doc.DocumentNode.SelectNodes("//body//text()"); // Returns an HtmlNodeCollection, or null if nothing matches.
if (result != null)
{
    foreach (var node in result)
    {
        string achievedText = node.InnerText; // The text you want.
    }
}
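The sample above only reads text. For the question itself, a sketch along the following lines might work, again assuming the HtmlAgilityPack NuGet package: select every anchor that has an href, append the parameter, and write the document back out. LinkRewriter, AppendQuery and AddQueryToAnchors are names invented for this example. It does not filter out hrefs such as mailto: or javascript: links, which would need an extra check.

```csharp
using System;
using HtmlAgilityPack;

public static class LinkRewriter
{
    // Appends key=value to a URL, using ? or & as needed and keeping any #fragment last.
    public static string AppendQuery(string url, string key, string value)
    {
        string fragment = "";
        int hashAt = url.IndexOf('#');
        if (hashAt >= 0)
        {
            fragment = url.Substring(hashAt);
            url = url.Substring(0, hashAt);
        }

        string separator;
        if (!url.Contains("?")) separator = "?";
        else if (url.EndsWith("?") || url.EndsWith("&")) separator = "";
        else separator = "&";

        return url + separator + Uri.EscapeDataString(key) + "=" + Uri.EscapeDataString(value) + fragment;
    }

    // Rewrites the href of every <a> element; other attributes such as img src are untouched.
    public static string AddQueryToAnchors(string html, string key, string value)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return html; // No anchors: return the input unchanged.

        foreach (HtmlNode anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "");
            anchor.SetAttributeValue("href", AppendQuery(href, key, value));
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

For the example in the question, this would be called as LinkRewriter.AddQueryToAnchors(html, "test", "1"). Using a parser rather than a regular expression keeps the change limited to anchor href attributes.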
