HtmlAgilityPack: read HTML page info filled by AJAX calls - C#

I am using HtmlAgilityPack to read specific HTML elements from a specific URL.
The problem I am facing is that the contents of one of the HTML tags are filled in by AJAX requests. How can I read them?
<div id="priceInfo"></div>
Code I used to read the url is
HtmlWeb _htmlWeb = new HtmlWeb();
HtmlAgilityPack.HtmlDocument _webDoc = _htmlWeb.Load(webUrl);
// HtmlNodeCollection _priceNode = Gets the node with id priceInfo
The contents of this div are filled in by an AJAX request, and I want to read the contents of the div after it has been filled. How can I do that?

HtmlAgilityPack is meant to be used on the server side. From what you are stating, you are trying to read a value that only exists on the client side, not on the server side.
You should look into using jQuery/JavaScript once the AJAX call is done.
$.ajax({ .... })
.done(function (result) {
// handle the returned result...
alert($("#yourHtmlId").text()); // show the text content of your HTML element (use .val() for form inputs)
});
http://api.jquery.com/jQuery.ajax/

Related

Get complete Html Body on post in asp.net core

I am trying to get the HTML body of the view on POST, but my body is always empty.
Below is my code.
[HttpPost]
public async Task<IActionResult> GetPdf()
{
    var request = HttpContext.Request.Body;
    using (var bodyReader = new StreamReader(request))
    {
        string body = await bodyReader.ReadToEndAsync();
        //request.Body = new MemoryStream(Encoding.UTF8.GetBytes(body));
    }
}
A POST request contains only form fields (if it is an HTML form post) or values explicitly added to the request (if it is an AJAX or other manual post).
If you really need to get the HTML of the page that way, grab the whole HTML (e.g. with jQuery's $('body').html()) and post it back to the server (see "How to pass parameters in $ajax POST?").
$.post('myServerUrl', { html: $('body').html() }, ...);
Note that if you are trying to do something like rendering HTML to PDF, you'd also need the CSS and possibly the JS files. Using more specialized tools like HTML-to-PDF converters may be more appropriate.
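On the server, the posted value can then be bound as a form field instead of being read from the raw request body. A minimal sketch, assuming the field is named html as in the $.post call above and that jQuery sends it form-encoded (its default for a plain object); the controller name PdfController is hypothetical:

```csharp
using Microsoft.AspNetCore.Mvc;

// Hypothetical controller; only the "html" field name comes from the $.post call.
public class PdfController : Controller
{
    [HttpPost]
    public IActionResult GetPdf([FromForm] string html)
    {
        // Bound from the form-encoded "html" field of the POST body.
        if (string.IsNullOrEmpty(html))
        {
            return BadRequest("No 'html' field was posted.");
        }

        // Hand the markup to an HTML-to-PDF tool of your choice; this sketch only echoes its size.
        return Content($"Received {html.Length} characters of HTML.");
    }
}
```

Large pages may exceed the framework's form size limits, so this is a starting point rather than a complete solution.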

C# Html parsing

I'm trying to parse HTML in my C# project without success. I am using the HtmlAgilityPack library; I can get some of the HTML body text, but not all of it for some reason.
I need to grab the div with the ID latestPriceSection and filter it down to the USD value from https://www.monero.how/widget
My function (doesn't work):
public void getXMRRate()
{
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument document = web.Load("https://www.monero.how/widget");
    HtmlNode[] nodes = document.DocumentNode.SelectNodes("//a").Where(x => x.InnerHtml.Contains("latestPriceSection")).ToArray();
    foreach (HtmlNode item in nodes)
    {
        Console.WriteLine(item.InnerHtml);
    }
}
Your function doesn't work because the widget is updated via script. The div contains nothing when you load the page, so you can't use HAP to scrape this information. Find a web service that can give you the information you need.
Alternatively, you can use Selenium to get the HTML after the page has run the script. Or you can use the WebBrowser class, but that requires a forms application where the form contains the WebBrowser.
You need to retrieve the JSON data from https://www.monero.how/widgetLive.json, because the widget loads its data from this resource via an Ajax request.
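A minimal sketch of that approach, using HttpClient and System.Text.Json. The layout of the JSON is not shown in this thread, so the sketch makes no assumption about field names: it only lists the top-level keys, which should help locate the USD price field. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class WidgetJson
{
    // Returns the top-level property names of a JSON object,
    // or an empty list if the root is not an object.
    public static List<string> TopLevelKeys(string json)
    {
        var keys = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (doc.RootElement.ValueKind == JsonValueKind.Object)
            {
                foreach (JsonProperty property in doc.RootElement.EnumerateObject())
                {
                    keys.Add(property.Name);
                }
            }
        }
        return keys;
    }

    public static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Same resource the widget requests via Ajax.
            string json = await client.GetStringAsync("https://www.monero.how/widgetLive.json");
            foreach (string key in TopLevelKeys(json))
            {
                Console.WriteLine(key);
            }
        }
    }
}
```

Once the relevant key is known, the value can be read with JsonElement.GetProperty instead of scraping the page.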

Download HTML doesn't retrieve dynamically generated elements

For documentation purposes I need the HTML of each "...Controller" element on the following website:
https://apipivotdev.azurewebsites.net/swagger/ui/index.
This doesn't work:
string html;
using (var wc = new WebClient())
html= wc.DownloadString(url);
because that will only return a few divs at the top of the document.
Parsing the underlying JSON and merging it with the HTML would be too time-consuming and, again, I just need the HTML.
Question: How would I get the complete webpage in C#?

Parse webpage with Fragment identifier in URL, using HTML Agility Pack

I want to parse a webpage with a fragment identifier (#), e.g. http://steamcommunity.com/market/search?q=appid%3A570+uncommon#p4
When I use my browser (Google Chrome), I get a different result for each identifier (#p1, #p2, #p3), but when I use HTML Agility Pack, I always get the first page, regardless of the page identifier.
string sURL = "http://steamcommunity.com/market/search?q=appid%3A570+uncommon#p";
var wClient = new WebClient();
var html = new HtmlAgilityPack.HtmlDocument();
html.LoadHtml(wClient.DownloadString(sURL + i)); // i is the page number
I understand that something like Ajax is used here and that, in fact, only one page exists. How can I fix my problem and get the results from the other pages using C#?
Like David said,
use the URL: http://steamcommunity.com/market/search/render/?query=appid%3A570%20uncommon&search_descriptions=0&start=30&count=10
where start is the index of the first item and count is the number of items you want.
The result is JSON, so, to state the obvious, you only want to use the results_html field.
Side note: in Chrome, press F12 and click the Network tab to see the request being made and its result.
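The fragment after # is never sent to the server, which is why every download returns the first page. A minimal sketch of the render-URL approach follows. It assumes pages are 1-based, so that page 4 with 10 items per page maps to start=30 as in the URL above, and that the endpoint still returns a results_html string field. The class and method names are hypothetical, and HtmlAgilityPack is taken from NuGet.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class MarketPages
{
    // Builds the render URL for a 1-based page number (assumption: start = (page - 1) * count).
    public static string BuildRenderUrl(int page, int count)
    {
        int start = (page - 1) * count;
        return "http://steamcommunity.com/market/search/render/?query=appid%3A570%20uncommon"
             + "&search_descriptions=0&start=" + start + "&count=" + count;
    }

    // Returns the results_html string from the JSON response, or an empty string if it is absent.
    public static string ExtractResultsHtml(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement value;
            if (doc.RootElement.ValueKind == JsonValueKind.Object
                && doc.RootElement.TryGetProperty("results_html", out value)
                && value.ValueKind == JsonValueKind.String)
            {
                return value.GetString();
            }
        }
        return string.Empty;
    }

    public static async Task Main()
    {
        using (var client = new HttpClient())
        {
            string json = await client.GetStringAsync(BuildRenderUrl(4, 10));

            var html = new HtmlDocument();
            html.LoadHtml(ExtractResultsHtml(json));

            // SelectNodes returns null when nothing matches.
            HtmlNodeCollection links = html.DocumentNode.SelectNodes("//a");
            Console.WriteLine(links == null ? 0 : links.Count);
        }
    }
}
```

The //a selector is only a placeholder; inspect the returned fragment to pick the nodes you actually need.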

Selenium C# Dynamic Meta Tags

I'm using Selenium for C# in order to serve fully rendered JavaScript applications to Google spiders and users with JavaScript disabled. I am using ASP.NET MVC to serve the pages from my controller. I need to be able to generate dynamic meta tags before the content is served to the caller. For example, the following pseudocode:
var pageSource = driver.PageSource; // This is where i get my page content
var meta = driver.findElement(By.tagname("meta.description")).getAttribute("content");
meta.content = "My New Meta Tag Value Here";
return driver.PageSource; // return the page source with edited meta tags to the client
I know how to get the page source to the caller, and I am already doing this, but I can't seem to find the right selector to edit the meta tags before I push the content back to the requester. How would I accomplish this?
Selenium doesn't have a feature specifically for this. But technically, you can change meta tags with JavaScript, so you can use Selenium's IJavaScriptExecutor in C#.
If the page is using jQuery, here's one way to do it:
// new content to swap in
String newContent = "My New Meta Tag Value Here";
// jQuery function to do the swapping
String changeMetasScript = "$('meta[name=author]').attr('content', arguments[0]);";
// execute with the JavaScript executor
IJavaScriptExecutor js = driver as IJavaScriptExecutor;
js.ExecuteScript(changeMetasScript, newContent);
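If the page does not load jQuery, the same idea can be sketched with plain DOM calls. The helper below is hypothetical (the names MetaEditor and SetMetaContent are not part of Selenium); it relies only on IJavaScriptExecutor.ExecuteScript and passes the meta name and the new value as script arguments.

```csharp
using OpenQA.Selenium;

public static class MetaEditor
{
    // Sets the content attribute of the first <meta> tag whose name matches metaName.
    // Returns true if a matching tag was found and updated.
    public static bool SetMetaContent(IWebDriver driver, string metaName, string newContent)
    {
        const string script =
            "var tags = document.getElementsByTagName('meta');" +
            "for (var i = 0; i < tags.length; i++) {" +
            "  if (tags[i].getAttribute('name') === arguments[0]) {" +
            "    tags[i].setAttribute('content', arguments[1]);" +
            "    return true;" +
            "  }" +
            "}" +
            "return false;";

        var js = (IJavaScriptExecutor)driver;
        object result = js.ExecuteScript(script, metaName, newContent);
        return result is bool && (bool)result;
    }
}
```

A call such as MetaEditor.SetMetaContent(driver, "description", "My New Meta Tag Value Here") would run before reading driver.PageSource. Whether PageSource reflects the modified DOM can depend on the browser driver, so it is worth verifying in your setup.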
