Download HTML doesn't retrieve dynamically generated elements - C#

For documentation purposes I need the HTML of each "...Controller" element on the following website:
https://apipivotdev.azurewebsites.net/swagger/ui/index.
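This doesn't work:
using System.Net; // WebClient
string html;
using (var wc = new WebClient())
    html = wc.DownloadString(url);
because that only returns the HTML the server sends, which contains just a few divs at the top of the document. The rest is generated in the browser by JavaScript.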
Parsing the underlying JSON and merging it with the HTML would be too time-consuming, and again, I just need the HTML.
Question: How would I get the complete, rendered webpage in C#?

Related

How to extract JSON embedded in an HTML page using C#

The JSON I wish to use is embedded in an HTML page. Inside a <script> tag on the page there is a statement:
<script>
jsonRAW = {... heaps of JSON... }
Is there a parser to extract this from the HTML? I have looked at Json.NET, but it requires reasonably well-formed JSON as input.

You can try the HTML Agility Pack, which can be downloaded as a NuGet package.
After installing it, follow a tutorial on how to use the HTML Agility Pack.
The tutorial has more info, but it works like this in code:
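using System.Linq;
using HtmlAgilityPack;

var urlLink = "http://www.google.com/jsonPage"; // 1. The URL of the page that embeds the JSON
var web = new HtmlWeb();                          // 2. Create the HtmlWeb loader
var doc = web.Load(urlLink);                      // 3. Download and parse the page
if (doc.ParseErrors != null && doc.ParseErrors.Any()) {
    // 4. Check for parse errors and deal with them
}
// 5. Access the DOM, e.g. the <script> element whose text contains "jsonRAW"
var script = doc.DocumentNode.SelectSingleNode("//script[contains(text(), 'jsonRAW')]");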
There are other things in between but this should get you started.
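HAP gets you the script text, but the remaining step is cutting the jsonRAW = {...} literal out of it so that a JSON parser will accept it. A minimal sketch, assuming the literal is valid JSON (double-quoted keys and strings) and that the variable name appears only once before it; it uses System.Text.Json instead of Json.NET, and the class and method names are hypothetical:

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical helper: pulls a "name = { ... }" object literal out of script text.
public static class EmbeddedJson
{
    // Returns the text of the first {...} block after variableName, or null.
    // Braces inside double-quoted strings are ignored and escapes are skipped.
    public static string? ExtractObjectLiteral(string script, string variableName)
    {
        int nameIndex = script.IndexOf(variableName, StringComparison.Ordinal);
        if (nameIndex < 0) return null;

        int start = script.IndexOf('{', nameIndex);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false;
        for (int i = start; i < script.Length; i++)
        {
            char c = script[i];
            if (inString)
            {
                if (c == '\\') i++;                // skip the escaped character
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return script.Substring(start, i - start + 1);
        }
        return null; // the braces never balanced
    }

    // Parses the extracted literal; JsonDocument.Parse throws JsonException
    // if it is not valid JSON (e.g. single quotes or unquoted keys).
    public static JsonDocument? Parse(string script, string variableName)
    {
        string? literal = ExtractObjectLiteral(script, variableName);
        return literal == null ? null : JsonDocument.Parse(literal);
    }
}
```

The script text can come straight from HAP, for example EmbeddedJson.Parse(script.InnerText, "jsonRAW") with the node selected above. If the literal uses JavaScript-only syntax, you would need a more lenient parser.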

C# Html parsing

I'm trying to parse HTML in my C# project without success. I'm using the HtmlAgilityPack library, and I can get some of the HTML body text, but not all of it.
I need to grab the div with the ID latestPriceSection from https://www.monero.how/widget and extract the USD value from it.
My function (which doesn't work):
public void getXMRRate()
{
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument document = web.Load("https://www.monero.how/widget");
    // Select the div directly by its id; SelectSingleNode returns null if there is no match
    HtmlNode node = document.DocumentNode.SelectSingleNode("//div[@id='latestPriceSection']");
    if (node != null)
    {
        Console.WriteLine(node.InnerHtml);
    }
}

Your function doesn't work because the widget is filled in by script: the div is empty in the HTML you get when you load the page, and HAP does not run JavaScript, so you can't use HAP to scrape this information. Find a web service that can give you the information you need.
Alternatively, you can use Selenium to get the HTML after the page's scripts have run, or the WebBrowser class, but that requires a Windows Forms application whose form hosts the WebBrowser.

You need to retrieve the JSON data from https://www.monero.how/widgetLive.json, because the widget loads this resource with an Ajax request.
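A minimal sketch of that approach with HttpClient and System.Text.Json. The field names in widgetLive.json are not shown here, so the property name is a parameter: check the real response (for example in the browser's Network tab) for the field that holds the USD price. The class and method names are hypothetical, and the sketch assumes the price is a top-level JSON number:

```csharp
#nullable enable
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper for the JSON resource the widget requests.
public static class MoneroWidget
{
    private static readonly HttpClient Http = new HttpClient();

    // Downloads the raw JSON that the widget's script fetches.
    public static Task<string> DownloadJsonAsync() =>
        Http.GetStringAsync("https://www.monero.how/widgetLive.json");

    // Reads a top-level numeric property; returns null if the root is not an
    // object, the property is missing, or its value is not a JSON number.
    public static decimal? ReadNumber(string json, string propertyName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty(propertyName, out JsonElement value) &&
            value.ValueKind == JsonValueKind.Number)
        {
            return value.GetDecimal();
        }
        return null;
    }
}
```

Usage would be MoneroWidget.ReadNumber(await MoneroWidget.DownloadJsonAsync(), fieldName), where fieldName is whatever the response actually calls the USD price. If the service returns the price as a string, you would read it with GetString() and parse it instead.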

Parse a webpage with a fragment identifier in the URL using HTML Agility Pack

I want to parse a webpage whose URL has a fragment identifier (#), e.g. http://steamcommunity.com/market/search?q=appid%3A570+uncommon#p4
When I use my browser (Google Chrome), I get different results for different identifiers (#p1, #p2, #p3), but when I use HTML Agility Pack, I always get the first page, regardless of the page identifier.
string sURL = "http://steamcommunity.com/market/search?q=appid%3A570+uncommon#p";
var wClient = new WebClient();
var html = new HtmlAgilityPack.HtmlDocument();
html.LoadHtml(wClient.DownloadString(sURL + i)); // i is the page number (1, 2, 3, ...)
I understand that something like Ajax is used here and that in fact only one page exists. How can I fix my problem and get results from the other pages using C#?
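The underlying reason is that the fragment (#p1, #p4, ...) never leaves the client: an HTTP client requests only the scheme, host, path and query, and the page's JavaScript presumably reads the fragment to decide which results to load. A small sketch (the class and method names are hypothetical) showing that every #pN variant maps to the same request:

```csharp
using System;

// Hypothetical helper showing which part of a URL is actually requested.
public static class FragmentDemo
{
    // Scheme, host, path and query: the part an HTTP client sends.
    // The fragment is dropped.
    public static string RequestedPart(string url) =>
        new Uri(url).GetLeftPart(UriPartial.Query);

    // The fragment on its own, including the leading '#'.
    public static string FragmentOf(string url) => new Uri(url).Fragment;
}
```

So sURL + i produces different strings but identical requests, which is why WebClient always returns the first page.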

Like David said,
use this URL: http://steamcommunity.com/market/search/render/?query=appid%3A570%20uncommon&search_descriptions=0&start=30&count=10
where start is the index of the first item to return and count is the number of items you want.
The result is JSON, so, stating the obvious, the only part you want is results_html.
Side note: in Chrome, press F12 and open the Network tab; you will see the requests being made and their results.
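A sketch of that approach in C#, assuming the endpoint and parameters in the URL above and 10 items per page (so #p4 corresponds to start=30, as in the example URL). It uses HttpClient and System.Text.Json; the class and method names are hypothetical:

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical client for the render endpoint described in the answer.
public static class SteamMarketSearch
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds the render URL for a 1-based page number, assuming `count`
    // items per page: page 4 with count 10 gives start=30.
    public static string BuildRenderUrl(string query, int page, int count = 10)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        int start = (page - 1) * count;
        return "http://steamcommunity.com/market/search/render/" +
               "?query=" + Uri.EscapeDataString(query) +
               "&search_descriptions=0" +
               "&start=" + start +
               "&count=" + count;
    }

    // Pulls the HTML fragment out of the JSON response; returns null if
    // results_html is missing or is not a string.
    public static string? ExtractResultsHtml(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        return root.ValueKind == JsonValueKind.Object &&
               root.TryGetProperty("results_html", out JsonElement html) &&
               html.ValueKind == JsonValueKind.String
            ? html.GetString()
            : null;
    }

    // Downloads one page of results and returns its HTML fragment.
    public static async Task<string?> GetPageHtmlAsync(string query, int page)
    {
        string json = await Http.GetStringAsync(BuildRenderUrl(query, page));
        return ExtractResultsHtml(json);
    }
}
```

The returned fragment can then go through HAP as in the question: create a new HtmlAgilityPack.HtmlDocument and call LoadHtml on it.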

HtmlAgilityPack: read HTML page info loaded by Ajax calls

I am using HtmlAgilityPack to read specific HTML elements from a specific URL.
The problem I am facing is that the content of one of the HTML tags is filled in by an AJAX request. How can I read it?
<div id="priceInfo"></div>
The code I use to read the URL is:
HtmlWeb _htmlWeb = new HtmlWeb();
HtmlAgilityPack.HtmlDocument _webDoc = _htmlWeb.Load(webUrl);
// Get the node with id priceInfo
HtmlNode _priceNode = _webDoc.GetElementbyId("priceInfo");
The contents of this div are filled in by an Ajax request, and I want to read them after the div has been filled. How can I do that?

HtmlAgilityPack is meant to be used on the server side. From what you state, you are trying to read a value on the client side, not on the server side.
You should look into using jQuery/JavaScript once the Ajax call is done:
$.ajax({ /* url, type, data, ... */ })
    .done(function (result) {
        // handle the returned result...
        // show the div's text; .val() only works on form fields such as <input>
        alert($("#priceInfo").text());
    });
http://api.jquery.com/jQuery.ajax/

Selenium C# Dynamic Meta Tags

I'm using Selenium for C# to serve fully rendered JavaScript applications to Google's crawlers and to users with JavaScript disabled. I am using ASP.NET MVC to serve the pages from my controller. I need to be able to generate dynamic meta tags before the content is served to the caller. For example, the following pseudo code:
var pageSource = driver.PageSource; // This is where I get my page content
var meta = driver.FindElement(By.CssSelector("meta[name='description']"));
meta.content = "My New Meta Tag Value Here"; // pseudo code: IWebElement has no such setter
return driver.PageSource; // return the page source with edited meta tags to the client
I know how to get the page source to the caller (I am already doing this), but I can't seem to find the right selector to edit the meta tags before I push the content back to the requester. How would I accomplish this?

Selenium doesn't have a feature specifically for this. But technically, you can change meta tags with JavaScript, so you can use Selenium's IJavaScriptExecutor in C#.
If the page is using jQuery, here's one way to do it:
// new content to swap in
string newContent = "My New Meta Tag Value Here";
// jQuery snippet that does the swapping; arguments[0] receives newContent
string changeMetasScript = "$('meta[name=author]').attr('content', arguments[0]);";
// execute it with the JavaScript executor
IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
js.ExecuteScript(changeMetasScript, newContent);

We Keep Coding
