reading div after javascript load [closed] - c#

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
I'm having problems parsing information from a forum.
Here are some examples:
Easy
Hard
It would be really easy to get the information, since it is displayed in the div with id = "poe-popup-container".
The problem is that this div is only populated once the item is actually visible in the browser. You can reproduce this by making your browser window very short and looking for the div in the HTML: it will be empty, but as soon as you scroll down to the item, it gets filled in.
I'm trying to read the nodes inside the div with HtmlAgilityPack. The problem is that, as I explained, the div only has content once the browser decides it is needed.
So when I download the HTML, the div is empty.
I've tried downloading the page with the web browser too, but the same thing happens.
I'm trying to use the following code:
string page = System.Text.Encoding.UTF8.GetString(Webclient.DownloadData("http://www.pathofexile.com/forum/view-thread/966384"));
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='poe-popup-container']");
MessageBox.Show(node.InnerHtml);

You're trying to do the impossible. JavaScript is executed in the browser. HtmlAgilityPack is a library just for parsing static HTML; it can't execute JavaScript.
So why don't you look into browser automation instead? Try, for example, http://watin.org/

Related

Getting data of HTML DIV tag without using Regular Expressions [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Hello to all respected experts,
I have a question regarding C#/.NET. Basically, I have an HTML page
and I want to extract data from its DIV tag. This is a sample of the HTML:
<div class="clr fleft">
<strong class="xx-large">033 111 22222</strong>
</div>
Now I want to get the numbers inside the element with class "xx-large".
I'd like some help doing it.
You can use HtmlAgilityPack:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlstring);
Using XPath:
var data = doc.DocumentNode.SelectSingleNode("//*[@class='xx-large']").InnerText;
Using LINQ:
var data = doc.DocumentNode.Descendants()
    .Where(x => x.Attributes["class"] != null && x.Attributes["class"].Value == "xx-large")
    .First()
    .InnerText;
As far as I know, you can't access them with C# alone (your server-side code). You must write some JavaScript to do this. (Your JavaScript code does not need any regex.)
All you need is a library with predefined parsers. You can use the Beautiful Soup parser (originally written in Python, can be interfaced with C#); see how it's done at http://ashomtwit.espace-technologies.com/4499480-BeautifulSoup_and_ASP_NET_C_.html or choose an alternative package. These libraries have predefined regular expressions and methods to open web pages and collect the information, so they are simple to use.

Load a second ASP.NET page into codebehind [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have something I'd like to attempt, but am unsure of the best path to accomplish it. I have a page, say default.aspx, that creates some content. I also have a second page, say input.aspx, that creates a small select box. This box is to be loaded via ajax when a change is made on default.aspx. However, I also need this box to load initially in the codebehind of default.aspx.
Example:
1. default.aspx codebehind creates content and loads input.aspx in the codebehind
2. field changes on default.aspx and input.aspx is changed via ajax using Jquery
However, I cannot seem to find the best way to load a second ASP.NET page into the codebehind of the initial page. I was considering using an HttpWebRequest object, but am not sure of the syntax. Any help would be appreciated.
Thanks!
At the request of @Mbeckish
What I need to happen is outlined below, step by step:
1. default.aspx loads with content generated from codebehind
2. Also in codebehind, a select box is loaded (I have this in a separate input.aspx file now. This is the step I need help with.)
3. default.aspx response is returned and displayed on client
4. user changes a form value on default.aspx
5. the select provided by input.aspx is reloaded from server (I currently use a jQuery ajax request to allow this)
Sounds like your second page 'input.aspx' should really be a UserControl (.ascx file), which you dynamically load (using Page.LoadControl(...)) in your default.aspx page.
What you need is to add an iframe to default.aspx and set its src property to the other page:
<iframe src="otherPage.aspx"></iframe>
http://www.w3schools.com/tags/tag_iframe.asp

Parse HTML With C# [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'd like to parse an HTML page using C#. The pages contain a lot of HTML tags; here's a sample from one of them:
<span class=text14 id="article_content"><!-- RELEVANTI_ARTICLE_START --><span ></b>The
most important component for <a
class=bluelink href="http://www.ynetnews.com/articles/0,7340,L-
3284752,00.html%20"' onmouseover='this.href=unescape(this.href)'
target=_blank>Israel</a>'s
security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the <a ...
but I'd only like to get the content wrapped by the <span class=text14 id="article_content"> tag.
At first I thought about using preg match, but then realized it's not efficient at all.
I later read about Html Agility Pack and FizzlerEx.
I'd like to know whether it's possible to get the text wrapped by the specific tag I mentioned using these tools, and I'd be grateful if someone could tell me how fast this task could be performed.
It's pretty straightforward using Html Agility Pack:
var markup = @"<span class=text14 id=""article_content""><!-- RELEVANTI_ARTICLE_START --><span ></b>The most important component for <a class=bluelink href=""http://www.ynetnews.com/articles/0,7340,L-3284752,00.html%20""' onmouseover='this.href=unescape(this.href)' target=_blank>Israel</a>'s security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the</span>";
var doc = new HtmlDocument();
doc.LoadHtml(markup);
var content = doc.GetElementbyId("article_content").InnerText;
Console.WriteLine(content);

First page with QueryString [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
How do I load the index page with a QueryString in ASP.NET? I know that we can redirect to a particular page with a QueryString, but what I want is to load the first page with some query string.
If you are setting the start action in the Property Pages of your application, you can follow these steps:
1) Right-click on your project in Solution Explorer
2) Go to Property Pages
3) Set the start action to 'Specific Page' with value = "index.aspx?a=22"
A very simple way to make it work in both local and remote environments is to detect, in Page_Load(), whether the desired QueryString content is present.
If it is not, use Response.Redirect pointing to the current page with the QueryString parameters added. Example follows:
if (Request.QueryString["QSEntry"] == null)
    Response.Redirect("Page.aspx?QSEntry=desiredValue");
Pro: It'll work the way you want.
Con: You're actually loading the page twice (first time it's a parameterless load), so don't forget to take that into consideration.

Need help with XPATH for src value of a specific Html img tag [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am somewhat new to XPath and understand most of the basics, but I am having some trouble with a particular query.
I am attempting to parse a Motley Fool page and return the source of the image for the CAPS score of a stock.
For example, if you look at the source of the page http://caps.fool.com/Ticker/SLT.aspx, I want the source for http://g.foolcdn.com/art/ratings/stars/trans/5stars-trans-lg.png
I only want what follows the src= if possible.
I am currently working with:
xpath = "//div[@class='subtle marginT']"
This, however, returns nothing. I know it might be asking a lot, but if you feel like answering, I would also greatly appreciate a quick explanation of the answer, as I want to learn XPath, not just get this query to work.
Based on your URL this worked for me:
var imageNode = doc.DocumentNode.SelectSingleNode("//table[@id='tickerStats']/tbody/tr/td/img");
string imageText = imageNode.Attributes["src"].Value;
Basically just grabbing the closest element that has an id, then walking the tree down to where you want to be.
Alternatively this would work too and seems a little cleaner (since you don't really care about the DOM structure in the table itself as long as there is just one image):
var statsNode = doc.DocumentNode.SelectSingleNode("//table[@id='tickerStats']");
var imageNode = statsNode.SelectSingleNode(".//img");
string imageText = imageNode.Attributes["src"].Value;
Use:
//table[@id='tickerStats']/tbody/tr/td/img/@src
This selects any attribute named src of any element named img that is a child of a td that is a child of a tr that is a child of a tbody that is a child of any table in the document, that has an id attribute with value 'tickerStats'.
If you need just the string value of this attribute (assuming the above XPath expression selects a single attribute node), use:
string(//table[@id='tickerStats']/tbody/tr/td/img/@src)
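To see the string() form in action from C#, here is a minimal sketch that uses only the framework's System.Xml.XPath classes. The class name XPathStringDemo, the method GetImageSrc and the SampleMarkup constant are made up for illustration, and the markup is a hand-written, well-formed stand-in for the relevant part of the page, not the real page source. XPathDocument needs well-formed XML, so real-world HTML would normally have to go through an HTML parser such as HtmlAgilityPack first.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class XPathStringDemo
{
    // Hypothetical, well-formed stand-in for the table on the page.
    public const string SampleMarkup =
        "<html><body><table id='tickerStats'><tbody><tr><td>" +
        "<img src='http://g.foolcdn.com/art/ratings/stars/trans/5stars-trans-lg.png' />" +
        "</td></tr></tbody></table></body></html>";

    // Evaluates the string(...) expression and returns the attribute value.
    // XPath's string() yields an empty string when nothing matches.
    public static string GetImageSrc(string markup)
    {
        var document = new XPathDocument(new StringReader(markup));
        XPathNavigator navigator = document.CreateNavigator();
        return (string)navigator.Evaluate(
            "string(//table[@id='tickerStats']/tbody/tr/td/img/@src)");
    }

    public static void Main()
    {
        Console.WriteLine(GetImageSrc(SampleMarkup));
    }
}
```

Because the expression is wrapped in string(), Evaluate returns a plain string rather than a node set, so no null check on a node is needed; a missing image simply gives an empty string.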
