Nowadays there are web pages built with AJAX-based frameworks that load their content dynamically (lazy loading). I'm wondering whether there is any way to download the HTML content of such pages. When I try to download one using HtmlAgilityPack, all I get is the header and an empty body, yet when I inspect the elements in the browser I can see the proper HTML/divs of the page. Looking at the view source, I again see an empty body...
Is there any third-party library like HtmlAgilityPack, or any other way, to do this?
You would need to be able to run the JavaScript inside the page, which, according to this answer, is not possible with HtmlAgilityPack.
You can also look at this example of getting web content with the Html Agility Pack: https://code.msdn.microsoft.com/Getting-web-content-by-bb07d17d...
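For reference, a minimal sketch of what such a download looks like with the Html Agility Pack (the URL is a hypothetical placeholder). HtmlWeb performs a plain HTTP GET and never executes scripts, which is why the body of a script-rendered page comes back empty:

    using System;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            // HtmlWeb.Load does a plain HTTP GET; no JavaScript runs,
            // so only the markup the server originally sent gets parsed.
            var web = new HtmlWeb();
            HtmlDocument doc = web.Load("https://example.com/ajax-page"); // hypothetical URL

            // On a script-rendered page this prints an essentially empty body.
            HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
            Console.WriteLine(body != null ? body.InnerHtml : "(no body)");
        }
    }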
I want to capture blog posts from some blog sites. I know how to use HttpClient to get the HTML string, and then use the Html Agility Pack to capture the content under a specific HTML tag. But if you show this HTML string in a WebView, you will find it doesn't display well on mobile. For example, CSS styles are not loaded correctly, some code blocks don't auto-wrap, and some pictures don't show (an x appears instead).
Some advertisements also show, which I don't want.
Does anyone know how to handle this? Any suggestions would be appreciated.
Try running the HTML string through something like Google Mobilizer. This should produce a more mobile-friendly HTML string, which you can then 'unpack' with the Agility Pack.
Ideally you should capture the HTML page and all its associated resources: CSS files, images, scripts, and so on.
Then update the HTML content so that those resources are retrieved from your local data storage (for example, relative URLs will not work anymore once you have saved the HTML page locally).
You may also send your HTTP request with a User-Agent header that matches the one used by the Microsoft browser, in order to obtain the corresponding version of the site (in case it does some kind of User-Agent sniffing).
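For the User-Agent part, a minimal sketch with HttpClient (the header value and URL are illustrative placeholders, not exact strings):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            using (var client = new HttpClient())
            {
                // Present ourselves as a mobile browser so the server can send
                // its mobile-optimized markup (only helps if it sniffs the UA).
                client.DefaultRequestHeaders.UserAgent.ParseAdd(
                    "Mozilla/5.0 (Windows Phone 10.0)"); // illustrative value

                string html = await client.GetStringAsync("https://example.com/blog-post"); // hypothetical URL
                Console.WriteLine(html.Length);
            }
        }
    }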
Hello, I want to ask something: is there a way to read information from a website that I do not own, from code-behind?
For example, I want to read the title of every page on some website. Can I do that, and how?
This is not about hacking; I just want to read the plain text, not the HTML code.
I don't know what to do or how to do it, so I need some ideas.
Also, is there a way to search for a specific word across several websites, and an API I can use to search a website?
You still have to read the HTML since that's how the title is transmitted.
Use the HttpWebRequest class to make a request to the web server, HttpWebResponse to get the response back, and the GetResponseStream() method to read the response. Then you need to parse it in some way.
Look at the HtmlAgilityPack to parse the HTML. You can use it to get the title element out of the HTML and read it. You can then get all the anchor elements within the page and decide which ones on the same site you want to visit next to scan those titles.
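Putting those pieces together, a minimal sketch (the URL is a hypothetical placeholder):

    using System;
    using System.IO;
    using System.Net;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            // Fetch the raw HTML with HttpWebRequest/HttpWebResponse.
            var request = (HttpWebRequest)WebRequest.Create("https://example.com/"); // hypothetical URL
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();

                // Parse it with the Html Agility Pack and read the <title>.
                var doc = new HtmlDocument();
                doc.LoadHtml(html);
                HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
                Console.WriteLine(title != null ? title.InnerText : "(no title)");

                // List every anchor's href to decide which pages to scan next.
                var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
                if (anchors != null)
                    foreach (HtmlNode a in anchors)
                        Console.WriteLine(a.GetAttributeValue("href", ""));
            }
        }
    }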
There is a powerful HTML parser available for .NET that you can use with XPath to read HTML pages: the HTML Agility Pack.
Or you can use the built-in WebClient class to get the page data as a string and then do string manipulation.
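A minimal sketch of the WebClient route (the URL is a hypothetical placeholder):

    using System;
    using System.Net;

    class Program
    {
        static void Main()
        {
            // WebClient.DownloadString returns the page source as one string,
            // which you can then search directly or hand to an HTML parser.
            using (var client = new WebClient())
            {
                string html = client.DownloadString("https://example.com/"); // hypothetical URL
                Console.WriteLine(html.Contains("<title>"));
            }
        }
    }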
I am developing a desktop application and would like to grab remote HTML source. But the remote page is largely rendered by JavaScript after the page loads.
I've been searching for a few days but could not find anything useful. I've studied and tried to apply the following suggestions, but I can only get the base HTML:
WebBrowser Threads don't seem to be closing
Get the final generated html source using c# or vb.net
View Generated Source (After AJAX/JavaScript) in C#
Is there any way to get all the data, the way Firebug's HTML inspector shows it?
Thank you in advance.
What are you trying to do? A web browser will do more than just grab the HTML: the page needs to be parsed (which will likely download further files) and rendered. You could use the WebKit C# wrapper (http://webkitdotnet.sourceforge.net/) - I have used it previously to get thumbnails of web pages.
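If a full WebKit embed is more than you need, the built-in WinForms WebBrowser control (which wraps IE) can also hand back the script-rendered DOM. A minimal sketch, assuming the page has finished its work by the time DocumentCompleted fires (late AJAX calls may still need an extra wait):

    using System;
    using System.Windows.Forms;

    class Program
    {
        [STAThread]
        static void Main()
        {
            // WebBrowser runs the page's JavaScript, so the DOM read in
            // DocumentCompleted includes script-generated elements.
            var browser = new WebBrowser { ScriptErrorsSuppressed = true };
            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentElement.OuterHtml reflects the live DOM, not the
                // original source, so dynamically added nodes are included.
                Console.WriteLine(browser.Document.DocumentElement.OuterHtml);
                Application.ExitThread(); // stop the message pump when done
            };
            browser.Navigate("https://example.com/"); // hypothetical URL
            Application.Run(); // a message pump is required for the event to fire
        }
    }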
I'm trying to retrieve the HTML content of a web page. The problem I'm having is that the page has a JavaScript function that generates additional HTML elements which I'm not getting (because the content I download is never rendered).
Is there any way to download the HTML, render the JavaScript, and only afterwards get the HTML content?
Check out this post (Get the final generated html source using c# or vb.net), which mentions using WebKit.NET (http://sourceforge.net/projects/webkitdotnet/) or Open WebKit Sharp (https://code.google.com/p/open-webkit-sharp/).
I have used the "Web Developer" extension for Firefox to get the generated HTML. Open the web page, click the View Source button, then View Generated Source. That will open the source with all the generated HTML. You can then copy it and do what you wish.
http://chrispederick.com/work/web-developer/
Hello!
How can I open a dynamically generated HTML page in IE through its API or COM?
The problem is that the page doesn't have a real link address...
I'm not allowed to save the file to disk or to the clipboard for this assignment.
Thanks!
Take a look at this link for an example of constructing an HTML page from scratch in JavaScript. You ought to be able to do something similar, inserting all the HTML you already have (previously generated) as the HTML of the new document.
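If you would rather drive this from C# through IE's COM automation interface, a minimal sketch could look like the following. It assumes COM references to "Microsoft Internet Controls" (SHDocVw) and "Microsoft HTML Object Library" (mshtml), and the page content is a placeholder; nothing is saved to disk or the clipboard:

    using System.Threading;

    class Program
    {
        [System.STAThread]
        static void Main()
        {
            string generatedHtml = "<html><body><h1>Generated page</h1></body></html>"; // placeholder content

            // Start IE on a blank page; no file on disk, no clipboard.
            var ie = new SHDocVw.InternetExplorer { Visible = true };
            ie.Navigate("about:blank");
            while (ie.Busy) Thread.Sleep(100); // wait for the blank page to load

            // Write the generated markup straight into the live document.
            var doc = (mshtml.IHTMLDocument2)ie.Document;
            doc.write(generatedHtml);
            doc.close();
        }
    }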