I'm trying to retrieve the HTML content of a webpage. The problem is that the page contains a JavaScript function that generates additional HTML elements, and I don't get those elements because the content I download is not rendered.
Is there any way to download the HTML, execute the JavaScript, and only afterwards get the HTML content?
Check out this post (Get the final generated html source using c# or vb.net), which mentions using WebKit.NET (http://sourceforge.net/projects/webkitdotnet/) or Open Web Kit Sharp (https://code.google.com/p/open-webkit-sharp/).
I have used the "Web Developer" extension for Firefox to get the generated HTML. Open the webpage, click the View Source button, then choose View Generated Source. That opens the source with all the generated HTML, which you can then copy and use however you wish.
http://chrispederick.com/work/web-developer/
I am working on an ASP.NET Web Forms application. My requirement is to get a user control's rendered HTML, convert that HTML to PDF, and attach the result as a document in the database.
I have tried multiple techniques, such as getting the HTML from an HTTP web request, but could not find the solution I need. Any help would be really appreciated.
You could look at OpenHtmlToPdf:
https://github.com/vilppu/OpenHtmlToPdf
I think it will do what you need.
My understanding of your requirement is that you will create the PDF and save it as a blob in your database.
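A minimal sketch of the "save it as a blob" step, written in Python with the standard `sqlite3` module as a stand-in for whatever database the ASP.NET application actually uses. The table name `documents` and the helpers `create_schema`, `save_pdf` and `load_pdf` are hypothetical; the PDF bytes would come from your HTML-to-PDF converter (OpenHtmlToPdf in this answer), so a placeholder byte string is used here.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per stored document, content kept as a BLOB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )


def save_pdf(conn: sqlite3.Connection, name: str, pdf_bytes: bytes) -> int:
    """Insert the PDF bytes as a BLOB and return the new row id."""
    cur = conn.execute(
        "INSERT INTO documents (name, content) VALUES (?, ?)",
        (name, pdf_bytes),
    )
    conn.commit()
    return cur.lastrowid


def load_pdf(conn: sqlite3.Connection, doc_id: int) -> bytes:
    """Read the BLOB back; raises KeyError if the id does not exist."""
    row = conn.execute(
        "SELECT content FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return bytes(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder bytes standing in for the converter's output, not a real PDF.
    fake_pdf = b"%PDF-1.4\n% placeholder\n%%EOF\n"
    doc_id = save_pdf(conn, "report.pdf", fake_pdf)
    print(doc_id, load_pdf(conn, doc_id) == fake_pdf)
```

Parameterized queries (`?`) let the driver pass the bytes through unchanged; the same idea applies to a `varbinary(max)` column with `SqlParameter` in .NET.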
Nowadays there are web pages developed with AJAX-based frameworks (dynamic or lazy loading). Is there any way to download the HTML content of such pages? When I try to download them using HtmlAgilityPack, all I get is the header and an empty body. When I inspect the element in the browser I can see the proper HTML and divs, but when I look at View Source for that page, the body is empty.
Is there any third-party library like HtmlAgilityPack, or any other way?
You would need to be able to run the JavaScript inside the page, which, according to this answer, is not possible with HtmlAgilityPack.
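To illustrate why the body comes back empty: a plain HTML parser only reads the markup it is given and has no JavaScript engine, so elements that a script would create never appear. The sketch below uses Python's `html.parser` as an analogue of HtmlAgilityPack (it is not HtmlAgilityPack itself), and the sample page is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: #app is empty until the script runs in a real browser.
PAGE = """<!DOCTYPE html>
<html>
<head><title>Demo</title></head>
<body>
<div id="app"></div>
<script>
  document.getElementById("app").innerHTML = "<div class='item'>Loaded by JS</div>";
</script>
</body>
</html>"""


class DivCollector(HTMLParser):
    """Records the attributes of every <div> present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.divs = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.divs.append(dict(attrs))
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script contents arrive as plain text; nothing executes them.
        if self._in_script:
            self.script_text.append(data)


parser = DivCollector()
parser.feed(PAGE)
parser.close()
print(parser.divs)
```

Only the empty `#app` container is found; the `item` div exists solely as text inside the script. Getting it requires something that actually runs the page's JavaScript, such as a browser engine.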
You can see this sample: Getting web content by Html Agility Pack, https://code.msdn.microsoft.com/Getting-web-content-by-bb07d17d...
I am developing a desktop application and would like to grab the HTML source of a remote page, but that page is largely rendered by JavaScript after it loads.
I've been searching for a few days but could not find anything useful. I've studied and tried to apply the following suggestions, but I only get the base HTML.
WebBrowser Threads don't seem to be closing
Get the final generated html source using c# or vb.net
View Generated Source (After AJAX/JavaScript) in C#
Is there any way to get all the data, the way Firebug's HTML console does?
Thank you in advance.
What are you trying to do? A web browser does more than just grab the HTML. The HTML has to be parsed (which will likely trigger downloads of further files) and rendered. You could use the WebKit C# wrapper [http://webkitdotnet.sourceforge.net/]; I have used it previously to get thumbnails of web pages.
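As a rough illustration of the "further files" a browser fetches after parsing, the sketch below lists the script, stylesheet, image and iframe URLs referenced by a page and resolves them against the page URL. It is Python rather than C#, the sample page and URLs are hypothetical, and the tag list is deliberately incomplete: a real browser also loads CSS imports and fonts, and whatever requests the scripts themselves make.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Which attribute holds the URL for each tag considered (not exhaustive).
RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class ResourceLister(HTMLParser):
    """Collects (tag, absolute URL) pairs for sub-resources in the markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            # Resolve relative references the way a browser would.
            self.resources.append((tag, urljoin(self.base_url, value)))


def list_subresources(html, base_url):
    lister = ResourceLister(base_url)
    lister.feed(html)
    lister.close()
    return lister.resources


if __name__ == "__main__":
    page = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
</head><body><img src="https://cdn.example.com/logo.png"></body></html>"""
    for tag, url in list_subresources(page, "https://example.com/products/"):
        print(tag, url)
```

Each of these is a separate HTTP request, and scripts loaded this way may in turn modify the page, which is why simply downloading the HTML is not enough.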
Hello!
How can I open a dynamically generated HTML page in IE using its API or COM?
The problem is that the page doesn't have a real URL.
The assignment doesn't allow me to save the file to disk or to the clipboard.
Thanks!
Take a look at this link for an example of constructing an HTML page from scratch in JavaScript. You ought to be able to do something similar, inserting all the HTML that you already seem to have (previously generated) as the HTML of the new document.
Is there a way to write a PDF from the database into a div, i.e. retrieve a byte[] from the database and Response.BinaryWrite it into a div?
We do a similar thing for images using src="anotherpage.aspx", where the image is written out by anotherpage.
Is it possible with PDF without using IFrame?
If what you're trying to do is show a PDF file inside a DIV, you're going down the wrong path. You either need to:
Convert the PDF to Flash (à la FlashPaper)
or
Convert the PDF to HTML (as Scribd does using HTML5).
Then you can embed the converted content inside a DIV. No browser I know of supports embedding PDFs directly.
Otherwise you have to put the PDF in an IFRAME, and how it is displayed depends on the PDF plug-in.
No. The reason it works with a src=otherpage.aspx request is that the src attribute makes the user's web browser send a completely separate request for the other resource; you're serving an additional page to make that happen. Writing a PDF file directly tries to inject the PDF into the same response as your page, which is not really "similar" to your img src at all. In fact, the iframe approach you mentioned is what is most similar to the "src=otherpage.aspx" method.
As a side note, your "AnotherPage.aspx" example should really be changed to "AnotherPage.ashx". Note the letter 'h': it means you're using a handler rather than a page, which will perform better.
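The two-request pattern described above (a page whose iframe points at a separate endpoint that returns only the PDF bytes) can be sketched as a small Python WSGI app. This is an analogue of an .ashx handler, not ASP.NET code; the paths `/` and `/pdf`, the `id` query parameter, and the in-memory `PDFS` dict standing in for the database are all hypothetical.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for the database: id -> PDF bytes (placeholder content).
PDFS = {"1": b"%PDF-1.4\n% placeholder\n%%EOF\n"}

# The page only references the PDF; the browser fetches it in a second request.
PAGE = b"""<!DOCTYPE html>
<html><body>
<h1>Report</h1>
<iframe src="/pdf?id=1" width="600" height="800"></iframe>
</body></html>"""


def not_found(start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGE]
    if path == "/pdf":
        # Handler-style endpoint: writes nothing but the PDF bytes.
        doc_id = parse_qs(environ.get("QUERY_STRING", "")).get("id", [""])[0]
        pdf = PDFS.get(doc_id)
        if pdf is None:
            return not_found(start_response)
        start_response("200 OK", [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(pdf))),
            ("Content-Disposition", 'inline; filename="report.pdf"'),
        ])
        return [pdf]
    return not_found(start_response)


def request(path, query=""):
    """Calls the app in-process (no server) and returns status, headers, body."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    # To serve it for real: wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()
    print(request("/")[0], request("/pdf", "id=1")[1]["Content-Type"])
```

Because the PDF travels in its own response with `Content-Type: application/pdf`, the browser hands it to its PDF viewer inside the iframe, which is exactly what a div cannot do.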