Getting rendered html final source - c#

I am developing a desktop application. I would like to grab the HTML source of a remote page, but the page is largely rendered by JavaScript after it loads.
I have been searching for a few days but could not find anything useful. I have studied and tried to apply the following suggestions, but I can only get the base HTML.
WebBrowser Threads don't seem to be closing
Get the final generated html source using c# or vb.net
View Generated Source (After AJAX/JavaScript) in C#
Is there any way to get all the data, the way Firebug's HTML console does?
Thank you in advance.

What are you trying to do? A web browser will do more than just grab the HTML. It needs to be parsed (which will likely download further files) and rendered. You could use the WebKit C# wrapper [http://webkitdotnet.sourceforge.net/] - I have used this previously to get thumbnails of web pages.

Related

I need to create a PDF from user control HTML in ASP.NET WebForms. How do I get the HTML from a user control and then convert that HTML to PDF?

I am working on an ASP.NET WebForms application. My requirement is to get the rendered HTML of a user control, convert that HTML to PDF, and attach the PDF as a document in the database.
I have tried multiple techniques, such as getting the HTML from an HTTP web request, but could not find the solution I need. Any help would be really appreciated.
You could look at OpenHtmlToPdf:
https://github.com/vilppu/OpenHtmlToPdf
I think it will do what you need.
As I understand your requirement, you would create the PDF and then save it as a blob in your database.

C# How to print a web page, without using javascript window.print() method?

I want to print a web page in an ASP.NET application without using the JavaScript function window.print().
Can anyone show me a short piece of C# code to achieve this?
I do not want to print a document either; my requirement is to print whatever I have on the web page. The output should be similar to what JavaScript's window.print() produces.
I hope I understand your question: it sounds like you want to print a web page (HTML) using C# code.
I've used wkhtmltopdf before (http://wkhtmltopdf.org/) and on the same site, there's wkhtmltoimage. The utility runs as a command line tool, so it's easy to use in C#. After you have the image or PDF file (I'd opt for the image version), it should be easy to print out.
Your biggest concern is rendering the HTML into a page, and that is what wkhtmltopdf/wkhtmltoimage does for you.
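As a rough illustration of calling the command line tool from C#, here is a minimal sketch. It assumes the wkhtmltopdf executable is installed and on the PATH, and that the basic `wkhtmltopdf <url> <output file>` invocation is all you need. The class and method names (WkHtmlToPdfRunner, BuildStartInfo, Convert) are made up for this example, and the simple quoting will not cope with URLs or paths that themselves contain double quotes.

```csharp
using System;
using System.Diagnostics;

public static class WkHtmlToPdfRunner
{
    // Describes the process to launch. The default executable name assumes
    // wkhtmltopdf is on the PATH; pass "wkhtmltoimage" to get an image instead.
    public static ProcessStartInfo BuildStartInfo(string pageUrl, string outputPath, string exe = "wkhtmltopdf")
    {
        return new ProcessStartInfo
        {
            FileName = exe,
            // Quote both arguments so spaces in the output path survive.
            Arguments = "\"" + pageUrl + "\" \"" + outputPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the tool, waits for it to finish, and returns its exit code.
    public static int Convert(string pageUrl, string outputPath)
    {
        using (Process process = Process.Start(BuildStartInfo(pageUrl, outputPath)))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Calling WkHtmlToPdfRunner.Convert("http://example.com/", "page.pdf") should leave page.pdf next to the application if the tool ran successfully; a non-zero exit code is worth logging. From there the file can be sent to a printer or returned to the user.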
You're asking for a lot there. Printing is something the client (the web browser) has to do. The window.print() function tells the browser to print the page, and if you can't use JavaScript, then you have to 'print' the page on the server. If you want to print the page on the server side, then you have to take the HTML and convert it to some printable format such as a PDF or a rendered image.
Try out Selenium WebDriver. You can get a headless browser like PhantomJS (running on the server) to 'screen capture' the page to a PNG. Then you can send that to the browser. Unfortunately, Selenium WebDriver does not support PDF 'screenshots'.
There's also a nuget package for selenium if you're into that sort of thing.

Retrieving Dynamically Loaded Data From a Website

I am attempting to retrieve data that is dynamically loaded onto a webpage using hashed links, e.g. http://www.westfield.com.au/au/retailers#page=5
My question is what technology is being used to load the data onto the page?
Secondly, how would one approach retrieving this data using C#?
My attempts so far have used WebClient to download the page at this link. Unfortunately, the HTML file only contains the data from the very first page, no matter which page link I use.
What technology is being used to load the data onto the page?
JavaScript is used to load the data from a server, parse it into HTML and put it in the right place in the DOM.
Secondly, how would one approach retrieving this data using C#?
Make a request to: http://www.westfield.com.au/api/v1/countries/au/retail-chains/search.json?page=5, it will return a structured JSON document containing the data you need.
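The part after the # in a link is never sent to the server, which is why WebClient always gets the first page back. A minimal sketch of going from the hashed link to the JSON request could look like the following. The endpoint is the one given above; the class and method names (RetailerApi, PageFromHashedLink, BuildSearchUrl, DownloadPageJsonAsync) are invented for this example, and falling back to page 1 when no page is present is an assumption on my part.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetailerApi
{
    // Endpoint taken from the answer above.
    private const string SearchEndpoint =
        "http://www.westfield.com.au/api/v1/countries/au/retail-chains/search.json";

    // Pulls N out of a link ending in "#page=N". Falls back to 1 (an assumption)
    // when the fragment is missing or not a number.
    public static int PageFromHashedLink(string link)
    {
        const string prefix = "#page=";
        string fragment = new Uri(link).Fragment; // e.g. "#page=5"
        int page;
        if (fragment.StartsWith(prefix, StringComparison.Ordinal) &&
            int.TryParse(fragment.Substring(prefix.Length), out page))
        {
            return page;
        }
        return 1;
    }

    // Builds the JSON request URL for a given page number.
    public static string BuildSearchUrl(int page)
    {
        return SearchEndpoint + "?page=" + page;
    }

    // Downloads the raw JSON text for the page named in the hashed link.
    public static Task<string> DownloadPageJsonAsync(HttpClient client, string hashedLink)
    {
        return client.GetStringAsync(BuildSearchUrl(PageFromHashedLink(hashedLink)));
    }
}
```

The returned string can then be handed to whichever JSON parser you prefer. Whether the endpoint still accepts plain requests like this is something you would have to check in the browser's network tab.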
If all you need is the JSON structure, Jon's answer sounds like a good place to start.
If you want a good stack for true rendered scraping I'd use a combination of phantomjs and Selenium to help bridge it to .net.
This article is a great place to start.

How can I open a dynamically created HTML page in IE without saving it (C# + API or COM)?

Hello!
How can I open a dynamically generated HTML page in IE through its API or COM?
The problem is that the page has no real link address...
The assignment does not allow me to save the file to disk or to the clipboard.
Thanks!
Take a look at this link for an example of constructing an html page from scratch in javascript. You ought to be able to do something similar, inserting all the html that you already seem to have (previously generated), as the html of the new document.

Generate PDF in ASP.NET from Fully Rendered Page

Does anyone know of a component (open source or 3rd party) that would allow you to export a fully rendered HTML page to PDF in c#? We have a page that has its DOM modified with jquery but the methods we have tried (ABCpdf.NET, WebClient, etc) don't register any DOM changes from javascript in the PDF. We need to programmatically export that rendered HTML (post-jquery) to PDF on the fly.
ExpertPDF HtmlToPdf Converter v7.0
I was looking for something similar many months ago and, as far as I can remember, it's not possible with any free third-party controls. There are paid ones available. The closest you can get is iTextSharp. It will allow you to export the contents of specific HTML tags and user controls, but it's a bit of a pain to deal with.
I've never tried it, but there's an open source solution called wkhtmltopdf that renders a PDF from HTML/JavaScript/CSS using the WebKit engine. This post talks a little bit about using it. If it works I'd like to know, because I've heard this request a couple of times here.
