I am attempting to retrieve data that is dynamically loaded onto a webpage via hash-based links, e.g. http://www.westfield.com.au/au/retailers#page=5
My question is what technology is being used to load the data onto the page?
Secondly, how would one approach retrieving this data using C#?
My attempts so far have used WebClient to download the page at this link; unfortunately, the HTML file only contains the data from the very first page, no matter which page link I use.
What technology is being used to load the data onto the page?
JavaScript is used to load the data from a server, parse it into HTML and put it in the right place in the DOM.
Secondly, how would one approach retrieving this data using C#?
Make a request to http://www.westfield.com.au/api/v1/countries/au/retail-chains/search.json?page=5; it will return a structured JSON document containing the data you need.
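For example, a minimal C# sketch of that request (the endpoint URL comes from the answer above; the JSON structure itself is not assumed here, so the raw string is just printed for inspection):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class RetailerSearch
{
    // Builds the paged JSON endpoint URL described above.
    public static string BuildSearchUrl(int page) =>
        $"http://www.westfield.com.au/api/v1/countries/au/retail-chains/search.json?page={page}";

    static async Task Main()
    {
        using var client = new HttpClient();
        // Fetch the raw JSON for page 5; deserialize it with the JSON
        // library of your choice once you know the document's shape.
        string json = await client.GetStringAsync(BuildSearchUrl(5));
        Console.WriteLine(json);
    }
}
```

Note this requires network access and will only work for as long as the site keeps that endpoint around.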
If all you need is the JSON structure, Jon's answer sounds like a good place to start.
If you want a good stack for true rendered scraping, I'd use a combination of PhantomJS and Selenium to bridge it to .NET.
This article is a great place to start.
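A rough sketch of the Selenium + PhantomJS route (this assumes the Selenium.WebDriver and PhantomJS NuGet packages are installed and the phantomjs executable is on your PATH):

```csharp
using System;
using OpenQA.Selenium.PhantomJS;

class RenderedScraper
{
    // Builds the hash-based page URL from the question.
    public static string PageUrl(int page) =>
        $"http://www.westfield.com.au/au/retailers#page={page}";

    static void Main()
    {
        // PhantomJS is a headless browser, so the page's JavaScript runs
        // and fills in the DOM before we read it.
        using (var driver = new PhantomJSDriver())
        {
            driver.Navigate().GoToUrl(PageUrl(5));
            // PageSource now contains the rendered HTML, not just the
            // initial payload that WebClient would see.
            Console.WriteLine(driver.PageSource);
        }
    }
}
```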
I am retrieving a binary image from SQL Server and would like to pass this binary image from a GridView control to another page and display it using an Image control. What is the best way to do this?
I have tried to use HttpUtility (encode & decode), but it said the URL is too long.
My suggestion is as follows:
1. From the details, it seems that your image is stored as a blob (binary data) in a database table.
2. Instead of fetching the blob and passing it to the next page, you can just pass the database ID to the next page.
3. On the next page, make an AJAX request to the .NET server-side code (you can do this on page load) and fetch the blob based on the database ID received from the previous page.
4. Then display the image by loading it into an HTML div or img tag in the JavaScript success callback of the AJAX request.
By the way, it is generally recommended to store images on the filesystem. If you can refactor your server-side code to store the images in a directory structure, that would be ideal, though it may take a lot of changes to your code base. If the images are not too big, you may continue with the existing approach using the solution I suggested. In the long term, keeping large binary data in database tables can make queries slower.
You may also want to check how to display fetched binary data as an image in HTML. Please check this link for details: Is it possible to put binary image data into html markup and then get the image displayed as usual in any browser?
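If you go the data-URI route from that linked question, the conversion on the server side is a one-liner. A minimal sketch (the image/png MIME type is an assumption; use whatever format your blobs actually are):

```csharp
using System;

class ImageDataUri
{
    // Converts raw image bytes into a data URI that can be assigned
    // directly to an <img> element's src attribute.
    public static string ToDataUri(byte[] imageBytes, string mimeType = "image/png") =>
        $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";

    static void Main()
    {
        byte[] bytes = { 0x01, 0x02, 0x03 }; // stand-in for your database blob
        Console.WriteLine(ToDataUri(bytes)); // data:image/png;base64,AQID
    }
}
```

The JavaScript success callback can then simply assign this string to the img element's src.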
I am developing a desktop application and would like to grab the HTML source of a remote page. However, much of the page is rendered by JavaScript after it loads.
I've been searching for a few days but could not find anything useful. I've studied and tried to apply the following suggestions, but I can only get the base HTML.
WebBrowser Threads don't seem to be closing
Get the final generated html source using c# or vb.net
View Generated Source (After AJAX/JavaScript) in C#
Is there any way to get all the data, similar to what the HTML tab of Firebug's console shows?
Thank you in advance.
What are you trying to do? A web browser does more than just grab the HTML: the page needs to be parsed (which will likely download further files) and rendered. You could use the WebKit C# wrapper (http://webkitdotnet.sourceforge.net/) - I have used this previously to get thumbnails of web pages.
I am building an ebook manager app for the Windows Store using Windows 8.1 and Visual Studio 2013 Preview. I have a new WebView control that is able to resolve URIs and load the HTML and CSS.
However, there is a lot of data in one HTML file and I would like to paginate it somehow. My questions are:
Is there a way to do this with the stream in C#?
Are there any examples out there on paginating HTML content?
Is there a way to measure programmatically how much screen real estate will be used by a particular piece of HTML?
It kind of depends on the type of data that is sent back to the browser, and also on how you want to present it afterwards.
Perhaps you can show some sample data that you want to paginate.
I have an HTML "printer friendly version" type page that I'd like to convert to a document-style file type so that I don't have to worry about disabling links on the page and stuff. Is there a fairly simple way to create a file like that from the page's html without using third party libraries?
At first glance this is going to probably look like a duplicate of a bunch of other questions, but most of the answers involve using third party software, which isn't an option for me.
If you're talking about an ASP.NET site, you don't have the HTML that is given to the browser; you'll have to do whatever is needed to generate a PDF server-side and send that back to the browser. You can't let ASP.NET send HTML back to the browser and expect the user to see a PDF...
I just did it with CSS. Sorry I have to accept this as an answer, but the only answer posted wasn't really what I was looking for.
I'm looking for a way to save a table from an HTML page as XML or JSON. The current method I'm using saves the entire page as an .xls sheet and then reads the sheet using Office.Interop.Excel. I want to skip saving the file and just read directly from the page using HttpRequest. Any ideas?
I assume you mean that you'd like to scrape the contents of a web page without File-> Save As?
CodeProject has a writeup explaining how to use HttpWebRequest to do just that. Or you could use the newer HttpClient. Once you retrieve the HTML, you'll have to parse it yourself.
In the MSDN article, they're actually requesting JSON directly, so they don't have to deal with parsing, but you could easily write a regular expression to capture the table body.
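A minimal sketch of the regular-expression approach (the patterns below only handle simple, well-formed tables without nested tables; real-world HTML often warrants a proper parser instead):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class TableScraper
{
    // Extracts the cell text of each <tr> row as an array of strings.
    public static List<string[]> ExtractRows(string html)
    {
        var rows = new List<string[]>();
        foreach (Match row in Regex.Matches(html, @"<tr[^>]*>(.*?)</tr>",
                                            RegexOptions.Singleline | RegexOptions.IgnoreCase))
        {
            var cells = new List<string>();
            foreach (Match cell in Regex.Matches(row.Groups[1].Value,
                                                 @"<t[dh][^>]*>(.*?)</t[dh]>",
                                                 RegexOptions.Singleline | RegexOptions.IgnoreCase))
            {
                cells.Add(cell.Groups[1].Value.Trim());
            }
            rows.Add(cells.ToArray());
        }
        return rows;
    }

    static void Main()
    {
        string html = "<table><tr><th>Name</th><th>Qty</th></tr>" +
                      "<tr><td>Apples</td><td>3</td></tr></table>";
        foreach (var row in ExtractRows(html))
            Console.WriteLine(string.Join(" | ", row));
        // Name | Qty
        // Apples | 3
    }
}
```

From there, serializing the rows to XML or JSON with your library of choice is straightforward, and no temporary .xls file is needed.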