How to download an infinitely-scrolling web page - C#

With the WebClient.DownloadString method it's fairly simple to load a normal web page's source into a string.
But is there any easy way to load pages that extend and load new content when you scroll down to the end?

You cannot "download" such a page, as it doesn't exist in full form; such pages require user interaction.
You can use the WebBrowser control to browse to a web site and interact with it programmatically.
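For example, you can host the control, scroll it to the bottom repeatedly so the page's own script keeps loading more items, and then read the grown DOM. A minimal WinForms sketch; the URL, the two-second delay, and the ten-scroll cut-off are all assumptions:

    using System;
    using System.Windows.Forms;

    // Scroll a hosted WebBrowser to the bottom a few times so the page's
    // own script fetches more content, then read the accumulated DOM.
    public class InfiniteScrollForm : Form
    {
        private readonly WebBrowser browser = new WebBrowser { ScriptErrorsSuppressed = true };
        private readonly Timer timer = new Timer { Interval = 2000 }; // give each AJAX call time to finish
        private int scrolls;

        public InfiniteScrollForm()
        {
            browser.Dock = DockStyle.Fill;
            Controls.Add(browser);
            browser.DocumentCompleted += (s, e) => timer.Start();
            timer.Tick += OnTick;
            browser.Navigate("http://example.com/feed"); // hypothetical page
        }

        private void OnTick(object sender, EventArgs e)
        {
            browser.Document.Window.ScrollTo(0, int.MaxValue); // jump to the bottom

            if (++scrolls < 10) return; // arbitrary cut-off; stop when nothing new appears

            timer.Stop();
            // Body.OuterHtml reflects the current DOM, including AJAX-loaded items.
            string html = browser.Document.Body.OuterHtml;
            Console.WriteLine(html.Length);
        }

        [STAThread]
        static void Main()
        {
            Application.Run(new InfiniteScrollForm());
        }
    }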

If you want to do it with WebClient, you can try this approach.
See here: the author uses Scrapy, but I think the same approach can be adopted for WebClient too.
Essentially, he uses Firebug or the Chrome developer tools to trace the AJAX request that loads the new content; once you know that request, you can fetch the content directly with WebClient.
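Once you've traced the request in the network tab, the fetch itself is simple. A sketch of that approach; the endpoint URL and its page parameter are hypothetical stand-ins for whatever the developer tools reveal:

    using System;
    using System.Net;

    // Call the request that the page's script issues for each chunk of
    // content. The URL and "page" parameter are assumptions; copy the
    // real ones from the network tab.
    class AjaxPager
    {
        static void Main()
        {
            using (var client = new WebClient())
            {
                for (int page = 1; page <= 5; page++) // stop when a chunk comes back empty
                {
                    string chunk = client.DownloadString(
                        "http://example.com/items?page=" + page);
                    Console.WriteLine(chunk);
                }
            }
        }
    }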

Related

URL Not loading in a System.Windows.Forms.WebBrowser

I'm using a System.Windows.Forms.WebBrowser control to load a webpage. My understanding is that it is just a wrapper around Internet Explorer. I'm therefore struggling to understand why the web page loads fine in a desktop browser (IE 11) but not in the browser control.
The URL looks like this:
http://subdomain.domain.co.uk/?argumentName=argumentValue
My question is, firstly, is that a valid URL (format-wise)? Can you have a '?' directly after the '/' as shown? My theory is that this could be getting removed behind the scenes by the desktop browser but not by the browser control. Unfortunately I cannot test that theory, as the web page is behind a firewall; at the moment I know nothing about the location of the web page other than the URL.
If that is a valid URL, could anyone suggest reasons it would load in IE but not in the web browser control?
To expand on how it's not working: the request times out after 30 seconds and the NavigateError event is raised (the status code for me is -2146697211, but that could well be different from the status code they get; I cannot find that out until I deploy a DLL with some logging).
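For what it's worth, the format question is easy to check programmatically: System.Uri happily parses a query string that follows the trailing '/' directly, so the URL is valid as written.

    using System;

    // Quick check that a '?' directly after the '/' is a valid URL format.
    class UrlCheck
    {
        static void Main()
        {
            Uri uri;
            bool ok = Uri.TryCreate(
                "http://subdomain.domain.co.uk/?argumentName=argumentValue",
                UriKind.Absolute, out uri);

            Console.WriteLine(ok);        // True
            Console.WriteLine(uri.Query); // ?argumentName=argumentValue
        }
    }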

Hide URL in ASP.NET application

I am developing an ASP.NET application, but I would like to hide the URL so the user doesn't know which page he or she is on. Is there any solution?
Use Server.Transfer. It doesn't change the URL.
Server.Transfer happens without the browser knowing anything: the browser requests a page, but the server returns the content of another.
Server.Transfer() should be used when:
we don't need to show the real URL to which we redirected the request in the user's web browser
we want to transfer the current page request to another .aspx page on the same server
we want to preserve server resources and avoid unnecessary round trips to the server
we want to preserve the Query String and Form Variables (optionally)
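A minimal code-behind sketch of the call; the page and file names here are hypothetical:

    using System;
    using System.Web.UI;

    // ASP.NET Web Forms code-behind. The browser keeps showing the original
    // URL while the server renders Details.aspx instead.
    public partial class Entry : Page
    {
        protected void ContinueButton_Click(object sender, EventArgs e)
        {
            // The second argument preserves the QueryString and Form
            // collections for the target page.
            Server.Transfer("Details.aspx", true);
        }
    }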
There is no solution unless you can force the user to browse only from a restricted environment in which you can control what software is installed or run. Even if you force the user to use a specific browser, they could use a tool like Fiddler to see what URLs they are going to.

Filling a web form using a Windows application in C#

I have a site that collects user information on one of its pages.
Every user has a card that contains his information. I want to write a Windows application in Visual C# that reads the card and fills the web form with that data.
To do this, I have to host a browser in my Windows application and run some JavaScript code to fill the elements in that browser.
Does anyone know how I can run a browser and feed it specific JavaScript (after the page has loaded) to fill the form?
There is a WebBrowser control that you can use in your Windows app. As for populating the information on the web page, I would just pass the userID in the QueryString to the URL of the page you create (in your WebBrowser control), and in the page add code to retrieve the user information and display it.
Here is some info on the WebBrowser control:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
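If you do need to fill the form client-side, as the question describes, the WebBrowser control also exposes the page's DOM once it has loaded. A minimal sketch; the URL and the "firstName" element id are hypothetical:

    using System;
    using System.Windows.Forms;

    // Host a WebBrowser control and fill a form field after the page loads.
    // The URL and element id are assumptions for illustration.
    public class CardFormFiller : Form
    {
        private readonly WebBrowser browser = new WebBrowser();

        public CardFormFiller()
        {
            browser.Dock = DockStyle.Fill;
            Controls.Add(browser);
            browser.DocumentCompleted += OnDocumentCompleted;
            browser.Navigate("http://example.com/form");
        }

        private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Fill an input element directly through the DOM...
            HtmlElement field = browser.Document.GetElementById("firstName");
            if (field != null)
                field.SetAttribute("value", "John"); // value read from the card

            // ...or call a script function the page already defines:
            // browser.Document.InvokeScript("fillForm", new object[] { "John" });
        }

        [STAThread]
        static void Main()
        {
            Application.Run(new CardFormFiller());
        }
    }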
My guess is you would be better off handing over the data you want to see in the web form via an HTTP POST or GET, and then letting the server side write the values into the proper form fields.

C# auto-login to Sprint

I have a project where I am trying to log in to Sprint and then do some screen scraping to get data about the different lines that the company controls. I have tried passing the cookies provided by the initial website call in the initial HttpWebRequest form POST, but I do not get back any cookies that denote a user or session or anything. In fact, if I then try to use the WebClient class to get the landing page, the response URL I get back is the login page.
I think this is because when you log in, you get redirected to a page that does some processing and then redirects you to the landing page. I am passing in correct credentials and don't know where it is failing. Can anyone help me so that I do not need to use WatiN or any other browser control to scrape that data, as that would be too slow?
Use Selenium.
It is normally used for website testing, but you can easily use it for your situation.
It allows you to launch a browser and programmatically control mouse clicks and key presses to do exactly what you need.
You can also run XPath queries on the HTML to read data, or even run custom JavaScript on pages if you need to do something more complicated.
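A minimal sketch with the Selenium WebDriver NuGet packages (Selenium.WebDriver plus a driver such as Selenium.WebDriver.ChromeDriver); the login URL, element ids, and XPath are hypothetical and need to be read off the real pages:

    using System;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    // Launch a real browser, log in, and scrape a table with XPath.
    // All ids, the URL, and the XPath are assumptions for illustration.
    class SprintScraper
    {
        static void Main()
        {
            using (IWebDriver driver = new ChromeDriver())
            {
                driver.Navigate().GoToUrl("https://www.sprint.com/login");

                driver.FindElement(By.Id("username")).SendKeys("myUser");
                driver.FindElement(By.Id("password")).SendKeys("myPassword");
                driver.FindElement(By.Id("loginButton")).Click();

                // Once the post-login redirects settle, read the data.
                var rows = driver.FindElements(By.XPath("//table[@id='lines']//tr"));
                foreach (IWebElement row in rows)
                    Console.WriteLine(row.Text);
            }
        }
    }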

Web Scraper via Web Service API?

How would I go about doing the following...
I want to build a web service for my application that grabs a piece of data from an external website that requires the user to log in. The website has no public API, hence the need for a scraper.
Is there a library that performs the following functions, or what should I do?
automate filling in the form, auto-click
automate the submit button
check which URL the user has landed on, and redirect the user to that URL
grab data from a label
EDIT: What I'm asking is: is there a web service, library, etc. that makes it easier to perform screen scraping/automation functions?
Instead of filling in a form and virtually clicking buttons, you should look at the source of the form and figure out how the data is being submitted. In most cases you can simply send a POST request with the login data. If there is something special besides a simple POST request, I use this add-on to figure out what requests are being made that you can't see. In C#, I would use the HttpWebRequest class because it can handle cookies for you (via a CookieContainer).
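A sketch of that direct-POST approach; the URL and field names are hypothetical and should be copied from the real form's HTML or from the network tab:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    // Replicate the form submission the browser would make, keeping the
    // session cookie in a CookieContainer. URL and field names are made up.
    class FormLogin
    {
        static void Main()
        {
            var cookies = new CookieContainer(); // survives across requests

            var request = (HttpWebRequest)WebRequest.Create("https://example.com/login");
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.CookieContainer = cookies;

            byte[] body = Encoding.UTF8.GetBytes("username=me&password=secret");
            using (Stream s = request.GetRequestStream())
                s.Write(body, 0, body.Length);

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // ResponseUri shows which URL we actually landed on after redirects.
                Console.WriteLine(response.ResponseUri);

                using (var reader = new StreamReader(response.GetResponseStream()))
                    Console.WriteLine(reader.ReadToEnd()); // parse the label's value from this HTML
            }
        }
    }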
If the website does not ban robots, you can use YQL to simulate everything you need. However, it can be difficult or even impossible, as you essentially have to implement a text-only browser in JS.
