I have a URL (http://www2.anac.gov.br/aeronaves/cons_rab.asp) where I need to post form data programmatically. That is, I want to programmatically select the correct radio button and click the submit button. If you go to the URL above, the radio button I need selected is "modelo." Clicking the "ok" button will bring back a form with 20k+ links on it.
I then want to traverse all 20k+ links and scrape the page that the links point to. Finally, I will take the information from the last page and put the data in an Excel spreadsheet.
What would be the best way to get to the third page to scrape the information? I've researched the HTML Agility Pack, HttpWebRequest and the WebBrowser control, but I'm not sure which one to use.
UPDATE: On the first page, I must select a radio button and then simulate a button click that posts the form back to itself. The resulting page contains the 20k+ links I'm interested in; however, each link is a JavaScript function call. The JS function takes the link text, places it in a textbox and then clicks the submit button. How do I automate that?
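Since each "link" just copies its own text into the textbox and submits the form, you don't need to execute the JavaScript at all: you can extract the argument from each link and issue the equivalent POST yourself. Here is a minimal sketch of the idea; the function name `consulta`, the field names, and the markup are hypothetical stand-ins for whatever the real ANAC page uses.

```python
import re
from urllib.parse import urlencode

# Hypothetical markup: the real page has its own JS function and field names.
html = """
<a href="javascript:consulta('PT-ABC')">PT-ABC</a>
<a href="javascript:consulta('PT-XYZ')">PT-XYZ</a>
"""

# Pull the argument out of every javascript: link instead of "clicking" it.
codes = re.findall(r"javascript:consulta\('([^']+)'\)", html)

# Each "click" is really just a form POST with that value in the textbox,
# so the whole JS dance collapses to one request body per code.
bodies = [urlencode({"textbox_field": code, "submit": "ok"}) for code in codes]

print(codes)
print(bodies[0])
```

The same pattern works from C# with a regex plus HttpWebRequest; use a browser's dev tools (or Fiddler) to discover the real form-field names to substitute.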
You should be able to do what you want with the HTML Agility Pack:
http://htmlagilitypack.codeplex.com/
http://www.leeholmes.com/blog/2010/03/05/html-agility-pack-rocks-your-screen-scraping-world/
You should also consider iRobot:
http://irobotsoft.com/
ALSO:
1) What have you tried?
2) How far did you get? What problems/questions did you encounter?
Have you tried Selenium? It uses WebDriver, and I've built several screen-scraping apps with it and have never had issues, even with real-time apps. You can use it with C# to drive a browser and grab what you need.
What I need is to open a web page from a link, fill some textbox with my data, click a button, and then read the data of the page in C#.
For example:
Open (www.google.es), fill the search box with "stackOverFlow", click the search button and then read the results.
I've been looking around, and I think I can read the data with HttpClient, but I have no clue how to proceed with the other part.
Edit: I'm currently using a .NET Framework console application, but I can change this to an MVC app or a WinForms app.
You should be able to type input by calling script:
document.getElementById("Input").value = "My value";
and then post it by calling the script from the question below, or simulate a button click (if you are not sure that the button does a post):
JavaScript post request like a form submit
Simulating Button click in javascript
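Often you can skip the script injection entirely: "fill the box and click the button" is just a request with the textbox's value as a form parameter. For Google's search form, which submits its textbox as the `q` query parameter, the whole interaction reduces to building one URL. A minimal sketch (C#'s HttpClient would then fetch the same URL):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Google's search form sends its textbox as the "q" parameter,
# so "fill the box and click search" is just building this URL.
query = "stackOverFlow"
url = "https://www.google.es/search?" + urlencode({"q": query})

print(url)  # https://www.google.es/search?q=stackOverFlow

# Sanity check: the parameter round-trips back out of the URL.
params = parse_qs(urlsplit(url).query)
```

Note that Google actively blocks scrapers, which is why the answer below points at the official REST API instead.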
If you are trying to create a custom search and show the results in your own application, you will need to find the REST API of that search provider. Luckily, Google provides one to get started.
Follow the link to get started:
https://developers.google.com/custom-search/v1/overview
I tried to make an auto-test program which can auto-fill/click another web-page-based application.
I want my software to emulate the manual actions, e.g. clicking the items, filling the text boxes...
Now, I can load the page in a C# WPF WebBrowser, but I don't know how to auto-fill the JavaScript popup window. We can't get the elements through functions like GetElementById(), because the page seems to be written with JavaScript.
I'm really new to C# and the web. All comments are welcome. Many thanks!
This is the webpage I want to test:
This is the source HTML I got via IE -> View -> Source:
I have a list of IDs whose information is available through a webpage. I need to input the ID and click the submit button for each one. The number of IDs runs over a million, so I need to automate this process. Is it possible to use web scraping? I don't know where to start.
I'd recommend using Fiddler to capture a single request you make manually and using this as your template. Then you'll have to add some code to manage the session and other cookies.
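To make the "template plus cookies" idea concrete, here is a stdlib-only sketch of replaying a captured request while keeping session cookies across calls. The URL and form fields are placeholders; substitute whatever Fiddler shows you. (In C#, an HttpClient with an HttpClientHandler's CookieContainer plays the same role.)

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode

jar = CookieJar()
# The opener carries session cookies across requests, mimicking the browser.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Placeholder URL and fields -- copy the real ones from the Fiddler capture.
body = urlencode({"id": "12345", "submit": "ok"}).encode()
req = urllib.request.Request(
    "http://example.com/lookup",
    data=body,  # supplying data= makes this a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# response = opener.open(req)      # uncomment to actually send the request
# html = response.read().decode()

print(req.get_method(), req.data.decode())
```

For a million IDs, you would loop over this, ideally with a delay between requests and retry handling.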
I'm displaying a website in a C# WebBrowser, but I would like to display only the search part, not the whole website, so it won't be so big on the screen. This is the website: http://www.buscacep.correios.com.br/ and I would like to display only the Busca CEP - Endereço box. Any ideas how I can do this? I tried to use the HTML Agility Pack, but it has very little documentation and I couldn't understand it.
The WebBrowser control isn't really designed for what you're asking. You probably could go through all the page elements and remove anything that isn't part of the search box, but that's a lot of work for very little value.
However, there's a bright side. As mentioned in a comment, you should be able to POST directly to the search page. Use a program like Fiddler to find out what form values are being passed to the server with the request. Then you can re-create that request from your own application (using a WebClient or HttpClient). The result will be HTML, which you can display in your WebBrowser by setting the returned HTML to the WebBrowser's DocumentText property.
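Once you have the HTML back, you can cut out just the fragment you care about before assigning it to DocumentText. As a language-neutral sketch of what the HTML Agility Pack does for you in C#, here is a simplified element-by-id extractor using Python's stdlib parser (it ignores nested tags of the same name, and the markup is invented for illustration):

```python
from html.parser import HTMLParser

class IdFinder(HTMLParser):
    """Collect the text inside the element with a given id.
    Simplified: stops at the first close tag after capture starts,
    so it doesn't handle nested same-name tags."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.capturing = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.capturing = True

    def handle_endtag(self, tag):
        if self.capturing:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.text.append(data)

# Invented markup standing in for the page's real structure.
html = '<div id="header">menu</div><div id="busca">Busca CEP</div>'
finder = IdFinder("busca")
finder.feed(html)
print("".join(finder.text))  # Busca CEP
```

In C#, the HTML Agility Pack equivalent is roughly `doc.GetElementbyId("busca")`, after which you hand that node's OuterHtml to the WebBrowser.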
I'm using the WebBrowser control in a Windows Forms app to automate a website and do some actions, so I need to know the HTML elements in the website (for example, textbox id, button id).
Everything is running smoothly until I meet one situation. There is an HTML link element which points back to itself (e.g. "http://www.aaa.com") but triggers a new window with a different URL (e.g. "http://www.bbb.com"). Below is the HTML link element:
<a href="#" class="toolbar" id="Export_Link" onclick="showExportWindow();" title="Export me">Exports & Reports</a>
It shows a new window with a different URL, and therefore the WebBrowser control is unable to get the HTML elements in the new window, because it traces back to the HTML elements in the old window ("http://www.aaa.com#") and not the new window ("http://www.bbb.com").
Please help me! I've been stuck here for a week already! Does anyone know how to solve this problem?
I'm not sure if I understand your problem correctly, but it seems that you are struggling with the "#" link. It's a common practice to link to the external page even if you want to use JavaScript to open that page; this provides a good fallback for browsers with JavaScript disabled.
Note the return false; at the end of the JavaScript; it's necessary to tell the browser that the "normal" page link shouldn't be followed when the JavaScript runs.
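As an illustration of the fallback pattern being described (hypothetical handler; the real page's script differs):

```html
<!-- href points at the real page for non-JS browsers;
     "return false" stops the browser from ALSO following
     the href once the script has opened the window. -->
<a href="http://www.bbb.com" onclick="window.open(this.href); return false;">
  Exports &amp; Reports
</a>
```

With `href="#"` as in the snippet above, there is no such fallback, which is why the WebBrowser control only ever sees "http://www.aaa.com#".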