I have a list of IDs whose information is available through a webpage. I need to input each ID and click the submit button. The number of IDs runs over a million, so I need to automate this process. Is it possible to use web scraping? I don't know where to start.
I'd recommend using Fiddler to capture a single request that you do manually and using that as your template. Then you'll have to add some code to manage the session and other cookies.
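For example, a minimal sketch of replaying such a capture in C# with HttpWebRequest. The URL, the "id" field name, and the "ids.txt" input file are assumptions; take the real values from your Fiddler capture:

using System;
using System.IO;
using System.Net;
using System.Text;

class Scraper
{
    static void Main()
    {
        // One CookieContainer shared across all requests keeps the session alive.
        var cookies = new CookieContainer();

        foreach (string id in File.ReadLines("ids.txt")) // hypothetical input file
        {
            var request = (HttpWebRequest)WebRequest.Create("http://example.com/search.aspx");
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.CookieContainer = cookies;

            // Body copied from the captured request, with the id substituted in.
            byte[] body = Encoding.UTF8.GetBytes("id=" + Uri.EscapeDataString(id));
            request.ContentLength = body.Length;
            using (Stream s = request.GetRequestStream())
                s.Write(body, 0, body.Length);

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();
                // Parse the fields you need out of html here.
            }
        }
    }
}

With a million IDs you will also want throttling and retry logic so you don't hammer the server.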
I have a grid view with an edit button in it. On the edit button click I open a new aspx page that has text fields for inputting the data. When a user copies the URL of the grid view, opens it in a new tab of any browser, and clicks the edit button for two different records, changing anything in the first tab and submitting it changes the info for the record in the second tab. This happens because I am passing the user ID to the form's aspx page in session, and the session gets updated when the user opens the second record in the new tab.
Are there only two ways of passing data to an aspx page?
using session
using a query string
I don't want to use the query string.
Please help, thank you.
You are writing an ASP.NET application, so at the end of the day there is only so much you can do. You can request certain things of the browser, but whether it actually honors them is entirely up to it.
You can make it unlikely to happen by accident using the HTML link's target attribute. This asks the browser to reuse an already open tab for this record. But that will not prevent a dedicated person from still opening two copies.
A pretty simple way to avoid race conditions in general is the SQL rowversion column. You retrieve the rowversion with the rest of the record. You keep it along in a hidden form field (that is what they are there for). When writing the update, check whether it still matches before the write. If yes, you update. If not, somebody has modified the record since then and you reject the update. It can be the same user in another tab, or another user at the other end of the world. It could be that this tab was opened a year ago, surviving on sleep mode. It does not matter - any change trips this protection.
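A minimal sketch of that check in C#, assuming a table named Records with a rowversion column named RowVersion; originalRowVersion is the value carried in the hidden form field (e.g. Convert.FromBase64String(hiddenField.Value)):

using System.Data;
using System.Data.SqlClient;

static void UpdateRecord(string connectionString, int recordId,
                         string newName, byte[] originalRowVersion)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"UPDATE Records SET Name = @Name
          WHERE Id = @Id AND RowVersion = @RowVersion", conn))
    {
        cmd.Parameters.AddWithValue("@Name", newName);
        cmd.Parameters.AddWithValue("@Id", recordId);
        cmd.Parameters.AddWithValue("@RowVersion", originalRowVersion);

        conn.Open();
        // Zero rows affected means the rowversion no longer matches:
        // the record changed since this tab loaded it, so reject the update.
        if (cmd.ExecuteNonQuery() == 0)
            throw new DBConcurrencyException("The record was modified since it was loaded.");
    }
}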
I have a webforms site that has 2 menus.
On a page you click a button, some C# events are fired by a web service (Ajax), and then you are redirected to another page with history.go(-1). The only problem is that in the web service I create a session value that makes the menus switch: the default one hides and the other one shows. The menu switch is done in the Page_Load of the master page.
My problem is that with history.go(-1) you get to the previous page, but the old menu is present instead of the new one. How can I fix it?
The problem is that the browser is not actually loading the previous page; it is using the cached copy. Is there a reason you cannot have both menus hidden and then decide which one to show client-side? That way you can let the JS .ready handler take care of which menu to show, and you should get the desired result when using history.go(-1).
This article speaks to setting a cookie from the server and then checking it on the client.
You could use something like this, then check the cookie to determine whether the page was loaded from cache and force a reload:
location.reload()
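For the server half of that trick, one possible variant in C# (the cookie name and the hidPageToken hidden field are assumptions): write a fresh token into both a cookie and the page on every real request. A cached copy still holds its old embedded token while the cookie reflects the latest response, so the client script can compare the two and call location.reload() on a mismatch.

protected void Page_Load(object sender, EventArgs e)
{
    // New token per server response.
    string token = Guid.NewGuid().ToString("N");
    Response.Cookies.Add(new HttpCookie("PageToken", token));

    // Embed the same token in the page for the client-side comparison.
    hidPageToken.Value = token; // hypothetical HiddenField control
}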
My fix was to store the previous link in session, and when I need a redirect where caching gets in the way, I redirect to another aspx page that redirects, depending on the URL params, to wherever I need it to go... it was the simplest method I could think of.
I have a URL (http://www2.anac.gov.br/aeronaves/cons_rab.asp) where I need to post form data programmatically. That is, programmatically, I want to select the correct radio button and click the submit button. If you go to the URL above, the radio button I need selected is "modelo." Clicking the "ok" button will bring back a form with 20k+ links on it.
I then want to traverse all 20k+ links and scrape the page that the links point to. Finally, I will take the information from the last page and put the data in an Excel spreadsheet.
What would be the best way to get to the third page to scrape the information? I've researched the HTML Agility Pack, HttpWebRequest and the WebBrowser control, but I'm not sure which one to use.
UPDATE: On the first page, I must select a radio button and then simulate a button click that posts the form back to itself. The resulting page contains the 20K+ links I'm interested in; however, each link is a javascript function call. The JS function takes the link text, places it in a textbox and then clicks the submit button. How the hell do I automate that?
You should be able to do what you want with the HTML Agility Pack:
http://htmlagilitypack.codeplex.com/
http://www.leeholmes.com/blog/2010/03/05/html-agility-pack-rocks-your-screen-scraping-world/
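A minimal HTML Agility Pack sketch for the update above. It assumes you have already fetched the second page (the one with the 20k+ links) as a string, e.g. by POSTing the radio-button form with HttpWebRequest:

using System;
using HtmlAgilityPack;

class LinkScraper
{
    static void Process(string htmlFromPostBack)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlFromPostBack);

        // Each javascript: link only feeds its own text into the textbox and
        // submits, so you can skip the browser: take the text and POST it yourself.
        var links = doc.DocumentNode.SelectNodes("//a[starts-with(@href, 'javascript')]");
        if (links == null) return; // no matching links found

        foreach (HtmlNode link in links)
        {
            string value = link.InnerText.Trim();
            // POST "value" to the form's action URL (find it with Fiddler),
            // then load the returned detail page into another HtmlDocument
            // and pull out the fields for your spreadsheet.
            Console.WriteLine(value);
        }
    }
}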
You should also consider iRobot:
http://irobotsoft.com/
ALSO:
1) What have you tried?
2) How far did you get? What problems/questions did you encounter?
Have you tried Selenium? It uses WebDriver, and I've done several screen-scraping apps with it and have never had issues, even with real-time apps. You can use it with C# to drive a browser and grab what you need.
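A minimal Selenium sketch in C#; the element IDs here are assumptions, so inspect the real page for the actual ones:

using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class SeleniumScrape
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("http://www2.anac.gov.br/aeronaves/cons_rab.asp");

            driver.FindElement(By.Id("modelo")).Click(); // hypothetical radio button id
            driver.FindElement(By.Id("btnOk")).Click();  // hypothetical submit button id

            // The driver executes the page's JavaScript, so javascript: links
            // work when clicked. Collect the link texts first, though; clicking
            // inside the loop would leave stale element references.
            var linkTexts = driver.FindElements(By.TagName("a"))
                                  .Select(a => a.Text)
                                  .ToList();

            foreach (string text in linkTexts)
            {
                driver.FindElement(By.LinkText(text)).Click();
                // ...scrape driver.PageSource here, then go back for the next link.
                driver.Navigate().Back();
            }
        }
    }
}

The trade-off versus raw HttpWebRequest calls is speed: driving a real browser through 20k+ links will be much slower than plain HTTP requests.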
I want to design a form in ASP.NET in wizard style, doing something on each Next click.
The form has 3 steps:
fill your information
add elements [here, if you type something wrong, you can edit or delete entries before going to the next step]
finish
What is the best practice for designing this in ASP.NET MVC with the power of Ajax?
Can anyone show me the best way to do this in MVC?
Here's how you could proceed: the elements of each step of the wizard could go into a separate div. The Next and Previous buttons will show/hide the corresponding div. On the last step there will be a submit button which would send the entire form to the server.
You might also take a look at the jQuery Form Wizard plugin.
One of the ways that I have implemented a wizard is to have a separate database table that contains all of the information you are required to store, and to save/retrieve data to and from that table in each step of the wizard. Obviously, depending on the size and purpose of the wizard, this may not be sensible given the number of database calls, but I was implementing only a 5-page wizard with at most 5-10 fields on each page. So when you land on a page, you query the database and retrieve the information, or, if it doesn't exist yet, you load a blank page where the user can enter the information; it is saved when they click either Next or Previous.
For navigating between pages, I simply built a helper class that accepted the page name and button type (Next/Previous) and had a simple switch statement returning the page to navigate to, which I then used in a RedirectToAction call. Again, this may not suit a larger application, but you could also look at using Windows Workflow (touched on in this article: http://www.devx.com/dotnet/Article/29992), as I know it can be used to create wizard-style applications.
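A rough sketch of that helper; the page and action names are hypothetical:

public static class WizardNavigator
{
    public static string NextAction(string currentPage, string buttonType)
    {
        // buttonType is "Next" or "Previous", taken from the posted form.
        switch (currentPage + ":" + buttonType)
        {
            case "Step1:Next":     return "Step2";
            case "Step2:Next":     return "Step3";
            case "Step2:Previous": return "Step1";
            case "Step3:Previous": return "Step2";
            default:               return "Step1"; // fall back to the start
        }
    }
}

// In the controller, after saving the step's data:
// return RedirectToAction(WizardNavigator.NextAction("Step2", buttonType));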
It is not particularly an MVC solution, but I advise a client-side implementation using jQuery LightBox.
You don't need any client-side stuff to achieve this; it's also bad practice to use JavaScript for anything other than user convenience.
You have 2 problems with a wizard:
1: Maintaining state, i.e. saving data between requests.
2: Figuring out which action (usually Next or Previous) to take.
Maintaining state.
You can use the session object, but ideally (and so you can unit test them) all actions should be pure functions. I use hidden inputs to save data between requests.
User actions.
For a Next/Previous view, add 2 submit buttons to your form and give them names. When you POST the form, the button with the non-null value is the one that was pressed. Then redirect to the appropriate action.
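A sketch of that pattern in ASP.NET MVC. The form would contain <input type="submit" name="next" value="Next" /> and <input type="submit" name="previous" value="Previous" />; only the clicked button's value is posted, so the other parameter binds as null. MyWizardModel is a hypothetical view model whose earlier-step values ride along in hidden inputs:

[HttpPost]
public ActionResult Step2(MyWizardModel model, string next, string previous)
{
    // The hidden inputs keep all state in the request itself,
    // so no session is needed and the action stays testable.
    if (next != null)
        return RedirectToAction("Step3");
    if (previous != null)
        return RedirectToAction("Step1");

    return View(model); // neither button posted: just redisplay
}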
How can I set an ASP.NET web page to expire, so that after the user clicks the submit button, pressing the browser's back button to go back and press submit again gives a page-expired error?
Use HttpResponse.Cache to control the cacheability of the page. This gives you control over options such as the expiration of the page from the cache and the Cache-Control HTTP headers.
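For instance, a sketch that takes the page out of the cache entirely, e.g. in Page_Load:

protected void Page_Load(object sender, EventArgs e)
{
    Response.Cache.SetCacheability(HttpCacheability.NoCache); // Cache-Control: no-cache
    Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1)); // already expired
    Response.Cache.SetNoStore();                               // Cache-Control: no-store
}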
First off, use the Post-Redirect-Get pattern when the user submits the form. This will prevent them from being able to use the back button easily. To do this, all you really need to do is issue a Response.Redirect() call after you finish processing the form, even if it's to the same URL.
Secondly, you could consider using a unique id field in the form that is tied to the submission process, so that if the submission is completed, the same id cannot be used again. The suitability of this would depend on what you're doing though.
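A combined sketch of both suggestions in WebForms; the session key, the hidToken hidden field, and the control names are assumptions:

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        // Issue a one-time token when the form is first rendered.
        string token = Guid.NewGuid().ToString("N");
        Session["FormToken"] = token;
        hidToken.Value = token; // hypothetical HiddenField on the form
    }
}

protected void btnSubmit_Click(object sender, EventArgs e)
{
    // Reject the POST unless the token matches and hasn't been used yet.
    if ((string)Session["FormToken"] != hidToken.Value)
        return; // duplicate or expired submission

    Session.Remove("FormToken"); // the token is single-use

    // ...process the form here...

    // Post-Redirect-Get: the back button now lands on a GET, not a re-POST.
    Response.Redirect("Confirmation.aspx");
}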
From what I understand, there are two parts to your question:
1 - Stopping the browser back button: it does not work, and methinks we should never stop the user from pressing Back. So perhaps you could use a META tag to expire the content, so that the user sees a "content expired" page and has to reload to get the latest content.
2 - Stopping multiple POSTs: by definition, POST is not idempotent, i.e. each submission can have its own effect, so multiple POST operations are possible at the protocol level.
A possible mechanism is to disable the POST/submit button after the first post has completed, so the user will not be able to click it a second time.
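One way to wire that up in WebForms (btnSubmit is an assumed control name):

protected void Page_Load(object sender, EventArgs e)
{
    // With UseSubmitBehavior = false, ASP.NET appends its own postback script
    // after OnClientClick, so disabling the button first doesn't stop the
    // submission itself, only any further clicks.
    btnSubmit.UseSubmitBehavior = false;
    btnSubmit.OnClientClick = "this.disabled = true; this.value = 'Submitting...';";
}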
HTH.