I have a crawler application that should parse all items from a page with paging. Unfortunately, the web site my application crawls uses postbacks for paging. How can I get the contents of the second page programmatically, in C#, for the following URL?
http://www.hurriyetemlak.com/coldwell-banker-dikey-gayrimenkul/konut-satilik/istanbul-basaksehir-bahcesehir/emlak-ofisleri-ilanlari/3OWB4lkhYFs=/9wZEBZ-ivFgmrA3ENMCIfQ==/qh.BgsUoTK4=/GmMGgVD5Wcc=/GmMGgVD5Wcc=?sParam=3OWB4lkhYFs%3d&sType=9wZEBZ-ivFgmrA3ENMCIfQ%3d%3d&ListIsBig=qh.BgsUoTK4%3d&sortType=GmMGgVD5Wcc%3d&pageSize=GmMGgVD5Wcc%3d
I've tried posting the __EVENTTARGET hidden field along with __VIEWSTATE and __EVENTVALIDATION, but it didn't seem to work.
You can achieve this using screen-scraping techniques (see HtmlAgilityPack). This will require you to read the response and reissue form posts that mimic what the user would do in a browser. Simple request replays will not work.
You may also need to pass the __EVENTARGUMENT hidden field. And don't forget to set the name attribute as well as the id attribute.
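A minimal sketch of such a replayed postback with HttpWebRequest, assuming the hidden-field values have already been scraped from the current page (HtmlAgilityPack can pull them out); the event-target value is a placeholder for the pager control's UniqueID:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    // Replays an ASP.NET WebForms postback. The caller scrapes __VIEWSTATE
    // and __EVENTVALIDATION from the current page and passes the pager's
    // UniqueID (a placeholder here) as eventTarget. Reuse one CookieContainer
    // across calls so the server-side session survives between pages.
    static string ReplayPostback(string url, CookieContainer cookies,
                                 string viewState, string eventValidation,
                                 string eventTarget, string eventArgument)
    {
        string body =
            "__EVENTTARGET=" + Uri.EscapeDataString(eventTarget) +
            "&__EVENTARGUMENT=" + Uri.EscapeDataString(eventArgument) +
            "&__VIEWSTATE=" + Uri.EscapeDataString(viewState) +
            "&__EVENTVALIDATION=" + Uri.EscapeDataString(eventValidation);
        byte[] data = Encoding.UTF8.GetBytes(body);

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = data.Length;
        request.CookieContainer = cookies;
        using (Stream s = request.GetRequestStream())
            s.Write(data, 0, data.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }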
Hello guys, I have an issue that has been bugging me for the past few weeks.
What I'm trying to accomplish: I need a WebBrowser control with the ability to change the user agent (once, at start) and the referrer, but most importantly the ability to see the URL responses. What I mean by that: if you navigate to a website, you get back images, JavaScript files, and dynamic URLs in the response, and I need access to those URLs, some of which carry dynamic variables. A regular WebBrowser control will not show you those, and you can't access them in any way besides using FiddlerCore.
I was able to do that with WebBrowser + FiddlerCore: I could see and do whatever I wanted with those URL addresses. The problem is that if you run a few instances of the program (or sometimes just one, if the program does some automation with the URL responses), it gets stuck or stops working. I tried fixing it, but my solution is hacky and doesn't work reliably. I need a simple way to access those URLs, just as if I had used HttpWebRequest, but as a web browser. Why do I need it as a web browser? Because of the way I work, I need all the tracking pixels, scripts, images, and so on to execute: normal web browser behavior. With HttpWebRequest you can't just navigate and have all the scripts execute as in a web browser, or can you?
Using the System.Windows.Forms.WebBrowser control in a WinForms app, set the webBrowser.Url property to the URL of the page you're interested in.
The WebBrowser's DocumentCompleted event fires after the page has loaded, and any dynamically loaded JavaScript should have run by then. Hook the DocumentCompleted event and use webBrowser.Document.Images to get a list of all image elements on the page. From those images you can read their src attributes, which contain their URLs, including any query parameters hanging off the end. You can use webBrowser.Document.Links to get a list of all hyperlinks on the page. For other HTML elements of interest, you can use GetElementsByTagName("foo") to fetch all elements with that tag name from the page, then dig into their attributes to pull out URL properties.
With webBrowser.Document you can get to any HTML element, whether it was statically or dynamically created.
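A minimal sketch of that approach, assuming a WinForms form with a WebBrowser control named webBrowser1 dropped on it in the designer, and a placeholder URL:

    using System;
    using System.Windows.Forms;

    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            webBrowser1.DocumentCompleted += OnDocumentCompleted;
            webBrowser1.Navigate("http://example.com/"); // placeholder URL
        }

        private void OnDocumentCompleted(object sender,
                                         WebBrowserDocumentCompletedEventArgs e)
        {
            foreach (HtmlElement img in webBrowser1.Document.Images)
                Console.WriteLine(img.GetAttribute("src"));   // image URLs

            foreach (HtmlElement link in webBrowser1.Document.Links)
                Console.WriteLine(link.GetAttribute("href")); // hyperlink URLs

            // Any other tag works the same way:
            foreach (HtmlElement s in webBrowser1.Document.GetElementsByTagName("script"))
                Console.WriteLine(s.GetAttribute("src"));
        }
    }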
What you can't get to through webBrowser.Document is data that is loaded asynchronously using XMLHttpRequest(), because this data is not part of the browser's Document Object Model. Pages whose buttons fire such script-driven requests will be difficult to intercept.
However, if you know where the data is stored by the JavaScript executing on the page, you may be able to access it using webbrowser.Document.InvokeScript(). If the JavaScript on the page stores URLs in a mydata property of the window object, for example, you could try webbrowser.Document.InvokeScript("window.mydata") or some variation to retrieve the value of mydata into the C# app.
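Note that InvokeScript takes the name of a function to call rather than a raw expression, so one variation that works is to hand the expression to the page's own eval (a sketch; window.mydata is the hypothetical property from the paragraph above):

    // eval is invoked as a function with the expression as its argument;
    // "window.mydata" is the hypothetical property discussed above.
    object mydata = webBrowser1.Document.InvokeScript(
        "eval", new object[] { "window.mydata" });
    Console.WriteLine(mydata);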
How can I extract extra content loaded into a web page that is not visible in the page source? The extra content is being loaded using AJAX; the data can be seen under the Net tab in Firebug. How can I extract this data using C# code?
Two ways:
1- You can use a WebBrowser control to load the same page and get the active document.
2- You can replicate the AJAX call that is made and use it to get the extra bits that are appended to the document.
And regarding your LinkedIn example above: when you select the checkbox, an AJAX call is made, which brings back results and populates the table. You can see that call in Firebug's console window, inspect the POST parameters, and replicate them to get the same result.
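A sketch of such a replay with WebClient; the endpoint URL and the parameter names below are placeholders for whatever Firebug's Net tab shows:

    using System;
    using System.Collections.Specialized;
    using System.Net;
    using System.Text;

    // Replay the AJAX POST observed in Firebug. The URL, parameter
    // names, and values are all placeholders.
    using (var client = new WebClient())
    {
        var form = new NameValueCollection
        {
            { "checkboxId", "123" },
            { "page", "1" }
        };
        client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0";
        byte[] raw = client.UploadValues("http://example.com/ajax/endpoint",
                                         "POST", form);
        string fragment = Encoding.UTF8.GetString(raw);
        Console.WriteLine(fragment); // parse with HtmlAgilityPack if needed
    }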
It depends on your application in the first place. If you are using a C# application as the client for reading a web page, the AJAX content may not be visible until you put in a JavaScript engine.
If you are serving the said pages, you only have to log the request/response traffic on the server.
A more specific question would be appreciated.
That extra content is dynamically generated by AJAX (for example, a GridView is generated as a table); it is stored in the browser's memory and can be viewed with client-side debugging tools (IE has a developer tools option).
Once you do a postback, all the controls' values are available to C#.
When you say extra content, can you please clarify what exactly you are trying to extract using C#?
I have seen this on some survey websites. What is the C# code they use to keep the URL the same, so that clicking the "Next" button keeps you on the same aspx page
without any query string;
without even a single character in the URL changing; and
with the grid, the data, the content, and the questions all changing?
Can anyone give a code-wise example of how to achieve this? My main question is how this is done in the code-behind: changing the data on the page while maintaining the same URL.
Nothing simpler than a session, maintained on the server side. Store a "current question number" in session, increment it on each successful postback, and you have what you're asking about.
Another possibility is a cookie that contains the "current question number".
Both the cookie and the session are invisible in the query string, of course.
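A minimal code-behind sketch of the session approach; the control names and the ShowQuestion helper are hypothetical:

    protected void Page_Load(object sender, EventArgs e)
    {
        if (Session["questionNumber"] == null)
            Session["questionNumber"] = 1;      // first visit

        if (!IsPostBack)
            ShowQuestion((int)Session["questionNumber"]);
    }

    protected void NextButton_Click(object sender, EventArgs e)
    {
        // Same URL, same page; only the server-side counter moves on.
        int next = (int)Session["questionNumber"] + 1;
        Session["questionNumber"] = next;
        ShowQuestion(next);
    }

    private void ShowQuestion(int number)
    {
        QuestionLabel.Text = "Question " + number; // bind the real data here
    }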
"change data of page and maintain same url." Answer is Server.Transfer.
This method will preserve url.
The Next button may submit a form using the HTTP POST method. The form data may contain the session, question, and response data, which the site uses to build a new response. Unlike a GET, a POST does not incorporate data into the URL.
Developers typically accomplish this task using AJAX. The basic premise is that only a certain portion of the page (e.g. a grid or content area) makes a server call and retrieves the results (using JavaScript). The effect achieved is that there has not been a full postback, which is why you don't see the URL or parameters changing.
It is possible to do this using jQuery, pure JavaScript, or Microsoft's UpdatePanel.
oleksii's comment has some good links as well:
That's the AJAX magic. There are many jQuery plugins for this, for example this one with a live demo. You can also program it easily using jQuery Get or Post, or any other wrapper that uses the XmlHttpRequest object.
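For the jQuery route, the server side can be as small as a static page method. A sketch, assuming a hypothetical page named Survey.aspx with made-up method and session key names; client script would POST to Survey.aspx/GetNextQuestion and swap the returned text into the content area, so the URL never changes:

    using System.Web;
    using System.Web.Services;

    public partial class Survey : System.Web.UI.Page
    {
        [WebMethod(EnableSession = true)]
        public static string GetNextQuestion()
        {
            // The URL carries no state; progress lives in the session.
            int current = (HttpContext.Current.Session["question"] as int?) ?? 0;
            HttpContext.Current.Session["question"] = current + 1;
            return "Question #" + (current + 1);
        }
    }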
I have a bunch of parameters that I need to pass on to a second page via request headers. At first I tried via JS, but I found out that that's impossible (please correct me if I'm wrong here).
So now I'm trying to do it in the code-behind (via C#). I want to write a bunch of custom request headers and call Response.Redirect or something similar to redirect the user to the new page.
Is this possible? If so what methods do I have to use?
Edit: unfortunately, using query string parameters is not an option here, as it's out of my control.
Use Server.Transfer("somepage.aspx?parameter1=value"); there is no client redirect then.
You can try setting the headers and doing a Server.Transfer; I believe that will work too. It's up to you, but using the query string is a bit more readable to me, and it doesn't show up in the client's browser.
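A sketch of both channels; somepage.aspx, the parameter name, and the Items key are placeholders:

    // First page: transfer server-side, so the address bar never changes.
    protected void SendButton_Click(object sender, EventArgs e)
    {
        Context.Items["customData"] = "value";             // survives the transfer
        Server.Transfer("somepage.aspx?parameter1=value"); // query string variant
    }

    // somepage.aspx code-behind: both values are readable, because the
    // transfer happens inside the same request.
    protected void Page_Load(object sender, EventArgs e)
    {
        string fromQuery = Request.QueryString["parameter1"];
        object fromItems = Context.Items["customData"];
    }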
You need to look at state in .NET; there are various ways to achieve state in a stateless environment.
I would put it in the session object on page one and read it on page two:
create a session object in the code-behind of page 1;
read from the session object on page 2.
Or, if you read the MSDN state-management documentation on request parameters, it will show you the options available.
As for JS: don't worry about doing tricky stuff with it; tricky is mostly wrong.
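A sketch of the session hand-off, with placeholder key and page names:

    // Page 1 code-behind: stash the values, then send the user on.
    Session["param1"] = "value1";
    Session["param2"] = "value2";
    Response.Redirect("Page2.aspx");

    // Page 2 code-behind, e.g. in Page_Load:
    string param1 = (string)Session["param1"];
    string param2 = (string)Session["param2"];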
On a button click event I am required to POST to a page on an external website and redirect there. I understand how to do this using the GET method:

    Response.Redirect("http://www.somesite.com?my=params&as=aget");

But how can I do this as a POST?
I don't want to post the entire form, as this button event is raised within a repeater.
Depends.
If you want to post the exact input of a form you have on your site (that is, you just replicate a form the other site has), then just set the form's action to the URL you want to post to, and the browser will do everything for you.
If, however, you want to post some values you generate on the server (perhaps based on the input from your form), I'm afraid it's not possible: you can't redirect using a POST. A redirect is a GET by its nature.
BUT you might be able to fake it by doing a POST (using something like System.Net.WebClient) and then a redirect. It depends on how the other site handles the GET: it might display the same thing that it did on the POST, or not.
One more option (for the second case) would be to do an AJAX call to your server, which would compute the required values, and then do the POST to the other server from JavaScript.
You can build up the request using WebClient, adding the appropriate headers.
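A sketch of that fake-it approach; the URL, field names, and header are placeholders:

    using System.Collections.Specialized;
    using System.Net;

    protected void RepeaterButton_Click(object sender, EventArgs e)
    {
        // POST the computed values server-side first...
        using (var client = new WebClient())
        {
            client.Headers["X-Custom-Header"] = "value"; // hypothetical header
            var form = new NameValueCollection { { "my", "params" }, { "as", "apost" } };
            client.UploadValues("http://www.somesite.com/target", "POST", form);
        }

        // ...then send the browser there with a GET; what it shows depends
        // on how the other site handles that GET.
        Response.Redirect("http://www.somesite.com/target");
    }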
My inner forms don't contain the runat="server" attribute, so I can do what I want. I do get this problem though: ASP.NET First inner form in Server Form doesn't POST.
jQuery is life-saving in this situation. I used it for one of my projects and it works like a charm. Give it a try: Peter Finch - Using Javascript to POST data between pages.
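The core of the trick described there is to render a hidden form aimed at the external site and let script submit it immediately; a sketch with a placeholder URL and field names:

    protected void NextButton_Click(object sender, EventArgs e)
    {
        // Write out a self-submitting form instead of the normal page.
        Response.Clear();
        Response.Write(@"<html><body onload=""document.forms[0].submit()"">
            <form action=""http://www.somesite.com/target"" method=""post"">
                <input type=""hidden"" name=""my"" value=""params"" />
                <input type=""hidden"" name=""as"" value=""apost"" />
            </form></body></html>");
        Response.End();
    }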