I'm working on a web scraping application and was testing it with eBay. The application should follow the "Next" link (the one at the bottom of the page that goes to the next page of results), but it seems to stay on the same page (I'm actually not sure about that). If you open eBay and search for any term that gives multiple pages of results, and then either copy the "Next" link and paste it into a new window, or right-click it and open it in a new tab/window, it stays on the same page. I tested this in Chrome and IE8. So my question is: what are these browsers doing when they actually follow the link (when I just click on it), so that I can do the same in my scraping application? (Oh, and by the way, I'm working in C#.)
In the case of eBay it is just a normal link (at least on http://www.ebay.com, look for page 2 of TVs), so the problem is probably with your code (are you storing cookies, for instance?). From your description it sounds like an AJAX request, which goes "under the hood" and gets XML from the server, which JavaScript then renders on the client side.
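To make the "are you storing cookies" point concrete, here is a minimal sketch of what a scraper has to do between requests: remember the name=value pairs from each Set-Cookie response header and send them back in a Cookie header. The CookieJar class is a hypothetical helper written for illustration, and it deliberately ignores attributes such as Path and Expires. In C#, assigning a CookieContainer to HttpWebRequest.CookieContainer should cover the same ground.

```javascript
// Hypothetical helper (not from any library): a tiny cookie jar for a scraper.
class CookieJar {
  constructor() {
    this.cookies = new Map(); // cookie name -> cookie value
  }

  // Record the Set-Cookie header values of one response.
  // Only the leading name=value pair is kept; attributes are ignored.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq < 1) continue; // skip malformed entries
      this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }

  // Build the value of the Cookie header for the next request.
  header() {
    return [...this.cookies].map(([name, value]) => `${name}=${value}`).join('; ');
  }
}
```

A scraper would call store() with the headers of every response and add header() to every following request, so the server sees one continuous session.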
Traditionally, AJAX requests are hard to follow. In the case of eBay, however, I'd suggest using the interface that eBay provides for querying information. If you are building a generalized web crawler, then stay away from AJAX requests. Google doesn't bother with them either, most of the time.
I did an element.InvokeMember("click"); (where element is an HtmlElement) and it worked. Not sure why, though. I'll take a look at that HTTP GET thing anyway.
Related
Is there a way to create a sticky music player, like SoundCloud, so that when you browse from one page to another on your site your music keeps playing without being interrupted by a postback?
You need a single-page application for that; you cannot stop the player from being interrupted when the whole page reloads.
As I'm looking for the exact same solution, I can tell you: it's not possible without building your whole website on AJAX.
So you have to modify your whole site so that the basic HTML structure always stays loaded, and all internal links open inside AJAX-controlled containers.
There is no other way, and there probably never will be.
One service that does that is bandzoogle.com as you can see in one of their pages here: camilameza.com
When you inspect their code, you can see that all interaction within the site happens inside one single container, which switches its content.
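The single-container idea can be sketched roughly as follows. This is an illustration only: createShell, container, and pages are hypothetical names, and the page content is held in memory here, whereas a real site would fetch it with AJAX. In a browser, container would be something like the element that holds the page body, while the music player lives outside it and is never touched.

```javascript
// Sketch only: swaps the content of one container instead of reloading the page.
// `container` is any object with an innerHTML property (a DOM element in a browser);
// `pages` maps a path to the HTML fragment for that path (assumed to be preloaded).
function createShell(container, pages) {
  let current = null;
  return {
    // Show the page for `path`; returns false if the path is unknown.
    navigate(path) {
      if (!Object.prototype.hasOwnProperty.call(pages, path)) return false;
      container.innerHTML = pages[path]; // only the container changes
      current = path;
      return true;
    },
    // Path of the page currently shown, or null before the first navigation.
    get current() {
      return current;
    },
  };
}
```

Internal links would then call navigate() from a click handler instead of letting the browser load a new document.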
Good luck!
I would like to display the list of recently accessed/visited web pages, just as Google Chrome does. I want to achieve this using C# and ASP.NET.
I am working on a site where users can see the list of pages that they have visited. I tried using an iframe, but that does not work as expected. I am looking for a clean and intuitive interface, something like Google Chrome's.
I would like to provide a thumbnail view of the recently visited pages.
To keep my question simple: I want to display a list of URLs as thumbnails, just as Chrome does.
Note that Chrome can do this easily, as it renders the page itself and can take a bitmap snapshot of it to show as the thumbnail. (These are not live websites in there.)
If you want to do the same thing, you have to render the website offscreen and take a snapshot to show to the user.
If you want to actually show a live website - now that's another story. There are a lot of sites that don't like being shown in the context of another website (for whatever reasons - security, marketing and so on) and will employ any tricks (including legal) to make sure this does not happen.
Pages only visited within your own app?
There are several components that will allow for that using their API. For example:
http://www.tonec.com/products/wssh/index.html
You can just take a snapshot after the DOM for the page is completely generated and save the output using the tool on a per-user basis.
Now, if you want something more generic that works for any web site, you'd probably want to go with a web browser plugin.
Here would be a possible solution, although I've not tried it personally:
Keep a record of all the pages that a user visits (e.g. in a database)
When the user visits a landing page on your site, you could call the WebBrowser.DrawToBitmap function to render a bitmap of each page they have visited recently.
Please note: this is just a theory, I'm not saying it will work! ;)
This link might help you get started:
http://pietschsoft.com/post/2008/07/c-generate-webpage-thumbmail-screenshot-image.aspx
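The record-keeping step suggested above (tracking which pages a user visited recently) could look roughly like this. It is a sketch under assumptions: recordVisit and the default limit of 8 are made up for illustration, and the list lives in memory, whereas the answer suggests a database. Thumbnail rendering is a separate step and is not shown.

```javascript
// Sketch only: keeps a most-recent-first list of visited URLs without duplicates.
// Returns a new array and leaves `recent` unmodified.
function recordVisit(recent, url, limit = 8) {
  const withoutDuplicate = recent.filter((u) => u !== url); // drop the older entry, if any
  return [url, ...withoutDuplicate].slice(0, limit); // newest first, capped at `limit`
}
```

Calling it on every page view and storing the result per user gives the list of URLs to turn into thumbnails.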
Is it possible to take a screenshot of what a textbox holds when the user presses the submit button, for example?
EDIT: this is an aspx webpage
In short, no it is not possible to do this in a consistent, cross browser fashion (that I am aware of). If your textbox was implemented inside of a flash movie, it would be possible to take a 'screenshot' of what the flash movie was displaying when a button was pressed (discussion on this subject available here). But otherwise, you are going to have to do this processing on the server.
You could simulate this process by having the server render a copy of the page itself (feeding it the data the user entered) and then doing what you wanted with it from there. There are free and paid for solutions to assist you in taking a screenshot of a website (browse options available here).
On the client side I think you're stuck with the limitations of JavaScript, so this might not be possible. Here is another question that is very similar to yours:
Take a screenshot of a webpage with JavaScript?
In the general sense, no, you can't. However if you have a constrained environment (e.g. kiosk, intranet), you can create a browser plug-in which can essentially do anything, including snapping a screenshot and sending it to the server.
If you have lots of control over the environment, you can create your own web browser which can take screenshots. In fact, I've done this with C#. I just wrote an app that hosts a browser control and sends screenshots to the server on certain key presses or at a user-defined interval.
Is there any pattern or kind of "minimum requirements list" to follow to ensure that an ASP.NET application supports the browser's BACK button on each aspx page?
thanks
In general, the back button on the browser will take you to the previous HTTP GET or POST that occurred. It navigates by page-wide transactions, so anything done dynamically cannot be navigated that way. Also, the back button doesn't rewind code execution, so if you are determining something based on a Session variable or something similar, that won't be rewound either. Obviously, it won't rewind database transactions either.
In general, if you want to support the back button, you'll need to make sure that everything you need to navigate between with that button is separated by an HTTP transaction of some sort.
Again, you're going to run into issues if your page display is dependent on server-side control that changes from one post to the next. This is one reason you see some forms feed a 'Page has expired' error when you try to navigate back to them.
Not really... It depends on your application flow.
There are things that make supporting the back button more awkward.
For example, using pure AJAX to change the majority of the content on the page
will look like a 'new' page but won't be compatible with the back button (though you can fudge it).
Another example is posting back to the same page more than once, as this can make it appear as though the back button is not working while at the same time re-doing your request (and therefore your database transactions).
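One way to "fudge" back-button support for AJAX-driven content is the browser History API: push a history entry each time the content changes, and re-render when the user goes back. The sketch below assumes a browser that supports history.pushState and the popstate event. enableBackButton, render, and the view names are hypothetical, and win stands for the browser's window object (passed in so the logic can be exercised without a browser).

```javascript
// Sketch only: makes AJAX-style view changes reachable through the back button.
// `win` is the window object; `render(view)` is your own function that redraws the page.
function enableBackButton(win, render) {
  // Fired when the user presses back/forward onto an entry we pushed.
  win.addEventListener('popstate', (event) => {
    if (event.state) render(event.state.view); // entries without our state are ignored
  });

  // Call this instead of changing the content directly.
  return function show(view) {
    render(view);
    win.history.pushState({ view }, '', '#' + view); // adds a history entry, no reload
  };
}
```

This only restores what render() can rebuild from the view name; as noted above, server-side session state and database changes are not rewound.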
Fundamentally it depends on your application requirements.
Today I tried to write a program which checks some checkboxes for me on a webpage and then clicks on a button.
For this purpose I tried to use the WebBrowser control, but how can I set the state of a checkbox there? I searched the internet for hours with no luck; I only managed to navigate to the webpage with the checkboxes.
One approach would be to write a Bookmarklet, where you create a bookmark that runs JavaScript code, and the other would be to avoid the web browser altogether and instead just send a request directly to the web server that looks like it would if you had checked the checkboxes and clicked the button. Using a tool like wget or curl can make the latter option pretty easy.
Here's a sample URL that you could use to go for the Bookmarklet approach:
javascript:(function(){document.getElementById('theCheckBox').checked=true;document.getElementById('theForm').submit();})();
The easiest way to do the second approach would be to use a tool like Firebug or Fiddler to monitor what a request looks like when you manually submit a page with your checkboxes checked and then construct similar requests through curl.
Using a WebBrowser control is not really a good approach here. The purpose of this control is to display a webpage, not to automate user interaction with it.
The easiest and most reliable solution is to use HttpWebRequest to talk to the server directly, most likely sending a POST request.
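Whichever client sends the request (curl, HttpWebRequest, or anything else), the body of a typical form POST is just URL-encoded name=value pairs, and a checkbox shows up in it only when it is checked. The sketch below builds such a body; buildFormBody and the field names are hypothetical, and the real names should be taken from a request captured with Firebug or Fiddler as described above.

```javascript
// Sketch only: builds an application/x-www-form-urlencoded body.
// `fields` holds ordinary inputs; `checkedBoxes` lists only the boxes that are ticked.
function buildFormBody(fields, checkedBoxes) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(fields)) {
    params.append(name, value);
  }
  for (const box of checkedBoxes) {
    // A checked box without a value attribute is submitted as "on";
    // unchecked boxes are simply left out of the body.
    params.append(box.name, box.value ?? 'on');
  }
  return params.toString();
}
```

The resulting string is what would be written to the request stream, with the Content-Type header set to application/x-www-form-urlencoded.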