WebBrowser: trying to parse a page which uses Ajax to load data - C#

I am trying to parse some images from this site.
I was using HtmlAgilityPack for the other pages, but this page uses Ajax to load its images.
This is how the web page works:
It has a div tag containing nothing.
Right below the div there is a script tag whose body is wrapped in CDATA:
<script type="text/javascript">
//<![CDATA[
(function(){
    // Ajax POST request to an 'aaaaaa.js' file with the id parameter; if the request succeeds, it updates the blank div by changing its innerHTML value.
})();
//]]>
</script>
So, what I tried was:
Navigate to that page using the WebBrowser control. This loads the images just fine, but when I read the value of 'DocumentText' it only shows the blank div tag.
Get the data straight from the Ajax POST using WebRequest and WebResponse. Maybe because it's a .js file, it doesn't work; I only get HTTP errors.
Browse right to the .js file with the parameters attached. This gives me an error page.
Browse to the page I'm trying to parse and then navigate to the .js page. (I guess the browser caches something when I browse the original page, but I don't know what it is.) I do get the JSON response! I can use this data, but since the WebBrowser control uses IE, it just asks me if I want to download the returned .js file.

So there's the method of using the DOM that I mentioned in my comment.
Another method could be to use FiddlerCore (http://www.fiddler2.com/fiddler/Core/).
Or make the Ajax call yourself. You will have to make sure you respect cookies, redirects, and all the headers.

Related

Getting content from web page, that is populated through a javascript

I'm trying to parse a web page using Html Agility Pack. What I have understood from my attempts is that the web page is "populated" using JavaScript. When I load the page using
HtmlDocument doc = web.Load(linkToPage);
I get an empty page. The page is a sub page, so to say, and I'm using the original page to scrape the links to these sub pages (it works for the main page since that one does not use JavaScript to populate the page, I assume).
Is there a way to parse a web page that is populated through JavaScript, or is there a better tool for this?
See this if you wish to use Java. I worked with FTL and also JSrender; both were pretty cool.

How to trigger ajax calls from a HTMLDocument Object in C#

I am trying to retrieve some data from a web page by pulling the HTML into a string and parsing through it. The problem is that the information I want only shows up on the page when a user browses to the bottom of the page and triggers an Ajax call to update the DOM. Is there any way to do this in code without loading the HTML into a browser control and telling it to scroll?

How to get this file prompted to the user?

Hi, I'm working on an MVC3 Razor project and I have been stuck on this problem for a few hours now.
I'm trying to get an HTML-to-PDF converter to serve a document to the user.
What I want is the following:
A page is rendered and displayed. At the bottom of the page there should be a little icon saying something like "download as PDF", and what that does is where my problem lies.
All the data that I want is dynamically created and available through $("#content").html();
So what I have tried is a jQuery/Ajax function passing $("#content").html(); as a parameter to my function which creates the PDF (this works, but I have no clue how to prompt the user with the created file).
The other solution was @(Html.ActionLink()), but I don't know how to pass the data ($("#content").html()) within that link.
Pointing the converter at the URL instead was a dead end, because it got its own session and was redirected to the login page.
Any help would be appreciated!
I am not familiar with JavaScript, but what if you save the PDF to a temp file first by calling your web service method, and then, after it completes, use JavaScript to navigate to the URL where the generated PDF is returned as content?
Since the XMLHttpRequest object cannot handle data types other than html, text, json, jsonp and xml, you will need to redirect to the PDF location.
I'm not sure what exactly you're doing in your Ajax request, but once that is completed you can just redirect the window (form action) to the location of the created PDF. This wouldn't actually redirect the browser but only prompt for saving the file.
Why not just create a function that returns the created PDF (during the Ajax request) as a File result and then set the window's location to point to this action once the Ajax request has completed successfully?
Edit: That means you are not saving the PDF anywhere. So the workaround is to either use this jQuery download plugin or append an iframe dynamically and then post the data through it. Hope this helps.
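A minimal sketch of the "post the data through a form" idea, in case it helps. It builds a hidden form and submits it, so the browser itself receives the response and can show its save prompt, which an Ajax call cannot do. The function name postForDownload, the action URL '/Pdf/Create' and the field name 'html' are made up for illustration, and the server action is assumed to answer with the PDF as a file download. Note that ASP.NET request validation will likely reject raw HTML in a posted field unless the action opts out (for example with [ValidateInput(false)]).

```javascript
// Hypothetical helper: posts a value through a hidden form so the
// browser, not Ajax, handles the response (e.g. a PDF download).
// 'doc' is the page's document object, passed in explicitly.
function postForDownload(doc, actionUrl, fieldName, value) {
  const form = doc.createElement('form');
  form.method = 'POST';
  form.action = actionUrl;          // assumed to return the file
  form.style.display = 'none';      // keep the form invisible

  const input = doc.createElement('input');
  input.type = 'hidden';
  input.name = fieldName;
  input.value = value;              // the HTML to convert
  form.appendChild(input);

  doc.body.appendChild(form);       // a form must be in the page to submit
  form.submit();
  return form;
}

// In the page this might be wired to the download icon like so
// (URL and field name are assumptions):
// postForDownload(document, '/Pdf/Create', 'html', $('#content').html());
```

To target a dynamically appended iframe instead of the main window, the form's target attribute could be set to the iframe's name before submitting.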
Figured out a workaround: what I did was write an extension to my HtmlHelper and use that to render my control into an HTML string instead of a view. Therefore I could use an ActionLink to say "render this page" and get all the HTML data that way.
http://msug.vn.ua/Posts/Details/3301
Thanks for help tho!

How to access a DIV on a page being accessed by an iFrame?

I have a print function using JavaScript that prints just the contents of the iFrame and not the page the iFrame is in. What I want to do is have a DIV or something in the page being accessed through the iFrame that becomes visible when you print, with a legend on it, and then goes back to invisible once the page is printed. Any help on how to do this? I am not the most adept at using JavaScript but will try it.
Thanks,
Forget the JavaScript. Trying to fiddle the page around a print from script is complicated, fragile and pointless. This is what a print stylesheet is for!
Add a stylesheet with media="print" which contains display: rules to hide all the parts of the page you don't want printed, and cause normally-hidden parts of the page to appear when being viewed on a printer. You don't even need a scripted print button; the normal web browser Print function will pick up the differences.
The iframe element in JavaScript exposes the framed page's document (through contentDocument, or contentWindow.document), which works identically to the normal page's document property.
That being said, if your iframe's framed page is on a different domain from your framing page, you are probably out of luck. Browsers limit what one site can do to another site to prevent cross-site scripting attacks, and these limits will prevent you from accessing the framed page's document (unless it's on the same domain or you set up a cross-site file on the framed page's server).
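As a rough sketch of the script route, assuming the framed page is on the same domain and contains an element with the id 'legend' (both the id and the function names are invented for this example): the first function fetches the frame's document and returns null when the browser blocks access, and the second toggles the legend.

```javascript
// Returns the framed page's document, or null if it is not reachable
// (for example because the frame is on another domain).
function getFrameDocument(iframe) {
  try {
    return iframe.contentDocument ||
           (iframe.contentWindow && iframe.contentWindow.document) ||
           null;
  } catch (err) {
    // Cross-domain access to contentWindow.document throws.
    return null;
  }
}

// Shows or hides a legend element inside the framed page.
// Returns true when the element was found and updated.
function setLegendVisible(iframe, legendId, visible) {
  const doc = getFrameDocument(iframe);
  if (!doc) return false;
  const legend = doc.getElementById(legendId);
  if (!legend) return false;
  legend.style.display = visible ? 'block' : 'none';
  return true;
}

// Possible use around a print call (element ids are assumptions):
// const frame = document.getElementById('reportFrame');
// setLegendVisible(frame, 'legend', true);
// frame.contentWindow.print();
// setLegendVisible(frame, 'legend', false);
```

The print stylesheet suggested above is still the simpler option, since it needs no script in either page.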

Opening an external page inside our page

I used to implement this with an iframe, but I don't want to use it any more. I have some plans in mind that require opening an external page inside our ASP.NET page without any iframe. I have only a simple aspx page with a div tag, a panel, and some other server-side components. How can I do it without an iframe? I don't want to design a new complex control; I am looking for an existing method that can do this for me.
I should mention that I need to control the area loaded from the external site just as I would with an iframe. The difference is that an iframe cannot be handled by Ajax: even if you put the iframe inside an UpdatePanel, the page refreshes and posts back when you change the src value programmatically (in C# code). So we need some other method. What is the solution?
I thought I could make a request, get some HTML, and show it in a div, but I couldn't implement it.
You could:
Make a WebRequest on the server side and then set the div's text to the HTML returned.
Make an invisible iFrame to make the request and then use JavaScript to grab the HTML from the iFrame and put it in a DIV. (EDIT: Comment suggests this won't work)
You can't generally make calls (like XMLHttpRequest) to external websites because of cross-site scripting issues.
Your direct request, "opening an external page inside our asp.net page without using any iframe" is not possible, by design.
You mention AJAX. You can use AJAX to load your page, remove the headers (or do that serverside) and replace the <body> tag with a <div> tag (or do that server side too). This way, you can place the contents of your page anywhere you like. As a container, I suggest you use a block level element, a <div> would suffice.
The only (!) problem here is: cross-site requests like this are not honored by browsers. You can solve this server-side by loading the page from elsewhere using WebRequest or similar means.
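To illustrate the "replace the body tag with a div" step, here is a small sketch that works on the page's HTML as a string, wherever it was fetched. The function name extractBodyAsDiv and the container id are invented for the example, and the regular expression is only a crude stand-in for a real HTML parser.

```javascript
// Hypothetical helper: keep only what sits between <body> and </body>
// and wrap it in a <div> so it can be placed inside another page.
function extractBodyAsDiv(pageHtml, containerId) {
  // Capture everything between the opening and closing body tags.
  const match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(pageHtml);
  // Fall back to the whole string when no <body> tag is found.
  const inner = match ? match[1] : pageHtml;
  return '<div id="' + containerId + '">' + inner.trim() + '</div>';
}

const page = '<html><head><title>T</title></head>' +
             '<body class="main"><p>Hello</p></body></html>';
console.log(extractBodyAsDiv(page, 'external'));
// prints: <div id="external"><p>Hello</p></div>
```

Scripts, stylesheets and relative links in the external page are not handled by this; they would still point at paths that may not exist on your own site.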
Depends on where you'd like to merge the data. If you'd like to merge the data on the client browser, your only other option besides frames is to use Javascript/Ajax.
You can do a jQuery.ajax() on page load and use the html() method on a div to populate it with the textual result of that AJAX call.
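Something along these lines, as a sketch only: the function name loadIntoDiv, the URL '/proxy/external-page' and the '#external' selector are placeholders, and the URL is assumed to be on your own site (for instance a server-side page that fetches the external content), since the browser will not allow the Ajax call to go to the other site directly. jQuery is passed in as a parameter just to keep the function self-contained.

```javascript
// Loads HTML from 'url' with jQuery.ajax() and writes it into the
// element matched by 'selector' using html().
function loadIntoDiv($, url, selector) {
  return $.ajax({ url: url, dataType: 'html' }).done(function (html) {
    $(selector).html(html);
  });
}

// On page load (URL and selector are assumptions):
// jQuery(function () {
//   loadIntoDiv(jQuery, '/proxy/external-page', '#external');
// });
```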
Try to use as little of the WebForms control hierarchy and life-cycle as possible. It sounds like your problem can be fixed with AJAX if you don't mind the second request on page load.
If you would like to merge the content on the server side (rarely the right thing to do), you can use System.Net.HttpWebRequest to get and merge the data before returning it to the browser.
There's no substitute for an iframe in your situation. You're not going to be able to make Ajax requests to the other site due to security concerns. You could retrieve the contents of a single page server side and render it to the client, but none of the functionality will be included, since the content is now running in the context of your own site.
