Hello guys, I have an issue that has been bugging me for the past few weeks.
What I'm trying to accomplish: I need a WebBrowser control with the ability to change the user agent (once, at startup) and the referrer. But most important is the ability to see the response URLs. What I mean by that: when you navigate to a website, the browser fetches images, JavaScript files, and dynamic URLs in response, and I need access to those URLs, some of which carry dynamic variables. The regular WebBrowser control will not show you those, and you can't get at them any way other than using FiddlerCore.
I was able to do that with WebBrowser + FiddlerCore; I could see and do whatever I wanted with those URLs. The problem was that if you run a few instances of the program (or sometimes even one, if the program has some automation working with the URL responses), it gets stuck or stops working. I tried fixing it, but my solution is hacky and doesn't work right. I need a simple way to access those URLs, just as if I used HttpWebRequest, but as a web browser. Why do I need it as a web browser? The way I work, I need all the tracking pixels, scripts, images, etc. to execute, i.e. normal web browser behavior. With HttpWebRequest you can't just navigate and have all the scripts execute as a browser would, or can you?
Using the System.Windows.Forms.WebBrowser control in a WinForms app, call webBrowser.Navigate() with (or set the webBrowser.Url property to) the URL of the page you're interested in.
The WebBrowser's DocumentCompleted event fires after the page has loaded; any dynamically loaded JavaScript should have run by then. Hook DocumentCompleted and use webBrowser.Document.Images to get a list of all image elements on the page. From those images you can read their SRC attributes, which contain their URLs, including any query parameters hanging off the end. You can use webBrowser.Document.Links to get a list of all hyperlinks on the page. For other HTML elements of interest, you can use GetElementsByTagName("foo") to fetch all elements with that tag name from the page, then dig into their attributes to pull out URL properties.
With webBrowser.Document you can get to any HTML element, whether it was statically or dynamically created.
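A minimal sketch of that enumeration (webBrowser1 is assumed to be a WebBrowser control already placed on the form; the URL is a placeholder):

```csharp
using System;
using System.Windows.Forms;

// Hook the event before navigating.
webBrowser1.DocumentCompleted += (sender, e) =>
{
    // Collect the SRC of every image on the (now fully loaded) page.
    foreach (HtmlElement img in webBrowser1.Document.Images)
        Console.WriteLine(img.GetAttribute("src"));

    // Collect the HREF of every hyperlink, query string included.
    foreach (HtmlElement link in webBrowser1.Document.Links)
        Console.WriteLine(link.GetAttribute("href"));

    // Any other tag of interest works the same way.
    foreach (HtmlElement script in webBrowser1.Document.GetElementsByTagName("script"))
        Console.WriteLine(script.GetAttribute("src"));
};

webBrowser1.Navigate("https://example.com");  // placeholder URL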
What you can't get to through webBrowser.Document is data that is loaded asynchronously via XMLHttpRequest(), because that data is not part of the browser's Document Object Model. Pages whose buttons are wired up purely in script will also be difficult to intercept.
However, if you know where the data is stored by the JavaScript executing on the page, you may be able to access it using webBrowser.Document.InvokeScript(). If the JavaScript on the page stores URLs in a mydata property of the window object, for example, you could try webBrowser.Document.InvokeScript("eval", new object[] { "window.mydata" }) or some variation to retrieve the value of mydata into the C# app.
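A sketch of that idea (the mydata property is hypothetical; the page's own JavaScript is assumed to set it):

```csharp
// Inside a DocumentCompleted handler, after the page's scripts have run.
// "mydata" is a hypothetical property the page's JavaScript is assumed to set.
object value = webBrowser1.Document.InvokeScript("eval",
    new object[] { "JSON.stringify(window.mydata)" });

if (value != null)
{
    string json = value.ToString();   // the page's data serialized as JSON
    Console.WriteLine(json);
}
```

Serializing with JSON.stringify on the script side keeps the marshaling simple: C# receives one string instead of a COM object it would otherwise have to reflect over.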
Is it possible to embed only a specific portion inside of a page(i.e. div) on the WebView control in UWP?
The WebView doesn't include an address bar, so it will only show pages that you load or that can be reached by navigating from those pages. If you can control what is loaded, you can show only the pages you wish.
If you don't have control over the site's HTML, you can prevent access to pages you don't want shown by handling the NavigationStarting event and canceling the navigation when it targets a page you don't want displayed.
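A sketch of that handler (webView is a Windows.UI.Xaml.Controls.WebView; the host and path used for filtering are placeholders):

```csharp
// "example.com" and "/allowed/" stand in for whatever pages you want to permit.
webView.NavigationStarting += (sender, args) =>
{
    bool allowed = args.Uri != null
        && args.Uri.Host == "example.com"
        && args.Uri.AbsolutePath.StartsWith("/allowed/");

    if (!allowed)
        args.Cancel = true;  // block navigation to anything else
};
```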
No, assuming by a "portion" of a website you mean a portion within a page. However, as Matt said, if you only want to show a certain page, you can of course navigate to the relevant web page and then control which navigation links you allow using the method he described.
If you did indeed mean only a portion within a page, the options aren't attractive. Theoretically you could pull down the site's HTML with an HTTP request, parse it to filter out what you didn't want, store it locally, and then load that local page in the WebView. However, I think it would add significant overhead and could create conflicts with JavaScript or whatever else the site might be using, among other things. But hey, maybe if it's a very targeted use case you might be able to play with it.
If in your scenario you also control the website the app is accessing, then I'd just have specific URLs serve up the desired versions of your site when the app requests 'em.
Not 100% sure, but I can give a hint for embedding an external piece of HTML content into a page opened in the WebView control. You can read or modify the document by injecting script:
var html = await this.webView.InvokeScriptAsync("eval", new[] { "document.documentElement.outerHTML;" });
await this.webView.InvokeScriptAsync("eval", new[] { "document.getElementById('tablePrint').innerHTML = myTable;" });
The first call reads the page's full HTML; the second adds HTML content through JavaScript (here myTable is assumed to be a JavaScript variable already defined on the page).
Hope that will help
I have a webpage that I want to monitor that has stock market information that I want to read and store. The information gathered is to be stored somewhere, say a .csv file or similar for later analysis.
The first problem I have is detecting when this page has fully loaded; the time taken to load can vary enormously. The event handlers I have tried all fire multiple times (I know this has been covered and I have tried the various techniques, but to no avail). Perhaps it is something specific to this web page? In any case, I need to know when the page has fully loaded and is sitting pretty with all graphics displayed properly.
The second problem is that I cannot get the true source of the page into the WebBrowser. As a consequence, all access to the DOM fails, because the HTML representation inside the WebBrowser control appears not to match what is actually happening on the web page. I have dumped the text (webBrowser2.DocumentText) and it looks nothing like what you see when you check the source in a browser, Chrome for example. (I also use the Firebug extension in Firefox to double-check things.) How can I get the correct page into the WebBrowser so I can start to manipulate things?
Essentially, in terms of the data, I need the GMT Time, Strike Rate and expiration time. My process will monitor with a timer control. To be able to read all the other element data on screen is a nice-to-have.
Can this be done?
I am an experienced programmer new to web programming and C#.
I think you want this AJAX request.
As a review, the web works by first loading the web page, then scanning the web page for additional files it needs to load (js, css, images, etc). When those finish, the onload event is triggered and some AJAX functions may run.
In this case, only some of the page is loaded up front, and AJAX functions update the data in the graph later. As you've seen, "Show Source" only shows the original file that was downloaded; it is not a dump of the page's current state.
The easiest way to get the data is to find the URL of the AJAX request that loads the graph data. It is already conveniently formatted as JSON for you to scrape.
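A sketch of that approach with HttpClient (the endpoint URL is a placeholder; find the real one in your browser's network tab):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Placeholder URL: substitute the AJAX endpoint observed
            // in the browser's developer tools / network tab.
            string json = await client.GetStringAsync(
                "https://example.com/api/graph-data");

            // The response is already JSON, so no HTML parsing is needed;
            // deserialize it with your JSON library of choice, then write to CSV.
            Console.WriteLine(json);
        }
    }
}
```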
I have a crawler application that should parse all items from a page with paging. Unfortunately, the website my application crawls uses postbacks for paging. How can I programmatically get the contents of the second page for the following URL in C#?
http://www.hurriyetemlak.com/coldwell-banker-dikey-gayrimenkul/konut-satilik/istanbul-basaksehir-bahcesehir/emlak-ofisleri-ilanlari/3OWB4lkhYFs=/9wZEBZ-ivFgmrA3ENMCIfQ==/qh.BgsUoTK4=/GmMGgVD5Wcc=/GmMGgVD5Wcc=?sParam=3OWB4lkhYFs%3d&sType=9wZEBZ-ivFgmrA3ENMCIfQ%3d%3d&ListIsBig=qh.BgsUoTK4%3d&sortType=GmMGgVD5Wcc%3d&pageSize=GmMGgVD5Wcc%3d
I've tried posting the __EVENTTARGET hidden field along with __VIEWSTATE and __EVENTVALIDATION, but it didn't seem to work.
You can achieve this using screen-scraping techniques (see HtmlAgilityPack). This will require you to read the response and reissue form posts to mimic what the user would do in a browser. Simple request replays will not work.
You may also need to pass the __EVENTARGUMENT hidden field. And don't forget to set the name attribute as well as the id attribute.
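A rough sketch of replaying a WebForms postback (the hidden-field names are the standard ASP.NET ones; the __EVENTTARGET and __EVENTARGUMENT values shown are hypothetical and must be read from the page's own __doPostBack call):

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static async Task<string> GetSecondPageAsync(HttpClient client, string url,
    string viewState, string eventValidation)
{
    // viewState and eventValidation must first be scraped from the hidden
    // fields of the first page's HTML before issuing this request.
    var form = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        ["__EVENTTARGET"]     = "ctl00$pager",   // hypothetical control name
        ["__EVENTARGUMENT"]   = "2",             // page number; site-specific
        ["__VIEWSTATE"]       = viewState,
        ["__EVENTVALIDATION"] = eventValidation,
    });

    HttpResponseMessage response = await client.PostAsync(url, form);
    return await response.Content.ReadAsStringAsync();
}
```

Also make sure the request carries the same cookies as the initial GET (use an HttpClientHandler with a CookieContainer), since ASP.NET session state often lives there.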
Is there a way in C# to get the output of AJAX or JavaScript? What I'm trying to do is grab the specifics of items on a webpage; however, the webpage does not include them in the original source. Does anybody have a good tutorial or a good place to start?
For example, I would want to get all the car listings from http://www.madisonhonda.com/Preowned-Inventory.aspx#layout=layout1
If the DOM is being modified by JavaScript through AJAX calls, and this modified data is what you are trying to capture, then using a standard .NET WebClient won't work. You need to use a WebBrowser control so the script actually executes; otherwise you will just be downloading the original source.
If you need to just "load" it, then you'll need to understand how the page functions and try making the AJAX call yourself. Firebug and other similar tools allow you to see what requests are made by the browser.
There is no reason you cannot make the same web request from C# that the original page is making from Javascript. Depending on the architecture of the website, this could range in difficulty from constructing the proper URL with query string arguments (easy) to simulating a post with lots of page state (hard). The response content would most likely then be XML or JSON content instead of the HTML DOM, which if you're scraping for data will be a plus.
A long time ago I wrote a VB app to screen-scrape financial sites, and made it so you could fire up multiple instances of these "harvester" screen scrapers, which helped spread out the load times. We could do thousands of scrapes a day with several of these running on multiple boxes. Each harvester got its marching orders from information stored in the database, like which customer to get next and what needed to be scraped (balances, transaction history, etc.).
Like Michael said above, make a simple WinForms app with a WebBrowser control in it. You have to trap the DocumentCompleted event, which fires when the web page has finished loading (note it can fire once per frame, so you may need to filter). Then check out this post, which gives an overview of how to do it.
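One common guard against the event firing multiple times is to compare the completed URL with the top-level document's URL (a sketch; webBrowser1 is an assumed control name):

```csharp
webBrowser1.DocumentCompleted += (sender, e) =>
{
    // DocumentCompleted fires once per frame/iframe; only react when
    // the top-level document itself has finished loading.
    if (e.Url != webBrowser1.Document.Url)
        return;

    Console.WriteLine("Top-level page fully loaded: " + e.Url);
};
```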
Use the Html Agility Pack. It allows download of .html and scraping via XPath.
See How to use HTML Agility pack
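A small sketch with the Html Agility Pack (the URL and XPath here are placeholders, not specific to any site):

```csharp
using System;
using HtmlAgilityPack;  // NuGet package: HtmlAgilityPack

class Scraper
{
    static void Main()
    {
        var web = new HtmlWeb();
        // Placeholder URL: substitute the page you actually want to scrape.
        HtmlDocument doc = web.Load("https://example.com/listings");

        // Illustrative XPath: every cell of every table row on the page.
        var cells = doc.DocumentNode.SelectNodes("//table//tr/td");
        if (cells != null)   // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode cell in cells)
                Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

Note that HtmlWeb only downloads the raw HTML; for AJAX-rendered content you still need the WebBrowser approach above, then feed webBrowser.DocumentText into an HtmlDocument.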
My client has a website that currently makes requests on particular events (click, load, etc.). The website is working well, but he has a problem getting website statistics from Google Analytics. The reason is that the website never redirects to a different page; everything happens within the same page, and no matter what content is loaded (be it a video, tables, etc.), everything is displayed under the same URL:
www.somewebsite.com/default.aspx
What I want to achieve is, on a particular event, to change the URL to
www.somewebsite.com/default.aspx?type=abc&id=999
How can I do this? What is the easiest method? Please advise. The whole website is built in C#.
Many Thanks.
Is this event happening on the server or the client?
If it's on the server, you can call Response.Redirect and add your new query string parameter to the current url.
If it's on the client (Javascript), you can set the location property.
If you want to preserve your page's state, try adding your querystring parameter to the form's action parameter in Javascript.
Alternatively, as jeffamaphone suggested, you can change the hash (the part after the # sign) by setting location.hash without posting back to the server.
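A server-side sketch of the redirect approach (the event handler name is hypothetical; the query string values come from the question's example URL):

```csharp
// Code-behind for default.aspx. Fires on some server-side event,
// e.g. a button click; "OnContentSelected" is a made-up handler name.
protected void OnContentSelected(object sender, EventArgs e)
{
    // Redirect to the same page with a query string, so Google Analytics
    // records a distinct URL for this piece of content.
    Response.Redirect("default.aspx?type=abc&id=999");
}
```

Keep in mind this causes a full round trip and loses in-memory page state, which is why the client-side hash approach can be preferable for a single-page layout.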
Actually, you should probably move some of the elements to different pages... this is based on what you said:
because basically all I am doing is
hiding and showing elements based on
events, if I do a Response.Redirect,
it will reload the homepage.
If I understand correctly, the user is never redirected to a different page; you are just hiding/unhiding elements on default.aspx based on events, correct? The simplest solution would be to split that content into different .aspx pages.