After reading Dianyang Wu's article and this excellent post, I managed to build a small .NET app (almost a copy-and-paste of Wu's source code) to automate tests (let's call it protoTestApp). My final goal is to open a dozen small windows and, in each, simulate a different user interacting with a web app to stress it.
It works to some extent, but after I log on to the web app (let's call it InternalTestSubject), it calls an external URL (let's call it ExternalTestSubject) and injects its content into an iFrame. This particular external URL is another web app, and it will look up the parent window to get some parameters. Opening ExternalTestSubject directly is not an option.
My problem is that in protoTestApp I also want to interact with ExternalTestSubject (find a button by id, click it, etc.), but in my DocumentCompleted handler the iFrame is still empty.
The WebBrowser shows both web apps fully loaded and working, so I suppose the handler just isn't waiting for the iFrame content to load, since that is done by an async AJAX call.
Any advice on how to accomplish this?
I think I explained this in the answer you linked (and in more detail in another related answer). AJAX pages are non-deterministic, so there is no generic approach.
Use periodic asynchronous polling to watch the page's current HTML snapshot or DOM for changes; the linked post illustrates how to do that. You can poll the frame's content in the same way.
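A minimal sketch of that polling idea for a WinForms WebBrowser (the 500 ms interval, the method name, and the stop condition are illustrative, not from the linked post):

    // Poll the frame's HTML snapshot until it is non-empty and stable.
    // Two identical consecutive snapshots are treated as "probably done";
    // tune the interval and the stop condition for your page.
    async Task<HtmlElement> WaitForFrameContentAsync(WebBrowser browser, int frameIndex)
    {
        string lastHtml = null;
        while (true)
        {
            await Task.Delay(500);
            // Note: cross-domain frames throw UnauthorizedAccessException here.
            var frameDoc = browser.Document?.Window?.Frames?[frameIndex]?.Document;
            string html = frameDoc?.Body?.InnerHtml;
            if (!string.IsNullOrEmpty(html) && html == lastHtml)
                return frameDoc.Body; // content present and no longer changing
            lastHtml = html;
        }
    }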
I can imagine the frame reporting that it is ready when it actually is not. For instance, if the frame contains frames of its own, the DocumentCompleted event gives you no way to know whether all of those inner frames have loaded.
In short: using a frame to load external content and doing the testing there is not a good approach, even if you use a timer to check the loading status manually. Moreover, because of security restrictions, you will have many problems accessing the frame's DOM.
I have two suggestions for you:
Create a WebBrowser instance and open the external test subject in it directly. You will have a much better chance of knowing whether the document (and its frames) has loaded completely, and you keep full control to access any elements of the WebBrowser, read cookies, click elements, or change elements.
Use a third-party tool such as Selenium as the test driver.
Update 1:
As the questioner does not want any third-party tool, I'd suggest letting the internal test subject query the loading completeness of the target frame periodically, for example by checking document.readyState == 'complete'.
As far as I know, because the external test subject is embedded as a frame, security restrictions mean you might not be able to access the frame's DOM. In other words, you cannot do mouse clicks, etc., unless you change the security settings for the WebBrowser control first.
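A sketch of that readyState check, assuming a reference to the Microsoft.mshtml interop assembly and a same-origin frame (a cross-origin frame throws as soon as its Document is touched):

    // Returns true once the frame's DOM reports readyState == "complete".
    bool IsFrameReady(WebBrowser browser, int frameIndex)
    {
        try
        {
            var frameDoc = browser.Document?.Window?.Frames?[frameIndex]?.Document;
            // The DOM readyState lives on the underlying COM document.
            var dom = frameDoc?.DomDocument as mshtml.IHTMLDocument2;
            return dom != null && dom.readyState == "complete";
        }
        catch (UnauthorizedAccessException)
        {
            return false; // cross-origin frame: the DOM is off limits
        }
    }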
So I have an application that does:
Takes n links from SQL
Creates a new thread for each link
Gets the HTML of the website with HTML-Agility-Pack in each thread
Saves its data to SQL (image sizes, page size, word count, the words themselves, etc.) and records the date of each run
This is to check the data on a website and see if there are any changes (like a typo or a problem with previously uploaded images), and I want to add a screenshot/thumbnail of each page. How can I take a screenshot of the whole page in each thread?
In order to make a screenshot, your HTML needs to be rendered, which is a job for a web browser. As you are looking for a C# solution, you could use CefSharp (https://github.com/cefsharp/CefSharp) to render your HTML in offscreen mode.
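A rough sketch of that offscreen approach. The screenshot API has moved around between CefSharp versions (ScreenshotAsync shown here comes from older releases), so treat this as a starting point; the URL and path are placeholders:

    using CefSharp;
    using CefSharp.OffScreen;
    using System.Threading.Tasks;

    public static async Task SaveScreenshotAsync(string url, string savePath)
    {
        // Cef.Initialize(new CefSettings()) must be called once at startup.
        using (var browser = new ChromiumWebBrowser(url))
        {
            var loaded = new TaskCompletionSource<bool>();
            browser.LoadingStateChanged += (s, e) =>
            {
                if (!e.IsLoading) loaded.TrySetResult(true); // main frame finished
            };
            await loaded.Task;

            using (var bitmap = await browser.ScreenshotAsync())
                bitmap.Save(savePath);
        }
    }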
Seems like an interesting app. Since you already fetch the HTML for each URL (I assume the app runs on a machine/server with internet connectivity):
There are 3 ways you can do this (many more actually).
In the thread - create a System.Windows.Forms.Form object, add a WebBrowser control to its child controls (Dock = Fill), and make the browser navigate to the URL. Once navigation is complete, take a screenshot of the WinForms dialog (see the sketch after this list).
In the thread - launch the Chrome/IE browser, passing the URL as a command-line argument. Wait some time in the thread (there isn't a good way to know when rendering finishes), then take the screenshot.
In the thread - use a Selenium-style .NET-compatible library, which helps you automate web UI testing, and then do step #2. This approach gives you more granular control over the browser.
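A sketch of option 1 (sizes and names are illustrative; the WebBrowser control needs an STA thread, so create each worker thread with ApartmentState.STA):

    using System.Drawing;
    using System.Windows.Forms;

    static Bitmap CapturePage(string url, int width = 1280, int height = 1024)
    {
        using (var browser = new WebBrowser())
        {
            browser.ScrollBarsEnabled = false;
            browser.Size = new Size(width, height);
            browser.Navigate(url);

            // Pump messages until the top-level document reports complete.
            while (browser.ReadyState != WebBrowserReadyState.Complete)
                Application.DoEvents();

            // Render the control's client area into a bitmap.
            var bitmap = new Bitmap(width, height);
            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, width, height));
            return bitmap;
        }
    }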
I want to monitor changes in a complex web application in the background. It is a one-page application with many scripts and so on, and I need to be logged in to have access to the data I want to monitor.
I tried to use WebRequest, but I think the application is too complex for that approach. There is also a problem with authentication.
I also tried the WebBrowser component, but the web application tells me that this browser is too old and that I should get a newer one.
Perfect solution would:
Open this web application in Chrome (or some other modern browser) in the background
Save the page to memory
Extract values using something like HtmlAgilityPack
While this is happening I want to use the computer normally (so opening a Chrome window is not a good solution for me).
Is there any way to achieve something like that?
If you can cope with an extra browser running, have a look at SeleniumHQ. With its WebDriver-backed Selenium you can start a dedicated browser instance and perform user actions from a high-level programming language. It should not interfere with your manual work at all, but it will take up about the same amount of memory and CPU time your "real" browser would.
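A sketch of that in C#, wired up to the HtmlAgilityPack step from the question (assumes the Selenium.WebDriver and HtmlAgilityPack NuGet packages; the URL and XPath are placeholders):

    using HtmlAgilityPack;
    using OpenQA.Selenium.Chrome;

    var options = new ChromeOptions();
    options.AddArgument("--headless"); // no visible window
    using (var driver = new ChromeDriver(options))
    {
        driver.Navigate().GoToUrl("https://example.com/app");
        // ...perform the login steps here with driver.FindElement(...)...

        // Hand the rendered DOM to HtmlAgilityPack for extraction.
        var doc = new HtmlDocument();
        doc.LoadHtml(driver.PageSource);
        var values = doc.DocumentNode.SelectNodes("//span[@class='value']");
    }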
If the web application has no CAPTCHA and does not object to automated scripts accessing it, you could also log in from a background program by sending the appropriate HTTP requests and parsing the responses. Python's urllib2 would be my first choice.
If you don't want any additional processes running, you could also create a browser plugin that auto-refreshes and parses a certain open tab every few seconds.
I would like to build a bot - web crawler - to collect phone numbers.
I have a problem though: to see the phone number, a user must click something like "Show".
How can I solve this problem?
Check what clicking the button actually does. Does it call a JavaScript function? Does that make an HTTP call to a backend? If so, your bot should make that call instead of screen-scraping the first page. If not, does it just manipulate the DOM of the page to show an item that was already there?
All the data you're looking for comes from some sort of back end, so if you watch your browser's developer tools while going through the page, you can usually figure out which calls the script makes to get the data.
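If the network tab shows, say, an XHR endpoint behind the "Show" button, the bot can call it directly; a sketch with a hypothetical endpoint and response shape:

    using System;
    using System.Net.Http;

    var client = new HttpClient();
    // Some back ends only answer requests that look like AJAX:
    client.DefaultRequestHeaders.Add("X-Requested-With", "XMLHttpRequest");

    // Hypothetical endpoint; copy the real one from the dev tools' network tab.
    string json = await client.GetStringAsync("https://example.com/api/listing/12345/phone");
    Console.WriteLine(json); // e.g. {"phone":"+1-555-0100"}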
It is possible to make this harder (and that is what some sites do to protect themselves from scraping). Typically, if you're in this situation, what you're doing is not entirely legal or nice. But technically it's very interesting, so here goes.
The best way forward is to run the site in a real browser (like PhantomJS or Chrome) and use a framework like WebDriver to simulate browser interactions. This way you can usually pull most of the data out.
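A sketch of that WebDriver route in C# (the URL and selectors are hypothetical; assumes the Selenium.WebDriver and Selenium.Support packages):

    using System;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;
    using OpenQA.Selenium.Support.UI;

    using (var driver = new ChromeDriver())
    {
        driver.Navigate().GoToUrl("https://example.com/listing/12345");
        driver.FindElement(By.CssSelector(".show-phone")).Click();

        // Wait until the AJAX call fills the number in.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        var phone = wait.Until(d => d.FindElement(By.CssSelector(".phone-number")));
        Console.WriteLine(phone.Text);
    }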
If you find that your IP gets blocked, you could use Tor and hit the site from multiple instances dynamically... but make sure you ask the site owner nicely whether you're allowed to do that, of course.
Background:
I am creating a Windows Forms app that automates order entry in an intranet web application. We have a large amount of order entry to do that will cost us a lot of money, so I volunteered to automate the task.
Problem:
I am using the WebBrowser class to navigate the web app. I have gotten very far but hit a roadblock. There is a part of the app that opens a web dialog page. How do I interact with the web dialog? My instance of the WebBrowser class still holds the parent page. I am hoping someone can point me in the right direction.
You've got a number of options. To expand on the answers from others and add a new idea...
Do it using the WebBrowser control: This is technically possible, either by injecting JavaScript into the target page as demonstrated here, or by creating a JavaScript object and using it as a bridge via the WebBrowser.ObjectForScripting property. This is very fragile - something as simple as the website changing an element's id could break it. You also need to make sure your code doesn't interfere with the functioning of the form (clashing function names, etc.).
Do it using a postback: Monitor the communication between the web browser and the server (I personally prefer Firefox/Firebug, but IE/Fiddler or Chrome/F12 are both good too). As long as you can replicate the browser's actions exactly, the server can't tell the difference. The problem here is that browsers are complex, and the more secure a form is, the more demanding the server is. This means you may have to fake a login, get cookies, send them back on subsequent requests, and handle ViewState data and XSS-prevention variables. It's possible, and it's far more robust than the first option, but it can be a pain to get working. If it's not a highly secure form, this is your best bet (sketched after this list). More information here.
Do it by browser automation: Selenium is probably the best option here (as mentioned by others), but it suffers from a similar flaw to the WebBrowser control in that it's sensitive to changes on the form itself (though not as much as the WebBrowser control).
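A sketch of the postback option with HttpClient (the field names and URLs are hypothetical; copy the real ones, including any __VIEWSTATE fields, from Fiddler or the F12 network tab):

    using System.Collections.Generic;
    using System.Net.Http;

    var handler = new HttpClientHandler { UseCookies = true }; // session survives across calls
    var client = new HttpClient(handler);

    // 1. Log in, exactly as the browser does.
    var login = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        ["username"] = "testuser",
        ["password"] = "secret",
    });
    await client.PostAsync("https://intranet/orders/login", login);

    // 2. Replay the order-entry postback with the same fields the browser sends.
    var order = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        ["itemId"] = "42",
        ["quantity"] = "3",
    });
    var response = await client.PostAsync("https://intranet/orders/new", order);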
Incidentally, if you have Visual Studio Ultimate/Test edition (and some others, I'm not sure which), it includes a suite of testing tools, including an excellent engine for automating load tests against a website. This is also superb for tracking down what exactly a form does, as you can see every step of the emulation.
Hope this helps
You have two choices, depending on the level of complexity you need:
Use an HTTP debugger like Fiddler to find out the POST data you need to send to each page, and mimic it via an HttpWebRequest.
Use a browser automation tool like Selenium and let it do the job.
NOTE: Your actions may be considered spamming by the website, so be ready for IP blocking, CAPTCHAs, etc.
You could give Selenium a go: http://seleniumhq.org/
UI automation is a far more intuitive approach to these types of tasks.
I have created an HTML5 page that provides important server-side functionality. Unfortunately, it must be run in an HTML5 browser (Chrome, IE9, or Firefox) with a canvas to produce the results I need. It is completely self-contained, taking the needed parameters through the URL, and is ready to be closed once the OnLoad event has fired. So far so good.
The following process needs to be automated (no human eyes or interaction) and will be run from within a web service (not from within a browser). Ideally, I don't want to waste extra cycles on busy-waiting, or delay the result by waiting for long periods simply hoping the process has finished. I need to:
Open a browser (preferably Chrome) with a URL, using C#.
Wait for the page to completely finish loading - ideally receiving a callback of some kind.
Close the browser page when finished, again with C#.
We've tried using IE9. There is C# support to launch IE9, wait until it is not busy, and gracefully close the browser; however, the page loads resources asynchronously (there is no way around this), so we get the not-busy signal during the resource load instead of when the page has finished. Adding a busy wait would consume valuable server-side CPU cycles.
A simple CreateProcess call would be nice, but it would only work if the browser could close itself from HTML; thanks to security measures in browsers, I can't find a reliable way to use HTML commands to close a browser that was launched from the command line (I did see that you can close tabs spawned from an already-open page - Firefox only - but this doesn't help).
Does anyone know how I can accomplish this goal? Again - there is no human involvement in any part of the process; no human eyes will ever see the page or interact with it in any way. The page only runs on the server machine and will never be deployed to a client machine.
I would suggest using the WebBrowser control to load the HTML. Once you get the data back, use ObjectForScripting to call a C# method that notifies you when the page is done.
See http://www.codeproject.com/Tips/130267/Call-a-C-Method-From-JavaScript-Hosted-in-a-WebBro
You don't even have to show the WebBrowser control.
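A sketch of that bridge (names are illustrative). The page calls window.external.NotifyDone(...) from its OnLoad handler; the host still needs an STA thread with a message loop for the control to run:

    using System;
    using System.Runtime.InteropServices;
    using System.Windows.Forms;

    [ComVisible(true)] // required so the page's script can see the object
    public class ScriptBridge
    {
        public void NotifyDone(string result)
        {
            // Invoked from JavaScript when the page finishes its work.
            Console.WriteLine("Page reported: " + result);
        }
    }

    // Hosting side; the control is never shown:
    var browser = new WebBrowser
    {
        ObjectForScripting = new ScriptBridge(),
        ScriptErrorsSuppressed = true,
    };
    browser.Navigate("https://server/render?param=value"); // placeholder URL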
Let me know if you have any questions. Hope it helps!
Automating the browser - that's what Selenium does. I think it will be a good fit for this task, and there's good C# support. It can even run the browser on a remote machine using the Selenium RC server.