Prevent Selenium from opening new window - C#

Today, I use Selenium to parse data from a website. Here is my code:
public ActionResult ParseData()
{
    IWebDriver driver = new FirefoxDriver();
    driver.Navigate().GoToUrl(myURL);
    IList<IWebElement> nameList = driver.FindElements(By.XPath(myXPath));
    return View(nameList);
}
The problem is, whenever it runs, it opens a new window at the myURL location, gets the data, and leaves that window open.
I don't want Selenium to open any window here - just run in the background and give me the parsed data. How can I achieve that? Please help me. Thanks a lot.

Generally I agree with andrei: why use Selenium if you are not planning to interact with the browser window?
Having said that, the simplest way to prevent Selenium from leaving the window open is to close it before returning from the function:
driver.Quit();
Another option, if the page doesn't have to be loaded in Firefox, is to use the HtmlUnit driver instead (it has no UI).
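For example, a minimal sketch of the question's action with cleanup; note that the element text is copied out before Quit(), since IWebElement references become invalid once the browser is gone (uses System.Linq):
public ActionResult ParseData()
{
    IWebDriver driver = new FirefoxDriver();
    try
    {
        driver.Navigate().GoToUrl(myURL);
        // Copy the text out of the elements while the browser is still alive.
        List<string> names = driver.FindElements(By.XPath(myXPath))
                                   .Select(e => e.Text)
                                   .ToList();
        return View(names);
    }
    finally
    {
        driver.Quit();  // closes the window and ends the browser session
    }
}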

Well, it seems that on each web request you are creating (though not closing / disposing) a Selenium driver object. As I said in the comment, there may be better solutions for your problem...
As you want to fetch a web page and extract some data from it, feel free to use:
WebClient
WebRequest
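For example, a minimal WebClient sketch (the URL is a placeholder):
using (var client = new WebClient())  // requires using System.Net;
{
    // Fetch the raw HTML of the page in one call.
    string html = client.DownloadString("https://example.com/page");
    // Extract the data you need from 'html' here.
}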
A web application is not a very hospitable environment for a Selenium driver instance, IMHO. Though, if you still want to play a bit with it, make the Selenium instance static and reuse it among requests. Still, if it is used from concurrent requests (multiple threads running at the same time), a crash is very likely :) You have the option to protect the instance (locks, critical sections etc.) but then you will have zero scalability.
Short answer: fetch the data another way; Selenium is just for automated exploration tests as far as I know...
But...
If you really have to explore that website - the source of your data - with Selenium... Then fetch the data using Selenium in advance - speculatively, in another process (a console application that runs in the background) - and store it in some files or in a database. Then, from the web application, read the data and return it to your clients :)
If you do not yet have the data the client has asked for, respond with some error - "please try again in 5 minutes" - and tell the console application (that's running in the background) to fetch that data (there are various ways of communicating across process boundaries - the web app and the console app in our case - but you can use a simple file / db for queuing "data requests" - whatever)...
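A rough sketch of that decoupling, reusing the question's myURL / myXPath placeholders (the file path is illustrative; a database works just as well):
// Console application, run in the background (e.g. by the Task Scheduler):
using (IWebDriver driver = new FirefoxDriver())
{
    driver.Navigate().GoToUrl(myURL);
    var names = driver.FindElements(By.XPath(myXPath)).Select(e => e.Text);
    File.WriteAllLines(@"C:\data\names.txt", names);  // hypothetical path
}

// Web application action - it only reads what the console app produced:
public ActionResult ParseData()
{
    const string path = @"C:\data\names.txt";
    if (!System.IO.File.Exists(path))
        return Content("Please try again in 5 minutes.");
    return View(System.IO.File.ReadAllLines(path));
}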

Using C# can I communicate with webpages and local desktop programs?

I am very new to C# and Microsoft Visual Studio, so, with that in mind, I am teaching myself and have started a project (excuse my lack of vocabulary). What I have so far is a WPF project that looks good but offers zero functionality. The general functionality I envision is this:
1. The Main Window has multiple buttons which navigate to multiple pages. (Achieved this already.)
2. On each page navigated to, I want to display information from a website. (?)
3. Using the web information, I want to control another program on the desktop. (?)
Are points (2) and (3) possible using C#?
Let me illustrate the scenario. A person submits information (a username) into a website. That website contacts a server and sends back data about that person/username. The website then stores this data and the usernames in a list visible to the users. There are five different lists and five XAML pages navigable via the main window of the program. I want to display each list on its own page. Using the data found on the website and now in my program, I want to send a command to a program/script running on the desktop and have it perform an action (type the usernames somewhere using AutoHotKey and AutoScriptWriter, which essentially updates a special notepad file).
The answers I am looking for are not "this is how you do specifically what you're asking" but rather "use these tools/features in C# and start there". If what I want from this program is possible, I have these follow-up questions:
1. The information submitted to that website would be constant, so would the web information viewed through the program be updated/refreshed in real-time?
2. Would creating an entirely new website to work with the program be more beneficial than using an existing website and scraping information from it?
3. Can a program communicate with another program on a local virtual desktop via Oracle VirtualBox?
4. If someone used this program on their computer, could they command the program/script on my computer via the internet?
Quick answers to your various questions:
1. Is not a question...
2. Websites are just user interfaces sitting on top of logic defined on a web server - if you can interact with the logic (i.e. a web service) instead of with the raw UI layer (which is HTML), you should rather do that. You can find a control which will render HTML information, but this is unlikely to be the best approach for what you want to do.
3. Yes, through COM interop with the Windows Shell - quite an advanced topic, and I hope you understand the Windows SDK, memory management and unmanaged, unsafe pointer-based code quite well.
Follow-up questions:
1. I don't understand the question - are you submitting information constantly, or is the information you're submitting always the same? You would need to trigger the refresh of the web information (i.e. request the particular page) whenever you want a refreshed rendering of the webpage.
2. Most beneficial would be not using the user interface (the webpage) at all, and instead establishing a link to the logic layer and requesting the data directly via a web service - that gives you the most control in your WPF application.
3. Yes, but again, very advanced stuff - it's not simple or easy and comes with a host of challenges, such as local security and the automation APIs for VirtualBox.
4. Doubtful, unless you wrote the code to be internet-enabled, which again is an advanced topic.
If I understood your situation correctly, you can use the WebClient class to communicate with your web server and use the string it returns to generate the content.
using (WebClient web = new WebClient())
{
    // The header names and values below are placeholders for whatever your server expects.
    web.Headers.Add("HTTP Header", "Header Value");
    web.Headers.Add("POST Data Header", "POST Data Value");
    string response = web.DownloadString(new Uri("https://www.myserver.net/mypage"));
    // Implement your processing on the response variable here to generate and present data to the user.
}
And you can use the StandardOutput property of the Process class to capture the output of any program on the local machine.
Process proc = new Process();
proc.StartInfo.FileName = @"C:\Path\to\my\executable.exe";
// Redirection must be configured before the process starts,
// and it requires shell execute to be disabled.
proc.StartInfo.RedirectStandardOutput = true;
proc.StartInfo.UseShellExecute = false;
// Start the process...
proc.Start();
// Retrieve the output...
string output = proc.StandardOutput.ReadToEnd();
proc.WaitForExit();

Capture failed loads of contents using Selenium

When a web page loads, the browser executes many GET requests to fetch resources such as images, CSS files, fonts and other assets.
Is there a possibility to capture failed GET requests using Selenium in C#?
Selenium does not natively provide this capability. I've come to this conclusion for two reasons:
I've not seen any function exposed by Selenium's API which would allow doing what you want in a cross-platform way.
(I'm saying "cross-platform way" because I'm excluding from consideration possible non-standard APIs that could be exposed by one browser but not others.)
If there is any doubt that I may have missed something, then consider that ...
The Selenium team has quite consciously decided not to provide any means to get the response code of the HTTP request that downloads the page in the first place. It is extremely doubtful that they would have slipped in, behind the scenes, a way to get the response codes of the other HTTP requests that are issued to load other resources.
The way to check on any such requests is to have the browser that Selenium launches connect through a proxy that records such responses, or to load the page with something other than Selenium.
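For instance, with the newer Selenium .NET bindings, pointing the browser at a capturing proxy might look like this sketch (Fiddler's default address 127.0.0.1:8888 is assumed here); the proxy, not Selenium, is what records the failed requests:
// Route the browser's traffic through a local capturing proxy.
var proxy = new Proxy
{
    HttpProxy = "127.0.0.1:8888",  // assumed proxy address
    SslProxy = "127.0.0.1:8888"
};
var options = new FirefoxOptions { Proxy = proxy };
using (IWebDriver driver = new FirefoxDriver(options))
{
    driver.Navigate().GoToUrl("https://example.com");
    // Afterwards, inspect the proxy's log for non-2xx responses.
}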

how to write to text file constantly while site is live

What I'm trying to do is create an ASP.NET page that runs a random number generator, displays the random number, and writes it to a text file. That part is no worries; the issue is that I want the number generation and file writing to continue while the site is live - i.e. even if no one is actually viewing the page and it's just sitting on the server, the process should continue.
Is this possible?
EDIT: I foolishly overlooked using a web service to generate the number - I've knocked up a basic service that generates a number and writes it to a text file. I can't work out how to schedule/automate it, though - could I set up a timer with a given interval, then use timer_Tick?
Scheduling is new to me, any advice is appreciated.
You can use a Windows Service to do the work in the background; please see the links below:
http://www.codeproject.com/KB/dotnet/simplewindowsservice.aspx
http://www.codeguru.com/columns/dotnet/article.php/c6919
Have you considered the use of scheduled tasks? So, rather than the page calling the updates, the scheduled task does that, and the page viewer just sees the "latest results" at any specific point. Of course, that may not be feasible, but by the sounds of it, you're after a constantly working service/task with the ability to view the latest number, a little like an RSA token which shows new numbers even if you don't need one.
Not sure if this is what you want, but if you are interested in using a scheduler for this task, you can try Quartz.Net. It is a very popular, full-featured and open source scheduling system.
Please describe what you are trying to achieve. There might be a better way than writing random numbers to a file.
I would not use a service (web or Windows service) for this. There is no benefit to using a web service, since it would just do exactly the same as your web app does. A Windows service will continue to run independently of your web app, but you would need to create some kind of IPC and keep track of several timers/files.
The easiest way to do this is to use a System.Threading.Timer and keep it in a session variable. Also note that you need to kill it when the user session expires.
You should also be aware that one timer will be created per user that uses the page.
Update
Create a Windows Service application and add a System.Threading.Timer to it. Write to the file in the timer callback.
Then open the text file in your web app (using FileShare.ReadWrite + FileMode.Read).
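A minimal sketch of that service (the file path and interval are illustrative):
public class RandomNumberService : System.ServiceProcess.ServiceBase
{
    private System.Threading.Timer _timer;
    private readonly Random _random = new Random();

    protected override void OnStart(string[] args)
    {
        // Fire immediately, then every 5 seconds.
        _timer = new System.Threading.Timer(WriteNumber, null,
            TimeSpan.Zero, TimeSpan.FromSeconds(5));
    }

    private void WriteNumber(object state)
    {
        // FileShare.Read lets the web app read the file while we append to it.
        using (var stream = new System.IO.FileStream(@"C:\data\numbers.txt",
            System.IO.FileMode.Append, System.IO.FileAccess.Write,
            System.IO.FileShare.Read))
        using (var writer = new System.IO.StreamWriter(stream))
        {
            writer.WriteLine(_random.Next());
        }
    }

    protected override void OnStop()
    {
        _timer.Dispose();
    }
}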

How can I measure the response and loading time of a webpage?

I need to build a windows forms application to measure the time it takes to fully load a web page, what's the best approach to do that?
The purpose of this small app is to monitor some pages in a website, in a predetermined interval, in order to be able to know beforehand if something is going wrong with the webserver or the database server.
Additional info:
I can't use a commercial app; I need to develop this in order to be able to save the results to a database and create a series of reports based on this info.
The WebRequest solution seems to be the approach I'm going to use; however, it would be nice to be able to measure the time it takes to fully load the page (images, CSS, JavaScript, etc.). Any idea how that could be done?
If you just want to record how long it takes to get the basic page source, you can wrap a HttpWebRequest around a stopwatch. E.g.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(address);
System.Diagnostics.Stopwatch timer = new Stopwatch();
timer.Start();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
timer.Stop();
response.Close(); // release the connection back to the pool
TimeSpan timeTaken = timer.Elapsed;
However, this will not take into account time to download extra content, such as images.
[edit] As an alternative to this, you may be able to use the WebBrowser control and measure the time between calling .Navigate() and the DocumentCompleted event firing. I think this will also include the download and rendering time of extra content. However, I haven't used the WebBrowser control a huge amount and don't know whether you have to clear a cache if you are repeatedly requesting the same page.
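A rough sketch of that approach (WinForms, so a message loop must be running; DocumentCompleted also fires for frames, hence the URL check, which is a common guard):
var timer = System.Diagnostics.Stopwatch.StartNew();
var browser = new System.Windows.Forms.WebBrowser();
browser.DocumentCompleted += (sender, e) =>
{
    // Only stop the clock for the top-level document, not for frames.
    if (e.Url == browser.Url)
    {
        timer.Stop();
        MessageBox.Show(string.Format("Loaded in {0} ms", timer.ElapsedMilliseconds));
    }
};
browser.Navigate("http://www.example.com");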
Depending on the frequency you need, maybe you can try using Selenium (an automated testing tool for web applications); since it uses a web browser internally, you will get a pretty close measure. I think it would not be too difficult to use the Selenium API from a .NET application (you can even use Selenium in unit tests).
Measuring this kind of thing is tricky because web browsers have some particularities in how they download all of a web page's elements (JS, CSS, images, iframes, etc.) - these particularities are explained in this excellent book (http://www.amazon.com/High-Performance-Web-Sites-Essential/dp/0596529309/).
A homemade solution would probably be too complex to code, or would fail to handle some of those particularities (measuring the time spent downloading the HTML is not good enough).
One thing you need to take account of is the cache. Make sure you are measuring the time to download from the server and not from the cache, so you will need to ensure that client-side caching is turned off.
Also be mindful of server-side caching. Suppose you download the page at 9:00 AM and it takes 15 seconds, then you download it at 9:05 and it takes 3 seconds, and finally at 10:00 it takes 15 seconds again.
What might be happening is that at 9:00 the server had to fully render the page since there was nothing in the cache. At 9:05 the page was in the cache, so it did not need to be rendered again. Finally, by 10:00 the cache had been cleared, so the page needed to be rendered by the server again.
I highly recommend that you check out the YSlow add-on for Firefox, which will give you a detailed analysis of the times taken to download each of the items on the page.
Something like this would probably work fine:
System.Diagnostics.Stopwatch sw = new Stopwatch();
System.Net.HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.example.com");
// other request details, credentials, proxy settings, etc...
sw.Start();
System.Net.HttpWebResponse res = (HttpWebResponse)req.GetResponse();
sw.Stop();
TimeSpan timeToLoad = sw.Elapsed;
I once wrote an experimental program which downloads an HTML page and the objects it references (images, iframes, etc).
It was more complicated than it seems, because there is HTTP content negotiation: some web clients will get the SVG version of an image and some the PNG one, widely different in size. The same goes for <object>.
I'm often confronted with a quite similar problem. However, I take a slightly different approach: first of all, why should I care about static content at all? I mean, of course it's important for the user whether it takes 2 minutes or 2 seconds for an image, but that's not my problem AFTER I have fully developed the page. These things are problems while developing; after deployment it's usually not the static content but the dynamic stuff that slows things down (like you said in your last paragraph). The next thing is: why do you trust that so many things stay constant? If someone on your network fires up a P2P program, the routing goes wrong, or your ISP has some issues, your server stats will certainly go down. And what does your benchmark say for a user living across the globe, or just using a different ISP? All I'm saying is that you are benchmarking YOUR point of view, but that doesn't say much about the server's performance, does it?
Why not let the site/server itself determine how long it took to load? Here is a small example written in PHP:
function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

function benchmark($finish)
{
    if ($finish == FALSE) { /* benchmark start */
        $GLOBALS["time_start"] = microtime_float();
    } else { /* benchmark end */
        $time = microtime_float() - $GLOBALS["time_start"];
        echo '<div id="performance"><p>'.$time.'</p></div>';
    }
}
It adds the time it took to build the page at the end (hidden with CSS). Every couple of minutes I grep this value with a regular expression and parse it. If this time goes up, I know something is wrong (including with the static content!), I get informed via an RSS feed, and I can act.
With Firebug we know the "normal" performance of a site loading all content (development phase). With the benchmark we get the current server situation (even on our cell phone). OK, what next? We have to make certain that all/most visitors are getting a good connection. I find this part really difficult and am open to suggestions. However, I try to take the log files and ping a couple of IPs to see how long it takes to reach their networks. Additionally, before I decide on a specific ISP, I try to read about its connectivity and user opinions...
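For example, fetching the page and pulling that value out from C# could look roughly like this (the URL is a placeholder; the regex targets the <div id="performance"> emitted by the PHP snippet above):
using (var client = new System.Net.WebClient())
{
    string html = client.DownloadString("https://example.com/");
    var match = System.Text.RegularExpressions.Regex.Match(
        html, @"<div id=""performance""><p>([\d.]+)</p></div>");
    if (match.Success)
        Console.WriteLine("Page build time: {0} s", match.Groups[1].Value);
}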
You can use software like these:
http://www.cyscape.com/products/bhawk/page-load-time.aspx
http://www.trafficflowseo.com/2008/10/website-load-timer-great-to-monitor.html
Google will be helpful to find the one best suited for your needs.
http://www.httpwatch.com/
Firebug NET tab
If you're using Firefox, install the Firebug extension found at http://getfirebug.com. From there, choose the Net tab, and it'll show you the load/response time for everything on the page.
tl;dr
Use a headless browser to measure the loading times. One example of doing so is Website Loading Time.
Long version
I ran into the same challenges you're running into, so I created a side project to measure actual loading times. It uses Node and Nightmare to drive a headless ("invisible") web browser. Once all of the resources have loaded, it reports the number of milliseconds it took to fully load the page.
One nice feature that would be useful for you is that it loads the webpage repeatedly and can feed the results to a third-party service. I feed the values into NIXStats for reporting; you should be able to adapt the code to feed the values into your own database.
Here's a screenshot of the resulting values for our backend once fed into NIXStats:
Example usage:
website-loading-time rinogo$ node website-loading-time.js https://google.com
1657
967
1179
1005
1084
1076
...
Also, if the main bulk of your code must be in C#, you can still take advantage of this script/library. Since it is a command-line tool, you can call it from your C# code and process the result.
https://github.com/rinogo/website-loading-time
Disclosure: I am the author of this project.
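For example, since the main question is about a C# app, invoking the tool and reading its output could look roughly like this (paths and arguments are illustrative):
var psi = new System.Diagnostics.ProcessStartInfo
{
    FileName = "node",
    Arguments = "website-loading-time.js https://google.com",
    RedirectStandardOutput = true,
    UseShellExecute = false
};
using (var proc = System.Diagnostics.Process.Start(psi))
{
    // Each line the tool prints is one full-load time in milliseconds.
    string line;
    while ((line = proc.StandardOutput.ReadLine()) != null)
    {
        int milliseconds = int.Parse(line);
        // Save 'milliseconds' to your database here.
    }
    proc.WaitForExit();
}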

Downloading webpage image from SSL + WatiN

I've been thinking about this and can't seem to find a way to do it:
I've got some code running on WatiN 2.0 which connects to a site via an SSL tunnel and, after performing certain tasks (which there's no other feasible way to automate without relying on a browser), should be able to download an image over that very same SSL connection. The image is served dynamically depending on some state generated during navigation, and is not served except through the SSL connection associated with the aforementioned state, so I really need to stick with WatiN + IE.
Thanks in advance
If I understand you correctly, you are trying to go to a web page (via multiple steps) and then save a copy of an image (dynamically generated) on that page, right?
If so, I don't think there's a way to do this built into WatiN, but I stumbled across a thread in the WatiN mailing list archive which may help.
Basically, it looks like you can use WatiN to dynamically generate some JavaScript to run against your page and copy the image to the clipboard, then grab the image from the clipboard in your test code.
Hope that is of some help to you...
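Very roughly, that clipboard trick might look like the sketch below (the element id, URL and output path are hypothetical, and System.Windows.Forms.Clipboard requires an STA thread):
using (var browser = new IE("https://example.com/secure-page"))
{
    // ... perform the stateful navigation steps here ...

    // Have IE select the image via a controlRange and copy it to the clipboard.
    browser.RunScript(
        "var img = document.getElementById('theImage');" +
        "var range = document.body.createControlRange();" +
        "range.addElement(img);" +
        "range.execCommand('Copy');");

    // Grab the copied image from the clipboard ([STAThread] on the caller).
    var image = System.Windows.Forms.Clipboard.GetImage();
    if (image != null)
        image.Save(@"C:\temp\page-image.png");
}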
