Selenium - PhantomJS - FindElements in DOM traversal is slow - C#

For reasons, I'm trying to recurse through the DOM using Selenium/PhantomJS.
It works, but it's slow and I don't know why.
FindElements seems to take about 250 ms every call.
I've tried zeroing the implicit wait, without much success. I've also tried using XPath instead, with no real change.
Here's the code. Any suggestions?
public static void RecurseDomFromTop()
{
    DomRecursor( pjsDriver.FindElement( By.TagName( "*" ) ) );
}

public static void DomRecursor( IWebElement node )
{
    ReadOnlyCollection<IWebElement> iwes = node.FindElements( By.TagName( "*" ) );
    foreach (IWebElement iwe in iwes)
    {
        DomRecursor( iwe );
    }
}

The approach you are taking to compare two DOMs this way is wrong. Every time you make a Selenium request, an HTTP request is created and sent to the driver, which sends it on to the browser; the response then travels from the browser back to the driver, and from the driver back to your language binding. There is a lot of overhead involved, and you pay it once per FindElements call.
Instead, use driver.PageSource to get the whole HTML in a single call. Then you can use an HTML parsing library, which will be at least 10x faster than the approach you are taking now.
Look at the article below, which uses HtmlAgilityPack for getting DOM data:
https://www.codeproject.com/Articles/659019/Scraping-HTML-DOM-elements-using-HtmlAgilityPack-H
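As a minimal sketch of that approach (assuming the HtmlAgilityPack NuGet package), the whole traversal can run in memory against a single PageSource snapshot:

using HtmlAgilityPack;

public static void RecurseDomFromTop()
{
    // One HTTP round trip to the browser instead of one per node.
    var doc = new HtmlDocument();
    doc.LoadHtml( pjsDriver.PageSource );
    DomRecursor( doc.DocumentNode );
}

public static void DomRecursor( HtmlNode node )
{
    // ChildNodes is an in-memory collection; no driver traffic here.
    foreach ( HtmlNode child in node.ChildNodes )
    {
        if ( child.NodeType == HtmlNodeType.Element )
        {
            DomRecursor( child );
        }
    }
}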

Related

How to look for text via XPath using Selenium for C#?

I'm trying to check for the text "In stock." via an XPath query, but my XPath variable elementInStock returns an ID rather than the text, and it looks like it's not finding "In stock." at the URL https://www.amazon.co.uk/dp/B08H96W864/ref=twister_B08J4RCVXW?_encoding=UTF8&th=1
I would appreciate some help finding the text "In stock." and then doing the Console.WriteLine("HEYYYY");.
If the text "In stock." is not found, I want to keep going through the while loop in my code, but my logic never even reaches the while loop when the XPath finds nothing.
Please advise.
class Program
{
    static void Main(string[] args)
    {
        IWebDriver driver = new ChromeDriver();
        // Navigate to the product page
        driver.Navigate().GoToUrl("https://www.amazon.co.uk/dp/B08H96W864/ref=twister_B08J4RCVXW?_encoding=UTF8&th=1");
        driver.Manage().Window.Maximize();
        IWebElement elementInStock = driver.FindElement(By.XPath("/html/body/div[2]/div[2]/div[5]/div[4]/div[4]/div[18]"));
        if (elementInStock != null)
        {
            Console.WriteLine("HEYYYY");
        }
        else if (elementInStock == null)
        {
            int counter = 0;
            while (counter < 10)
            {
                Thread.Sleep(2500);
                driver.Navigate().Refresh();
            }
        }
    }
}
I see a few things that might be causing the issues you are seeing and a few other things that will cause issues later. I'll list them here and then below I've made some suggestions for improving the script. Finally, I've updated the script with the suggested improvements. I've tested the code and it's working correctly.
The fact that you don't wait for the element to be visible may be part of the problems you are seeing.
Your script doesn't actually check for the "In stock." text.
There's really no such thing as a null element result of a .FindElement() call. The find either works and returns the element or it doesn't work and it throws an exception. That's why your code never gets to the while loop when the element isn't found. My recommendation here would be to either find an element that you know will always be present or use .FindElements() (plural) and then check to see if there's an element in the returned collection (meaning the element exists). This will avoid unnecessary exceptions being thrown and will still accomplish the task.
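As a minimal sketch of that .FindElements() pattern (the locator here is illustrative):

// FindElements returns an empty collection instead of throwing
// when nothing matches, so existence can be tested without try/catch.
var matches = driver.FindElements(By.Id("availability"));
if (matches.Count > 0)
{
    Console.WriteLine("Element exists");
}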
Thread.Sleep() takes milliseconds as the parameter so 2500 is 2.5s. I'm assuming you intended to wait for more than 2.5s? I prefer to use TimeSpan.FromSeconds(30) to make it clearer how many seconds I'm waiting. NOTE: There are also many other From* options you may be interested in using also, e.g. TimeSpan.FromMinutes(2).
You never increment counter... meaning your script will be stuck in an infinite loop.
Your XPath is an absolute XPath which means that it starts at the /html tag. This is considered a bad practice because you end up with a very strict (and typically long) XPath which if any element along the specified chain is added/deleted/changed, your XPath will break. If you have to use XPath, instead create a relative XPath with the minimal information needed to uniquely locate the element. Best practice would be to use an ID or CSS selector instead and only use an XPath for locating an element by contained text or for more complex scenarios.
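To make the XPath point concrete, here are the three styles side by side (using the availability element discussed below):

// Absolute XPath: brittle; breaks if any ancestor element changes.
driver.FindElement(By.XPath("/html/body/div[2]/div[2]/div[5]/div[4]/div[4]/div[18]"));
// Relative XPath keyed on a stable attribute: far more resilient.
driver.FindElement(By.XPath("//div[@id='availability']"));
// Best practice here: use the ID directly.
driver.FindElement(By.Id("availability"));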
I would suggest a few changes...
In this case, we can use an ID, availability, on the parent element instead of using XPath.
<div id="availability" class="a-section a-spacing-none">
    <span class="a-size-medium a-color-success">In stock.</span>
</div>
NOTE: I've cleaned up a LOT of extra whitespace from the HTML above to save space. It won't matter in this case given that I'm planning to use the ID to locate the element but I am going to add .Trim() to the text returned to remove all that extra whitespace.
Add a wait to make sure the element is visible.
Find an element that is always there and check the contained text for the desired string.
Increment counter.
Here's what the final code looks like. I ran this code and it successfully wrote "HEYYYY" to the console.
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;
using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        IWebDriver driver = new ChromeDriver();
        driver.Manage().Window.Maximize();
        driver.Url = "https://www.amazon.co.uk/dp/B08H96W864/ref=twister_B08J4RCVXW?_encoding=UTF8&th=1";
        int counter = 0;
        while (counter < 10)
        {
            WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
            IWebElement availability = wait.Until(ExpectedConditions.ElementIsVisible(By.Id("availability")));
            if (availability.Text.Trim() == "In stock.")
            {
                Console.WriteLine("HEYYYY");
                break;
            }
            counter++;
            Thread.Sleep(TimeSpan.FromSeconds(30));
            driver.Navigate().Refresh();
        }
    }
}

Parsing AJAX driven page

I am trying to parse data from a page that is not filled in until after the page has finished loading. Because of this, I cannot get a simple solution utilizing
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
    Application.DoEvents();
}
to work. I have tried using the solution found at View Generated Source (After AJAX/JavaScript) in C#, but I cannot figure out how to make it wait until the post-loading data has been downloaded. Please help! The data is filled into the page automatically after it loads; no user interaction is required. Thanks!
I just found Waiting for WebBrowser ajax content, where the answer was to use a timer. I am not sure how to do this with a timer instead of Thread.Sleep() (which blocks the thread completely). Can someone help me understand the proper way to use it, with some quick sample code? Thanks again.
I am looking into the suggestion of calling the AJAX myself, but I think it would be better to use the timer. I am still looking for help on the subject. Thanks.
Take a look at the page you are dealing with in Firebug for Firefox. There is a "Net" tab that lets you see the actual raw data of all the subsequent Ajax HTTP requests that occur while the page is loading (after the initial part of the page has loaded).
By looking at this data, it is quite likely you will find JSON or other XML data that contains exactly what you are looking for, returned in response to a GET request containing an ID or something of that nature.
Using a 'fake' browser as mentioned in that linked post should be considered a last resort, because it will give you the worst performance on your end: you will likely be downloading and parsing a lot more data than necessary.
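If you do find such a request, you can issue it yourself and skip the browser entirely. A sketch of that idea (the endpoint URL and response shape are hypothetical; copy the real ones from Firebug's Net tab):

using System.Net;

// Hypothetical endpoint observed in the Net tab.
string url = "http://example.com/stores/data?storeId=1234";
using (var client = new WebClient())
{
    string json = client.DownloadString(url);
    // Parse with the JSON/XML library of your choice, e.g. Json.NET:
    // var data = JsonConvert.DeserializeObject<StoreData>(json);
}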
For my situation the following solved it:
while (wb.ReadyState != WebBrowserReadyState.Complete)
    Application.DoEvents();
while (wb.Document.GetElementById(elementId) != null && wb.Document.GetElementById(elementId).InnerHtml == null)
    Application.DoEvents();
The second while loop waits until the specified element is populated by the AJAX. In my situation, if an invalid store # is provided in the URL, the site forwards to a 404-type page. The first condition verifies the element still exists on the page, which it won't if we get sent to the 404 page. The second condition waits until the element is populated.
An interesting thing I found is that after the AJAX populates the page, wb.Document.InnerText and wb.DocumentStream still contain the originally downloaded HTML; only wb.Document.InnerHtml is updated. In my situation I am creating an HtmlAgilityPack HtmlDocument from the results. Because the DocumentStream becomes outdated, I have to recreate my document like this:
htmlDoc.LoadHtml("<html><head><title>" + wb.DocumentTitle + "</title></head><body>" + wb.Document.Body.InnerHtml + "</body></html>");
In my situation I don't care about meta/scripts in the header, so this works. If someone cared about those things, they would obviously need to adapt that line of code for their own use.
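One caution about the DoEvents loops above: if the element never gets populated, they spin forever. A sketch of the same wait with a simple timeout guard (the 30-second limit is an arbitrary choice):

var deadline = DateTime.UtcNow.AddSeconds(30);
while (wb.Document.GetElementById(elementId) != null
       && wb.Document.GetElementById(elementId).InnerHtml == null
       && DateTime.UtcNow < deadline)
{
    Application.DoEvents();
}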

Locate only non-hidden elements using Selenium WebDriver in C#

I have a collection of records on a web page, and when a record is clicked, a 'Delete' link is displayed (actually 'unhidden', as it is always present in the DOM).
When trying to access this 'Delete' link, I am locating it by its value.
When I use Driver.FindElement, it returns the first Delete link even though it's hidden, and therefore I can't click it (and shouldn't, as it is not the right link).
So what I basically want to do is find only non-hidden links. The code below works, but since it iterates through every Delete link, I am afraid it may be inefficient.
Is there a better way?
public class DataPageModel : BasePageModel
{
    private static readonly By DeleteSelector = By.CssSelector("input[value=\"Delete\"]");

    private IWebElement DeleteElement
    {
        get
        {
            var elements = Driver.FindElements(DeleteSelector);
            foreach (var element in elements.Where(e => e.Displayed))
            {
                return element;
            }
            Assert.Fail("Could not locate a visible Delete Element");
            return null;
        }
    }
}
While I agree with @Torbjorn that you should be wary about where you spend your time optimizing, I do think this code is a bit inefficient.
Basically, what is slowing the code down is the back-and-forth round trip for each element to check whether it's displayed. To speed up the code, you need to get the element you want in one go.
Two options (both involve javascript):
jQuery
Take a look at the different ways to bring jQuery selectors to Selenium (I wrote about it here). Once you have that, you can make use of jQuery's :visible selector.
Alternatively, if you know for sure the page already has jQuery loaded and you don't want to write all the extra code, you can simply use ExecuteScript:
IWebElement element = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript("return $('input[value=\"Delete\"]:visible').first().get(0);");
JavaScript
If you want to avoid jQuery, you can write a JavaScript function to do the same thing you are doing now in C#: get all the candidate elements and return the first visible one.
Then you would do something similar (see the sketch below):
string script = "...";  // your JavaScript
IWebElement element = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript(script);
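A sketch of what that script could look like (offsetParent is a common visibility heuristic: it is null for elements hidden with display:none, though it won't catch every way of hiding an element):

string script =
    "var els = document.querySelectorAll('input[value=\"Delete\"]');" +
    "for (var i = 0; i < els.length; i++) {" +
    "    if (els[i].offsetParent !== null) { return els[i]; }" +  // first visible match
    "}" +
    "return null;";
IWebElement element = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript(script);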
You trade off readability to different degrees depending on which option you pick, but they should all be more efficient. Of course, these all require that JavaScript be enabled in the browser.

Equivalent to HashSet.Contains that returns HashSet or index?

I have a large list of emails that I need to check to see if they contain a string. I only need to do this once. Originally I only needed to check whether each email matched any of the emails from a list of emails.
I was using if(ListOfEmailsToRemoveHashSet.Contains(email)) { Discard(email); }
This worked great, but now I need to check for partial matches, so I am trying to invert it. If I used the same method, I would be testing it like
if (ListOfEmailsHashSet.Contains(badString)). Obviously that tells me whether the exact bad string is in the set, but not which entry in the HashSet contains the bad string.
I can't see any way of making this work while still being fast.
Does anyone know of a function I can use that will return the set of matches, the index of a matched item, or any other way around this?
I only need to do this once.
If this is the case, performance shouldn't really be a consideration. Something like this should work:
if(StringsToDisallow.Any(be => email.Contains(be))) {...}
On a side note, you may want to consider using Regular Expressions rather than a straight black-list of contained strings. They'll give you a much more powerful, flexible way to find matches.
If performance does turn out to be an issue after all, you'll have to find a data structure that works better for full-text searching. It might be best to leverage an existing tool like Lucene.NET.
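A minimal sketch of the regex variant (assuming the same StringsToDisallow collection as above):

using System.Linq;
using System.Text.RegularExpressions;

// Build one alternation pattern from the black-list, escaping each
// entry so it is matched literally.
string pattern = string.Join("|", StringsToDisallow.Select(Regex.Escape));
var disallowed = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

if (disallowed.IsMatch(email)) { Discard(email); }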
Just a note here: we had a program that was tasked with uploading in excess of 100,000 PDF/Excel/Word files, and every time a file was uploaded an entry was made in a text file. Every night when the program ran, it would read this file and load the records into a static HashSet<string> FilesVisited = new HashSet<string>(); with FilesVisited.Add(reader.ReadLine());.
When the program attempted to upload a file, we first had to scan through the HashSet to see if we had already worked on the file. What we found was that
if (!FilesVisited.Contains(newFilePath))... would take a lot of time and would not give us the correct results (even when the file path was in there); alternatively, FilesVisited.Any(m => m.Contains(newFilePath)) was also a slow operation.
The best way we found to be fast was the traditional way of
foreach (var item in FilesVisited)
{
    if (item.Contains(fileName))
    {
        alreadyUploaded = true;
        break;
    }
}
Just thought I would share this....

Slow iteration through elements in WatiN

I'm writing an application with WatiN. It's great, but running a performance analysis on my program shows that over 50% of execution time is spent looping through lists of elements.
For example:
foreach (TextField bT in browser.TextFields)
{
    ...
}
is very slow.
I seem to remember seeing somewhere there is a faster way of doing this in WatiN, but unfortunately I can't find the page again.
Accessing the number of elements also seems to be slow, e.g.:
browser.CheckBoxes.Count
Thanks for any tips,
Chris
I think I could answer you better if I had a better idea of what you were trying to do, but I can share some observations on what I've learned with WatiN so far.
The more specific your selectors are, the faster things will go. Avoid using "browser.Elements" as that is really generic. I'm not sure that it saves much, but doing something like browser.Body.Elements throws the header elements out of the scope of things to check and may save a few calculations.
When I say "scope", consider that WatiN always starts with the entire DOM. Can you think of ways to limit the scope of elements perhaps to the text fields within the main div on your page? WatiN returns Elements and ElementCollections, each of which may have its own ElementCollection. That div probably has a specific ID, so you can do something like
var textFields = ie.Div("divId").TextFields;
Look for opportunities to be more specific, and you can use LINQ to describe what you want more clearly. For example, you can write something like:
ie.Body.TextFields.
    Where(tf => !string.IsNullOrWhiteSpace(tf.ClassName) && tf.ClassName.Contains("classname")).ToList().
    ForEach(tf => tf.Value = "Your Text");
I would refine that further by reducing the number of times I scan the collection by doing something like:
ie.Body.TextFields.ToList().
    ForEach(tf => {
        if (!string.IsNullOrWhiteSpace(tf.ClassName) && tf.ClassName.Contains("classname"))
        {
            tf.Value = "Your Text";
        }
    });
The "Find.By*" specifiers also help WatiN operate on the collections you want faster and are a more elegant short-hand for what I wrote above:
ie.Body.TextFields.Filter(Find.ByClass("class")).ToList().ForEach(tf => tf.Value = "Your Text");
And as a last piece of advice, this project lets you find elements using jQuery/CSS style selectors.
So, tl;dr: Narrow down the scope of what you're looking for, and be specific.
Hope that helps. I'm looking for ways to speed up my own tests.
If you really need to iterate through all text fields, there is no other way. As @Xaqron pointed out, it depends on IE. But maybe you just need to iterate through the text fields of, e.g., one specific <div/>? Finding it first and then iterating through its text fields would be faster, as sketched below.
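A sketch of that scoping idea (the div ID here is hypothetical):

// Locate the container once, then enumerate only its descendants
// instead of every text field in the document.
Div container = browser.Div(Find.ById("mainContent"));
foreach (TextField tf in container.TextFields)
{
    // work with tf
}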
Thanks Dahv for a really detailed answer. In my case I've sped up my tests by about 10x using a number of tricks, some similar to yours:
Refining the scope as you and prostynick suggested (in my case using Form1.TextField etc.)
First checking if browser.Html matches my regex before checking whether individual fields do
Using the Gehtsoft.PCRE regex wrapper, whose native-code regex matching is far faster than .NET's for small haystacks. So to find a TextField I'd do:
Gehtsoft.PCRE.Regex regexString = new Gehtsoft.PCRE.Regex("[Nn]ame");
foreach (TextField bT in browser.TextFields)
{
    // Skip this field if its name doesn't match
    if (!regexString.Execute(bT.Name).Success) continue;
    ...
}
Before, I was looping over a list of regexes and, inside that, looping over the TextFields. Making the TextFields loop the outer loop improved speed about 3x.
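A sketch of that inversion (the regexes list and match handling are illustrative):

// Outer loop: one expensive enumeration of TextFields.
// Inner loop: cheap in-memory regex checks per field.
foreach (TextField tf in browser.TextFields)
{
    foreach (Gehtsoft.PCRE.Regex re in regexes)
    {
        if (re.Execute(tf.Name).Success)
        {
            // handle the matching field
            break;
        }
    }
}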
