Multi-threaded C# Selenium WebDriver automation with URIs not known beforehand

I need to manipulate several WebDriver windows simultaneously, but I am unsure how to do this.
What I am asking here is:
What is the correct way to achieve this?
What is the reason for the exception I am getting (shown below)?
After some research I ended up with:
1. The way I see people doing this (and the one I ended up using after playing with the API, before searching) is to loop over the window handles my WebDriver has, switch to the window handle I want to process, and close it when I am finished.
2. Selenium Grid does not seem like an option for me. Am I wrong, or is it intended for parallel processing across several machines? Since I am running everything on a single computer, it would be of no use to me.
In trying the first option, I have the following scenario (a code sample is below; I skipped parts that are irrelevant or repetitive, marked with three dots):
I have an HTML page with several stacked submit buttons.
Clicking each of them opens a new browser tab or window (interestingly, ChromeDriver opens tabs, while FirefoxDriver opens a separate window for each).
As a side note: I can't determine the URI behind each submit beforehand (it must be determined by JavaScript, and at this point let's just assume I want to handle everything knowing nothing about the client code).
Now, after looping over all the submit buttons and calling webElement.Click() on each, the tabs/windows open. The code then creates a list of tasks to be executed, one for each new tab/window.
The problem is that, since all tasks depend on the same WebDriver instance to switch to the window handles, it seems I will need to add locks or some other resource-access control. I am not sure whether I am correct, since I saw no mention of locks in the multi-threaded WebDriver examples I found.
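If the tasks keep sharing one driver, a lock would have to cover the whole "switch, then read" sequence, not just the switch, because a driver has a single current window. The sketch below illustrates this with a hypothetical FakeDriver class standing in for IWebDriver (it is not part of Selenium); with a real driver, SwitchTo().Window(...) and the FindElements calls that follow it would go inside the same lock block.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for one WebDriver instance: like a real driver,
// it has exactly one "current window" shared by every caller.
public sealed class FakeDriver
{
    private readonly Dictionary<string, string> _pages;
    private string _currentHandle;

    public FakeDriver(Dictionary<string, string> pages, string startHandle)
    {
        _pages = pages;
        _currentHandle = startHandle;
    }

    public void SwitchToWindow(string handle) => _currentHandle = handle;

    public string ReadCurrentPage() => _pages[_currentHandle];
}

public static class SharedDriverAccess
{
    // A single lock guarding the single shared driver.
    private static readonly object DriverLock = new object();

    // The switch and the read form one critical section, so no other task
    // can move the driver to a different window in between.
    public static string ReadWindow(FakeDriver driver, string handle)
    {
        lock (DriverLock)
        {
            driver.SwitchToWindow(handle);
            return driver.ReadCurrentPage();
        }
    }

    // Starts one task per window handle and returns the results in handle order.
    public static string[] ReadAllWindows(FakeDriver driver, IEnumerable<string> handles)
    {
        Task<string>[] tasks = handles
            .Select(handle => Task.Run(() => ReadWindow(driver, handle)))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Because every task waits on the same lock, the browser work is effectively sequential; actual parallelism would need one driver instance per task.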
On the other hand, if I could determine the tabs'/windows' URIs beforehand, I could skip all the automation steps needed to reach this point, and creating a WebDriver instance for each thread and calling Navigate().GoToUrl() would be straightforward. But this looks like a deadlock: I don't see the WebDriver API providing any access to a newly opened tab/window without performing a switch. And I only want to switch if I do not have to repeat all the automation steps that lead me to the current window!
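Switching to a window does not repeat the steps that opened it, so one option is a two-phase approach: use the single driver sequentially to read each new window's URL, then hand each URL to an independent worker. The sketch keeps the browser parts behind delegates; with Selenium, urlOfHandle could be h => { driver.SwitchTo().Window(h); return driver.Url; } and processUrl could create its own driver and call Navigate().GoToUrl(url). This assumes the result pages can be reloaded directly from their URL, which may not hold if they are the result of a form POST.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TwoPhaseFanOut
{
    // Phase 1 runs on the caller's thread and is the only place the shared
    // driver is touched (via urlOfHandle). Phase 2 runs processUrl in parallel,
    // one task per URL; each call is expected to use its own resources.
    public static TResult[] Run<TResult>(
        IEnumerable<string> windowHandles,
        Func<string, string> urlOfHandle,
        Func<string, TResult> processUrl)
    {
        // Phase 1: sequential (ToList forces evaluation here), so no locking
        // is needed around the shared driver.
        List<string> urls = windowHandles.Select(urlOfHandle).ToList();

        // Phase 2: independent work per URL.
        Task<TResult>[] tasks = urls
            .Select(url => Task.Run(() => processUrl(url)))
            .ToArray();
        Task.WaitAll(tasks);

        // Results come back in the same order as the window handles.
        return tasks.Select(t => t.Result).ToArray();
    }
}
```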
...
In any case, I keep getting the exception:
Element belongs to a different frame than the current one - switch to its containing frame to use it
thrown at IWebElement element = cell.FindElement inside the ToDictionary() block.
I have verified in Chrome's console that all my selectors return results.
foreach (WebElement resultSet in resultSets)
{
    resultSet.Click();
}
foreach (string windowHandle in webDriver.WindowHandles.Skip(1))
{
    dataCollectionTasks.Add(Task.Factory.StartNew<List<DataTable>>(obj =>
    {
        List<DataTable> collectedData = new List<DataTable>();
        string window = obj as string;
        if (window != null)
        {
            webDriver.SwitchTo().Window(windowHandle);
            List<WebElement> dataSets = webDriver.FindElements(By.JQuerySelector(utils.GetAppSetting("selectors.ResultSetData"))).ToList();
            DataTable data = null;
            for (int i = 0; i < dataSets.Count; i += 2)
            {
                data = new DataTable();
                data.Columns.Add("Col1", typeof(string));
                data.Columns.Add("Col2", typeof(string));
                data.Columns.Add("Col3", typeof(string));
                ///...
                //data set header
                if (i % 2 != 0)
                {
                    IWebElement headerElement = dataSets[i].FindElement(OpenQA.Selenium.By.CssSelector(utils.GetAppSetting("selectors.ResultSetDataHeader")));
                    data.TableName = string.Join(" ", headerElement.Text.Split().Take(3));
                }
                //data set records
                else
                {
                    Dictionary<string, string> cells = dataSets[i]
                        .FindElements(OpenQA.Selenium.By.CssSelector(utils.GetAppSetting("selectors.ResultSetDataCell")))
                        .ToDictionary(
                            cell =>
                            {
                                IWebElement element = cell.FindElement(OpenQA.Selenium.By.CssSelector(utils.GetAppSetting("selectors.ResultSetDataHeaderColumn")));
                                return element == null ? string.Empty : element.Text;
                            },
                            cell =>
                            {
                                return cell == null ? string.Empty : cell.Text;
                            });
                    string col1Value, col2Value, col3Value; //...
                    cells.TryGetValue("Col1", out col1Value);
                    cells.TryGetValue("Col2", out col2Value);
                    cells.TryGetValue("Col3", out col3Value);
                    //...
                    data.Rows.Add(col1Value, col2Value, col3Value /*...*/);
                }
            }
            collectedData.Add(data);
        }
        webDriver.SwitchTo().Window(mainWindow);
        webDriver.Close();
        return collectedData;
    }, windowHandle));
} //foreach
Task.WaitAll(dataCollectionTasks.ToArray());
foreach (Task<List<DataTable>> dataCollectionTask in dataCollectionTasks)
{
    results.AddRange(dataCollectionTask.Result);
}
return results;
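Two details of the ToDictionary() call in the code above may matter independently of the frame error. In Selenium's .NET bindings, FindElement throws NoSuchElementException when nothing matches rather than returning null, so the element == null check never triggers; and Enumerable.ToDictionary throws ArgumentException if two cells produce the same header text. A hedged alternative is to build the map by hand, skipping cells without a header and keeping the first value per header. In the sketch, headerOf is a hypothetical delegate; with Selenium it might be cell => cell.FindElements(By.CssSelector(selector)).FirstOrDefault()?.Text, which yields null instead of throwing. This does not address the "different frame" exception itself.

```csharp
using System;
using System.Collections.Generic;

public static class CellMapping
{
    // Builds a header -> value map from table cells.
    // Cells whose header cannot be determined (headerOf returns null) are skipped,
    // and when two cells share a header the first value wins, whereas
    // Enumerable.ToDictionary would throw ArgumentException on the duplicate key.
    public static Dictionary<string, string> ToHeaderMap<TCell>(
        IEnumerable<TCell> cells,
        Func<TCell, string> headerOf,
        Func<TCell, string> valueOf)
    {
        var map = new Dictionary<string, string>();
        foreach (TCell cell in cells)
        {
            string header = headerOf(cell);
            if (header == null || map.ContainsKey(header))
            {
                continue;
            }
            map.Add(header, valueOf(cell));
        }
        return map;
    }
}
```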

Related

Selenium: Stale Element Reference (works fine when debugging)

I am trying to scrape a page with Selenium in C# which has several pages that I can go through by clicking a "Next" button on the page. I am usually getting the error that there is a stale element reference, which ONLY happens if I run it without breakpoints. If I go through the program step by step, it works perfectly fine. I'm assuming that Selenium is skipping over important stuff without waiting (even though I have a wait method implemented).
To the code, this is the main logic for the problem:
foundVacancies.AddRange(FindVacanciesOnPage());
const string nextBtnXPath = "//*[@id=\"ContainerResultList\"]/div/div[3]/nav/ul/li[8]/a";
if (Driver.FindElements(By.XPath(nextBtnXPath)).Count != 0)
{
    while (TryClickingNextButton(nextBtnXPath))
    {
        foundVacancies.AddRange(FindVacanciesOnPage());
    }
}
This code first gets all items on the first page and adds them to the foundVacancies list. After that, it looks for the "Next" button, which is not always there if there are not enough items. If it is, it clicks it, scrapes the page, and clicks it again until there are no pages left. This works great when debugging, but something goes very wrong when running normally.
The method for getting all items on the page, and where the error occurs:
private IEnumerable<string> FindVacanciesOnPage()
{
    var vacancies = new List<string>();
    var tableContainingAllVacancies = Driver.FindElement(By.XPath("//*[@id=\"ContainerResultList\"]/div/div[2]/div/ul"));
    var listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));
    foreach (var vacancy in listOfVacancies)
    {
        vacancies.Add(vacancy.FindElement(By.XPath(".//h2")).Text);
    }
    return vacancies;
}
The items are in a <ul> element with <li> children, which I go through one by one to get their inner text. The stale element error occurs in the foreach loop. I assume the web driver didn't have time to reload the DOM, because it works when I use breakpoints. However, I do have a method that waits until the page is fully loaded, which I use when going to the next page.
private bool TryClickingNextButton(string nextButtonXPath)
{
    var nextButton = Driver.FindElement(By.XPath(nextButtonXPath));
    var currentUrl = Driver.Url;
    ScrollElementIntoView(nextButton);
    nextButton.Click();
    WaitUntilLoaded();
    var newUrl = Driver.Url;
    return !currentUrl.Equals(newUrl);
}
I am comparing new and old URL to determine if this was the last page. The WaitUntilLoaded method looks like this:
var wait = new WebDriverWait(Driver, TimeSpan.FromSeconds(30));
wait.Until(x => ((IJavaScriptExecutor) Driver).ExecuteScript("return document.readyState").Equals("complete"));
Oddly enough, sometimes the web driver just closes immediately after loading the first page, without any errors or any results. I spent a lot of time debugging and searching on SO, but can't seem to find any information, because the code works perfectly fine when I step through it with breakpoints.
I have only tried Chrome, with and without headless mode, but I don't see how this could be a Chrome problem.
The "Next" button has the following HTML:
<a href="" data-jn-click="nextPage()" data-ng-class="{'disabled-element':currentPage === totalPages}" tabindex="0">
<span class="hidden-md hidden-sm hidden-xs">Next <span class="icon icon-pagination-single-forward"></span></span>
<span class="hidden-lg icon icon-pagination-forward-enable"></span>
</a>
I couldn't find out what data-jn-click is. I tried to just execute the JavaScript nextPage();, but that didn't do anything.
I don't have any experience in C#, so if I am wrong, please don't mind.
You are using FindElements and storing the result in var listOfVacancies. I have referred to some sites: why don't you use ReadOnlyCollection<IWebElement>? It is better to store all elements as a list and iterate through it.
So the code becomes,
ReadOnlyCollection<IWebElement> listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));
If the elements that go into listOfVacancies are populated via an AJAX call, document.readyState won't catch that. Try using (ExecuteScript returns JavaScript numbers as Int64, so compare numerically rather than with the string "0"):
wait.Until(x => Convert.ToInt64(((IJavaScriptExecutor) Driver).ExecuteScript("return jQuery.active")) == 0);
I finally found a way to solve this issue. It's dirty, but it works. I tried many different approaches to waiting until the page is fully loaded, but none worked. So I went down the dark path of Thread.Sleep, but it's not as bad as it sounds:
private IEnumerable<string> FindVacanciesOnPage()
{
    return FindVacanciesOnPage(new List<string>(), 0, 50, 15000);
}
private IEnumerable<string> FindVacanciesOnPage(ICollection<string> foundVacancies, long waitedTime, int interval, long maxWaitTime)
{
    try
    {
        var list = Driver.FindElements(By.XPath("//*[@data-ng-bind=\"item.JobHeadline\"]"));
        foreach (var vacancy in list)
        {
            foundVacancies.Add(vacancy.Text);
        }
    }
    catch (Exception)
    {
        if (waitedTime >= maxWaitTime) throw;
        Thread.Sleep(interval);
        waitedTime += interval;
        return FindVacanciesOnPage(foundVacancies, waitedTime, interval, maxWaitTime);
    }
    return foundVacancies;
}
This will try to get the items, and if there is an Exception thrown, just waits a certain amount of time until it tries again. When a specified maximum time was waited, the exception is finally thrown.
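The same retry idea can also be written as a loop instead of recursion, measuring elapsed time with a Stopwatch rather than summing sleep intervals. The sketch below is generic and is a variation, not the answer's code: TException would be StaleElementReferenceException here, and the action would build and return a fresh list on each attempt, which avoids keeping entries that were added to foundVacancies before a failure. Catching only that exception type, instead of Exception, is a suggested narrowing so unrelated errors surface immediately.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Retry
{
    // Runs action until it completes without throwing TException.
    // Between attempts it sleeps for `interval`; once `maxWait` has elapsed the
    // exception filter stops matching, so the last TException propagates.
    // Other exception types are never caught.
    public static T UntilSuccess<T, TException>(Func<T> action, TimeSpan interval, TimeSpan maxWait)
        where TException : Exception
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                return action();
            }
            catch (TException) when (elapsed.Elapsed < maxWait)
            {
                Thread.Sleep(interval);
            }
        }
    }
}
```

A call might look like Retry.UntilSuccess<List<string>, StaleElementReferenceException>(() => Driver.FindElements(...).Select(e => e.Text).ToList(), TimeSpan.FromMilliseconds(50), TimeSpan.FromSeconds(15)), where the selector is left elided.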

Selenium stale element exception (found when running tests and not while debugging)

I am getting a stale element exception when I run the code. However, while debugging I do not get this exception.
Here is my piece of code. Can anybody help me? Thanks.
public static bool CheckListFilterResult(IList<IWebElement> gridColumns, IList<IWebElement> gridRows, string filterColumn, List<string> filters)
{
    bool checkResult = false;
    int filterColumnIndex = GetColumnIndex(gridColumns, filterColumn);
    if (gridRows.Count > 0)
    {
        foreach (IWebElement row in gridRows)
        {
            TestManager.Doc.Step("before bool match");
            bool filterMatch = filters.Contains(row.FindElements(By.TagName("td"))[filterColumnIndex].Text.Trim());
            if (filterMatch)
            {
                checkResult = true;
            }
            else
            {
                checkResult = false;
                break;
            }
        }
    }
    return checkResult;
}
From looking at your code, that line is the first line where you access something from the rows collection. My guess is that it has something to do with that. Did you pull the rows collection, then apply a filter, then call this function? If so, that's probably the problem. You need to apply the filter, then pull the rows collection, then call the function.
It could be that your IWebElement row was updated by javascript while you were running your selenium code.
As explained here: http://docs.seleniumhq.org/exceptions/stale_element_reference.jsp, Stale Element Reference Exception happens when the object on the UI may have been refreshed but you are trying to access the same object. So even if the row is visible to the user, it may have been updated/replaced by javascript, then it is a different object.
The reason it happens only occasionally, and not always, could be that in debug mode you stop at each line of your Selenium code, but the JavaScript code doesn't stop, so the element on the UI has already been refreshed by the time your Selenium code fetches a fresh copy of the IWebElement row.
There are several possible ways of fixing this problem:
Catch StaleElementReferenceException and, if it happens, redo whatever you were trying to do.
Wait for the old element to go stale before using the new one: wait.Until(ExpectedConditions.StalenessOf(row));
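The first fix combines naturally with the advice to pull the rows only after the filter has been applied: instead of passing an already-fetched row list, pass a delegate that fetches the column values, so each retry works on fresh elements. In the sketch, fetchColumnValues is a hypothetical delegate that, with Selenium, would re-query the grid rows and read each row's td text at the filter column; TException would be StaleElementReferenceException. It returns true only when there is at least one row and every trimmed value is in the filter list, which is what the original loop computes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterCheck
{
    // Returns true when at least one row exists and every row's value (trimmed)
    // is one of the expected filters. If fetching or reading the values throws
    // TException (for example because the grid was re-rendered), the values are
    // fetched again from scratch, up to maxAttempts attempts in total.
    public static bool AllRowsMatch<TException>(
        Func<IReadOnlyList<string>> fetchColumnValues,
        ICollection<string> filters,
        int maxAttempts = 3)
        where TException : Exception
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                IReadOnlyList<string> values = fetchColumnValues();
                return values.Count > 0 && values.All(v => filters.Contains(v.Trim()));
            }
            catch (TException) when (attempt < maxAttempts)
            {
                // Swallow the failure and fetch fresh values on the next iteration.
            }
        }
    }
}
```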
Hope it helps

Getting a list of all users via Valence

I am trying to get a list of all users in our instance of Desire2Learn by looping through the bookmarks; however, for some reason it loops continuously and never returns. When I debug it, it shows massive numbers of users (far more than we have in the system, as shown by the User Management Tool). A portion of my code is here:
public async Task<List<UserData>> GetAllUsers(int pages = 0)
{
    //List<UserData> users = new List<UserData>();
    HashSet<UserData> users = new HashSet<UserData>();
    int pageCount = 0;
    bool getMorePages = true;
    var response = await Get<PagedResultSet<UserData>>("/d2l/api/lp/1.4/users/");
    var qParams = new Dictionary<string, string>();
    do
    {
        qParams["bookmark"] = response.PagingInfo.Bookmark;
        //users = users.Concat(response.Items).ToList<UserData>();
        users.UnionWith(response.Items);
        response = await Get<PagedResultSet<UserData>>("/d2l/api/lp/1.4/users/", qParams);
        if (pages != 0)
        {
            pageCount++;
            if (pageCount >= pages)
            {
                getMorePages = false;
            }
        }
    }
    while (response.PagingInfo.HasMoreItems && getMorePages);
    return users.ToList();
}
I was originally using the List container that is commented out, but switched to the HashSet to see whether duplicates were being added.
It's fairly simple, but for whatever reason it's not working. The Get<PagedResultSet<UserData>>() method simply wraps the HTTP request logic. We set the bookmark each time and send it on.
The User Management Tool indicates there are 39,695 users in the system. After running for just a couple of minutes and breaking on the UnionWith in the loop I'm showing that my set has 211,800 users.
What am I missing?
It appears that you’ve encountered a defect in this API. The next course of action is for you to have your institution’s Approved Support Contact open an Incident through the Desire2Learn Helpdesk. Please make mention in the Incident report that Sarah-Beth Bianchi is aware of the issue, and I will work with our Support team to direct this issue appropriately.
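Two details in the question's loop are worth checking alongside the suspected API defect. A HashSet<UserData> only removes duplicates if UserData overrides Equals and GetHashCode; with default reference equality, every deserialized object counts as new, so the set cannot reveal repeated users. Also, when the final response has HasMoreItems == false, the loop exits before that page's Items are added. The sketch below dedupes by a key, stops if a bookmark repeats (a sign that paging is not advancing), and includes the last page. Page<T> and the getPage delegate are hypothetical stand-ins for PagedResultSet<UserData> and the Get<...> wrapper; for users, keyOf would presumably be u => u.UserId, which is an assumption about the UserData type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical page shape, standing in for PagedResultSet<T>.
public sealed class Page<T>
{
    public Page(IReadOnlyList<T> items, string bookmark, bool hasMoreItems)
    {
        Items = items;
        Bookmark = bookmark;
        HasMoreItems = hasMoreItems;
    }

    public IReadOnlyList<T> Items { get; }
    public string Bookmark { get; }
    public bool HasMoreItems { get; }
}

public static class Paging
{
    // getPage(null) must return the first page; getPage(bookmark) the page after it.
    // Stops after maxPages pages as a safety net.
    public static List<T> FetchAll<T, TKey>(
        Func<string, Page<T>> getPage,
        Func<T, TKey> keyOf,
        int maxPages = 10000)
    {
        var byKey = new Dictionary<TKey, T>();
        var seenBookmarks = new HashSet<string>();
        string bookmark = null;

        for (int pageIndex = 0; pageIndex < maxPages; pageIndex++)
        {
            Page<T> page = getPage(bookmark);

            // Add the items of every page, including the last one;
            // the first item seen for each key is kept.
            foreach (T item in page.Items)
            {
                TKey key = keyOf(item);
                if (!byKey.ContainsKey(key))
                {
                    byKey.Add(key, item);
                }
            }

            if (!page.HasMoreItems)
            {
                break;
            }
            if (!seenBookmarks.Add(page.Bookmark))
            {
                throw new InvalidOperationException(
                    "Bookmark '" + page.Bookmark + "' was returned twice; paging is not advancing.");
            }
            bookmark = page.Bookmark;
        }

        return byKey.Values.ToList();
    }
}
```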

How to separate parallel requests?

I'll try to explain the issue with a simplified console application example; however, the real project is an ASP.NET MVC3 application.
Given the following tables (TestReport and TestReportRef), imagine the following scenario:
the user creates a report (a row in TestReport, where Text is the report's string content and Ready is a bool flag indicating whether the report is ready to be processed); by default Ready is set to false, i.e. not ready.
the user wants the report to be processed, so he submits it; Ready is set to true here.
The system allows the report to be recalled if it has not been processed yet. When the report is recalled, Ready is set back to false. Conversely, when the report is processed, a row in TestReportRef referencing the report by its Id is created.
Now imagine that at the same moment:
the user wants to recall the report;
the report is added to the processing list.
Since these can happen simultaneously, errors may occur: the report can end up with Ready == false while still being referenced in TestReportRef.
Here is a simple console example of how this may happen:
var cs = "my connection string";
var dc = new TestDataContext(cs);
dc.TestReport.InsertOnSubmit(new TestReport
{
    Text = "My report content",
    Ready = true //ready at once
});
dc.SubmitChanges();
Action recallReport = () =>
{
    var _dc = new TestDataContext(cs);
    var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
    if (report != null && !report.TestReportRef.Any())
    {
        Thread.Sleep(1000);
        report.Ready = false;
        _dc.SubmitChanges();
    }
};
Action acceptReport = () =>
{
    var _dc = new TestDataContext(cs);
    var report = _dc.TestReport.FirstOrDefault(t => t.Ready);
    if (report != null && !report.TestReportRef.Any())
    {
        Thread.Sleep(1000);
        _dc.TestReportRef.InsertOnSubmit(new TestReportRef
        {
            FK_ReportId = report.Id
        });
        _dc.SubmitChanges();
    }
};
var task1 = new Task(recallReport);
var task2 = new Task(acceptReport);
task1.Start();
task2.Start();
task1.Wait();
task2.Wait();
foreach (var t in dc.TestReport)
{
    Console.WriteLine(string.Format("{0}\t{1}\t{2}", t.Id, t.Text, t.Ready));
}
foreach (var t in dc.TestReportRef)
{
    Console.WriteLine("ref id:\t" + t.FK_ReportId);
}
Thread.Sleep(1000); is added to ensure that both tasks check the same state.
The example may sound awkward, but I hope it explains the issue I'm dealing with.
How can I avoid this? Making the repository a singleton doesn't seem like a good idea. Should I use a shared mutex (one for all web requests) to serialize write operations only?
Or is there a pattern I should use in this kind of scenario?
This is only a simplified example of one of my scenarios; there are several others that can run into a similar discrepancy. The best thing, I guess, would be to make this kind of interleaving impossible.
Why not add a version column to the Report table? Each task starts by recording the current version; when the task ends, if the version is still the one it recorded, the operation is OK, otherwise it fails. If the operation succeeds, update the version to version + 1. This is a form of optimistic locking: it assumes that conflicts may occur, but not frequently.
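The version-column idea can be illustrated in memory. In SQL the check is usually a single statement such as UPDATE ... SET ..., Version = Version + 1 WHERE Id = @id AND Version = @expected, where zero affected rows means someone else committed first. The sketch below emulates that atomic compare-and-update with a lock around a hypothetical ReportRow class, replaying the recall/accept race from the question.

```csharp
using System;

// Hypothetical in-memory row: Ready flag, processed flag, and a version counter.
public sealed class ReportRow
{
    public bool Ready = true;
    public bool Processed;
    public int Version = 1;
}

public static class OptimisticStore
{
    private static readonly object Gate = new object();

    // Reads the current version (the "record the version" step).
    public static int ReadVersion(ReportRow row)
    {
        lock (Gate)
        {
            return row.Version;
        }
    }

    // Applies `change` only if nobody has committed since `expectedVersion`
    // was read, then increments the version. Returns false on a conflict,
    // like an UPDATE ... WHERE Version = @expected that affects 0 rows.
    public static bool TryCommit(ReportRow row, int expectedVersion, Action<ReportRow> change)
    {
        lock (Gate)
        {
            if (row.Version != expectedVersion)
            {
                return false;
            }
            change(row);
            row.Version++;
            return true;
        }
    }
}
```

The losing operation would then reload the report and decide again; here, a reloaded accept would see Ready == false and skip the report.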
UPDATE
If you are using LINQ to SQL, you could look at the UpdateCheck parameter: [Column(UpdateCheck = UpdateCheck.Always)]
This can be useful to handle concurrency in your case.

Stuck on implementation of WebBrowser Control and working with AJAX responses

Here we go...
I am trying to create a bot to walk through various functions of a website I do not have control of. Initially, I thought it would be best to just connect to the database (MySQL) that the site is tied to and make my associations there... The database is built so extensively that I can't make out where to point what to, where, how, etc... It's beyond my (DBA) programmer casting... ;)
So, my next idea was, create a bot...simple enough right? First hurdle, not everything on the page carries an ID...bring on the loops...
Got it.
Now I'm stuck with working with my data and page response.
I'm trying to fill out part of a form and perform an AJAX search. The problem is, there is no DocumentCompleted for this. And honestly, that isn't where my problem lies. I've tried using Thread.Sleep, Timers, etc., to no avail.
// The app reads categories from a csv file,
// then performs a search for the category
// Search results are only displayed if I break the foreach loop
foreach (var item in bar)
{
    var document = wbBrowser.Document;
    if (document != null)
    {
        var name = document.GetElementById("product_filter_name");
        if (name != null)
        {
            name.SetAttribute("value", item.Key.ToString());
            var buttons = document.GetElementsByTagName("button");
            foreach (HtmlElement button in buttons)
            {
                var findSearch = button.InnerHtml.IndexOf("Search");
                if (findSearch > -1)
                {
                    button.InvokeMember("click");
                }
            }
        }
        // This where the problem starts...
        // I want the AJAX to run, then perform Step two,
        // but the WebBrowser doesn't return the search
        // results until the end (break;)
        // Step Two
        var elems = document.GetElementsByTagName("tr");
        foreach (HtmlElement elem in elems)
        {
            // find particular item in result table
        }
        break;
        // Now the search results display!!!!
        // I tried implementing a timer, Thread.Sleep,
        // everything I could find via Google before
        //starting Step Two, but it hasn't worked!
    }
}
The WebBrowser control has a DocumentCompleted event which you might need to hook into so that you are alerted when the AJAX call has returned from the server.
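A likely reason the results only appear after break is that the loop runs on the UI thread: Thread.Sleep blocks that thread, so the WebBrowser control cannot process the AJAX response until the handler returns. One hedged alternative is an async handler that awaits Task.Delay between checks, which hands control back to the message loop while waiting. The sketch below is a generic polling helper; in the WebBrowser case the condition might check whether document.GetElementsByTagName("tr").Count has grown, which is an assumption about how the result table is rendered.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Polling
{
    // Checks `condition` every `interval` until it returns true or `timeout` elapses.
    // Awaiting Task.Delay (instead of calling Thread.Sleep) frees the calling thread
    // between checks; on a UI thread this lets the message loop keep running, and
    // the continuation (and thus the next condition check) resumes on that thread.
    public static async Task<bool> WaitUntilAsync(Func<bool> condition, TimeSpan interval, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (!condition())
        {
            if (elapsed.Elapsed >= timeout)
            {
                return false;
            }
            await Task.Delay(interval);
        }
        return true;
    }
}
```

In a WinForms handler this would be used as bool found = await Polling.WaitUntilAsync(...); inside an async void event handler, before starting Step Two.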
