Here we go...
I am trying to create a bot to walk through various functions of a website I do not have control of. Initially, I thought it would be best to just connect to the database (MySQL) that the site is tied to and make my associations there. The database is so extensive that I can't work out what points to what. It's beyond my (DBA) programmer casting... ;)
So, my next idea was, create a bot...simple enough right? First hurdle, not everything on the page carries an ID...bring on the loops...
Got it.
Now I'm stuck with working with my data and page response.
I'm trying to fill out part of a form and perform an AJAX search. The problem is, there is no DocumentCompleted for this. And honestly, that isn't where my problem lies. I've tried using Thread.Sleep, Timers, etc., to no avail.
// The app reads categories from a csv file,
// then performs a search for the category
// Search results are only displayed if I break the foreach loop
foreach (var item in bar)
{
var document = wbBrowser.Document;
if (document != null)
{
var name = document.GetElementById("product_filter_name");
if (name != null)
{
name.SetAttribute("value", item.Key.ToString());
var buttons = document.GetElementsByTagName("button");
foreach (HtmlElement button in buttons)
{
var findSearch = button.InnerHtml.IndexOf("Search");
if (findSearch > -1)
{
button.InvokeMember("click");
}
}
}
// This is where the problem starts...
// I want the AJAX to run, then perform Step two,
// but the WebBrowser doesn't return the search
// results until the end (break;)
// Step Two
var elems = document.GetElementsByTagName("tr");
foreach (HtmlElement elem in elems)
{
// find particular item in result table
}
break;
// Now the search results display!!!!
// I tried implementing a timer, Thread.Sleep,
// everything I could find via Google before
//starting Step Two, but it hasn't worked!
}
}
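The browser control raises a WebBrowser.DocumentCompleted event (via its protected OnDocumentCompleted method) that you can hook into to be alerted when a page has finished loading. It only fires for navigations, though, not when an AJAX call returns, so for the search results you will most likely need to let the loop yield back to the message pump and poll the DOM until the result rows appear. Thread.Sleep on the UI thread blocks the control itself, which is why the results only show up once the loop exits.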
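One way to wait for the AJAX results without blocking the WebBrowser is to poll from an async method. This is a minimal sketch, assuming .NET 4.5 or later and that the code runs on the UI thread; Poll.UntilAsync and the ResultsHaveArrived check in the usage comment are hypothetical names, not part of the WebBrowser API.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Poll
{
    // Re-checks 'condition' every 'interval' until it returns true
    // or 'timeout' elapses. Returns true if the condition was met,
    // false if the timeout was reached first.
    public static async Task<bool> UntilAsync(
        Func<bool> condition, TimeSpan interval, TimeSpan timeout)
    {
        var clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
            {
                return false;
            }
            // Unlike Thread.Sleep, awaiting Task.Delay does not block the
            // calling thread, so a UI message loop keeps running meanwhile.
            await Task.Delay(interval);
        }
        return true;
    }
}

// Hypothetical usage inside an async event handler, after the click:
//
//   button.InvokeMember("click");
//   bool found = await Poll.UntilAsync(
//       () => ResultsHaveArrived(wbBrowser.Document),   // your own DOM check
//       TimeSpan.FromMilliseconds(250),
//       TimeSpan.FromSeconds(15));
//   if (found) { /* Step Two: read the <tr> elements */ }
```

The enclosing method has to be marked async for this to work. The DOM check itself (for example, counting the tr elements in the result table) is left to you, since it depends on the page.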
I need some assistance with creating a loop until null is returned on a piece of software I've written.
The software basically takes information from an API call and deserializes it to a readable format for our online service. The difficulty I am facing is that when I make an API call it is only returning 100 records of employees when the client has far more than that.
foreach (var bankRecord in bankDetailDto.Value) //
{
var deducRecords = deductions.Where(d => d.Ee_Number == bankRecord.EmployeeNumber).ToList();
if (deducRecords.Any())
{
foreach(var deducRecord in deducRecords)
{
deducRecord.Bank_Account_Number = bankRecord.BankAccountNo;
deducRecord.Bank_Account_Type = bankRecord.AccountType;
}
}
}
This is just an example of the loop I've tried to create, but it does not seem to work. I am under the impression I need to create a class, perhaps to run a loop on a background worker?
Apologies, I have not been developing for very long.
I guess the API call is paginated with a limit of 100 records per call. I think you need to use a while loop and check whether the API response still returns any objects.
Since you didn't include your API call code, I guess it's something like this:
parameter.page = 1;
List<BankData> response = yourApi.yourApiAction(parameter);
while (response != null && response.Count > 0)
{
// ... do your logic to process the data here ...
// Increase the pagination number
parameter.page += 1;
// Call the API again to get next page data
response = yourApi.yourApiAction(parameter);
}
This is just example code showing how it could be done.
Check your API documentation if it has pagination, and how to increase it.
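The same idea can be written as a small reusable helper. This is only a sketch under assumptions: the API is assumed to take a 1-based page number and to return an empty (or null) page once the records run out. Paging.FetchAll, fetchPage and maxPages are made-up names for illustration, and maxPages is just a safety cap so a misbehaving API cannot loop forever.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // 'fetchPage' stands in for the real API call: it receives a 1-based
    // page number and returns that page's records (null or empty when
    // there is nothing left).
    public static List<T> FetchAll<T>(
        Func<int, IReadOnlyList<T>> fetchPage, int maxPages = 10000)
    {
        var all = new List<T>();
        for (int page = 1; page <= maxPages; page++)
        {
            IReadOnlyList<T> batch = fetchPage(page);
            if (batch == null || batch.Count == 0)
            {
                break; // no more data
            }
            all.AddRange(batch);
        }
        return all;
    }
}

// Hypothetical usage with the names from the answer above:
//
//   List<BankData> everything = Paging.FetchAll<BankData>(page =>
//   {
//       parameter.page = page;
//       return yourApi.yourApiAction(parameter);
//   });
```

Collecting all pages first and running the matching loop afterwards keeps the deduction logic from the question unchanged.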
I am trying to scrape a page with Selenium in C# which has several pages that I can go through by clicking a "Next" button on the page. I am usually getting the error that there is a stale element reference, which ONLY happens if I run it without breakpoints. If I go through the program step by step, it works perfectly fine. I'm assuming that Selenium is skipping over important stuff without waiting (even though I have a wait method implemented).
To the code, this is the main logic for the problem:
foundVacancies.AddRange(FindVacanciesOnPage());
const string nextBtnXPath = "//*[@id=\"ContainerResultList\"]/div/div[3]/nav/ul/li[8]/a";
if (Driver.FindElements(By.XPath(nextBtnXPath)).Count != 0)
{
while (TryClickingNextButton(nextBtnXPath))
{
foundVacancies.AddRange(FindVacanciesOnPage());
}
}
This method first gets all items on the first page and adds them to the foundVacancies list. After that, it will try to look for the "Next" button, which is not always there if there are not enough items. If it is, it will try to click it, scrape the page, and click it again until there are no pages left. This works great when debugging, but something goes very wrong when running normally.
The method for getting all items on the page, and where the error occurs:
private IEnumerable<string> FindVacanciesOnPage()
{
var vacancies = new List<string>();
var tableContainingAllVacancies = Driver.FindElement(By.XPath("//*[@id=\"ContainerResultList\"]/div/div[2]/div/ul"));
var listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));
foreach (var vacancy in listOfVacancies)
{
vacancies.Add(vacancy.FindElement(By.XPath(".//h2")).Text);
}
return vacancies;
}
The items are in a <ul> HTML tag and have <li> children, which I go through one by one to get their inner text. The stale element error occurs in the foreach loop. I'm assuming that the web driver didn't have time to reload the DOM, because it works when I step through with breakpoints. However, I do have a method to wait until the page is fully loaded, which is what I use when going to the next page.
private bool TryClickingNextButton(string nextButtonXPath)
{
var nextButton = Driver.FindElement(By.XPath(nextButtonXPath));
var currentUrl = Driver.Url;
ScrollElementIntoView(nextButton);
nextButton.Click();
WaitUntilLoaded();
var newUrl = Driver.Url;
return !currentUrl.Equals(newUrl);
}
I am comparing new and old URL to determine if this was the last page. The WaitUntilLoaded method looks like this:
var wait = new WebDriverWait(Driver, TimeSpan.FromSeconds(30));
wait.Until(x => ((IJavaScriptExecutor) Driver).ExecuteScript("return document.readyState").Equals("complete"));
Oddly enough, sometimes the web driver just closes immediately after loading the first page, without any errors nor any results. I spent a lot of time debugging and searching on SO, but can't seem to find any information, because the code is working perfectly fine when breakpointing through it.
I have only tried Chrome, with and without headless mode, but I don't see that this could be a Chrome problem.
The "Next" button has the following HTML:
<a href="" data-jn-click="nextPage()" data-ng-class="{'disabled-element':currentPage === totalPages}" tabindex="0">
<span class="hidden-md hidden-sm hidden-xs">Next <span class="icon icon-pagination-single-forward"></span></span>
<span class="hidden-lg icon icon-pagination-forward-enable"></span>
</a>
I couldn't find out what data-jn-click is. I tried to just execute the JavaScript nextPage();, but that didn't do anything.
I don't have any experience in C#, so if I am wrong please don't mind.
You are using FindElements and storing the result in var listOfVacancies. I have referred to some sites. Why don't you use ReadOnlyCollection<IWebElement>? It is better to store all elements as a list and iterate through it.
So the code becomes,
ReadOnlyCollection<IWebElement> listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));
If the elements that are going into listOfVacancies are being populated via an AJAX call, then document.readyState won't catch that. If the page uses jQuery, try:
wait.Until(x => ((IJavaScriptExecutor) Driver).ExecuteScript("return jQuery.active").Equals(0L));
ExecuteScript returns integer results as long, so compare against 0L rather than the string "0".
I finally found a way to solve this issue. It's dirty, but it works. I tried many different approaches to waiting until the page is fully loaded, but none worked. So I went down the dark path of Thread.Sleep, but it's not as bad as it sounds:
private IEnumerable<string> FindVacanciesOnPage()
{
return FindVacanciesOnPage(new List<string>(), 0, 50, 15000);
}
private IEnumerable<string> FindVacanciesOnPage(ICollection<string> foundVacancies, long waitedTime, int interval, long maxWaitTime)
{
try
{
var list = Driver.FindElements(By.XPath("//*[@data-ng-bind=\"item.JobHeadline\"]"));
foreach (var vacancy in list)
{
foundVacancies.Add(vacancy.Text);
}
}
catch (Exception)
{
if (waitedTime >= maxWaitTime) throw;
Thread.Sleep(interval);
waitedTime += interval;
return FindVacanciesOnPage(foundVacancies, waitedTime, interval, maxWaitTime);
}
return foundVacancies;
}
This will try to get the items, and if an Exception is thrown, it just waits a certain amount of time before trying again. Once the specified maximum time has been waited, the exception is finally rethrown.
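The retry-with-sleep approach can also be written as a loop instead of recursion. The following is a sketch, not the code from the answer: Retry.Until is a hypothetical helper, and like the original it catches every exception type. It requires C# 6 for the exception filter. Because each attempt builds its result from scratch, a failure halfway through a page cannot leave partial items behind in the list.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Retry
{
    // Runs 'attempt' until it succeeds. While less than 'maxWait' has
    // elapsed, a failure is swallowed and retried after 'interval';
    // after that, the exception propagates to the caller.
    public static T Until<T>(Func<T> attempt, TimeSpan interval, TimeSpan maxWait)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                return attempt();
            }
            catch (Exception) when (clock.Elapsed < maxWait)
            {
                Thread.Sleep(interval);
            }
        }
    }
}

// Hypothetical usage for the scraper above (needs System.Linq):
//
//   List<string> titles = Retry.Until(
//       () => Driver.FindElements(By.XPath("..."))   // same XPath as above
//                   .Select(e => e.Text)
//                   .ToList(),
//       TimeSpan.FromMilliseconds(50),
//       TimeSpan.FromSeconds(15));
```

If only stale elements should trigger a retry, the catch clause could be narrowed to Selenium's StaleElementReferenceException so that other failures surface immediately.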
I need to perform some simultaneous webdrivers manipulation, but I am uncertain as to how to do this.
What I am asking here is:
What is the correct way to achieve this?
What is the reason for the exception I am getting (revealed below)?
After some research I ended up with:
1. The way I see people doing this (and the one I ended up using after playing with the API, before searching) is to loop over the window handles my WebDriver has at hand, switching to and out of the window handle I want to process, and closing it when I am finished.
2. Selenium Grid does not seem like an option for me. Am I wrong, or is it intended for parallel processing? Since I am running everything on a single computer, it will be of no use to me.
In trying the 1st option, I have the following scenario (a code sample is available below; I skipped stuff that is not relevant or repeats itself, marked with three dots):
I have a html page, with several submit buttons, stacked.
Clicking each of them will open a new browser/tab (interestingly enough, using ChromeDriver opens tabs, while FirefoxDriver opens separate windows for each.)
As a side note: I can't determine the URIs of each submit beforehand (they must be determined by JavaScript, and at this point, let's just assume I want to handle everything knowing nothing about the client code).
Now, after looping over all the submit buttons, and issuing webElement.Click() on the corresponding elements, the tabs/windows open. The code flows to create a list of tasks to be executed, one for each new tab/window.
The problem is: since all tasks depend upon the same instance of WebDriver to switch to the window handles, it seems I will need to add resource sharing locks/control. I am uncertain whether I am correct, since I saw no mention of locks or resource access control when searching for multi-threaded WebDriver examples.
On the other hand, if I am able to determine the tabs/windows uris beforehand, I would be able to skip all the automation steps needed to reach this point, and then creating a webDriver instance for each thread, via Navigate().GoToUrl() would be straightforward. But this looks like a deadlock! I don't see webDriver's API providing any access to the newly opened tab/window without performing a switch. And I only want to switch if I do not have to repeat all the automation steps that lead me to the current window !
...
In any case, I keep getting the exception:
Element belongs to a different frame than the current one - switch to its containing frame to use it
at
IWebElement element = cell.FindElement
inside the ToDictionary() block.
I obviously checked that all my selectors are returning results, in chrome's console.
foreach (WebElement resultSet in resultSets)
resultSet.Click();
foreach(string windowHandle in webDriver.WindowHandles.Skip(1))
{
dataCollectionTasks.Add(Task.Factory.StartNew<List<DataTable>>(obj =>
{
List<DataTable> collectedData = new List<DataTable>();
string window = obj as string;
if (window != null)
{
webDriver.SwitchTo().Window(windowHandle);
List<WebElement> dataSets = webDriver.FindElements(By.JQuerySelector(utils.GetAppSetting("selectors.ResultSetData"))).ToList();
DataTable data = null;
for (int i = 0; i < dataSets.Count; i += 2)
{
data = new DataTable();
data.Columns.Add("Col1", typeof(string));
data.Columns.Add("Col2", typeof(string));
data.Columns.Add("Col3", typeof(string));
///...
//data set header
if (i % 2 != 0)
{
IWebElement headerElement = dataSets[i].FindElement(OpenQA.Selenium.By.CssSelector(utils.GetAppSetting("selectors.ResultSetDataHeader")));
data.TableName = string.Join(" ", headerElement.Text.Split().Take(3));
}
//data set records
else
{
Dictionary<string, string> cells = dataSets[i]
.FindElements(OpenQA.Selenium.By.CssSelector(utils.GetAppSetting("selectors.ResultSetDataCell")))
.ToDictionary(
cell =>
{
IWebElement element = cell.FindElement(OpenQA.Selenium.By.CssSelector(utils.GetAppSetting("selectors.ResultSetDataHeaderColumn")));
return element == null ? string.Empty : element.Text;
},
cell =>
{
return cell == null ? string.Empty : cell.Text;
});
string col1Value, col2Value, col3Value; //...
cells.TryGetValue("Col1", out col1Value);
cells.TryGetValue("Col2", out col2Value);
cells.TryGetValue("Col3", out col3Value);
//...
data.Rows.Add(col1Value, col2Value, col3Value /*...*/);
}
}
collectedData.Add(data);
}
webDriver.SwitchTo().Window(mainWindow);
webDriver.Close();
return collectedData;
}, windowHandle));
} //foreach
Task.WaitAll(dataCollectionTasks.ToArray());
foreach (Task<List<DataTable>> dataCollectionTask in dataCollectionTasks)
{
results.AddRange(dataCollectionTask.Result);
}
return results;
I am trying to get a list of all users in our instance of Desire2Learn using a looping structure through the bookmarks; however, for some reason it loops continuously and doesn't return. When I debug it, it shows massive numbers of users (far more than we have in the system as shown by the User Management Tool). A portion of my code is here:
public async Task<List<UserData>> GetAllUsers(int pages = 0)
{
//List<UserData> users = new List<UserData>();
HashSet<UserData> users = new HashSet<UserData>();
int pageCount = 0;
bool getMorePages = true;
var response = await Get<PagedResultSet<UserData>>("/d2l/api/lp/1.4/users/");
var qParams = new Dictionary<string, string>();
do
{
qParams["bookmark"] = response.PagingInfo.Bookmark;
//users = users.Concat(response.Items).ToList<UserData>();
users.UnionWith(response.Items);
response = await Get<PagedResultSet<UserData>>("/d2l/api/lp/1.4/users/", qParams);
if (pages != 0)
{
pageCount++;
if (pageCount >= pages)
{
getMorePages = false;
}
}
}
while (response.PagingInfo.HasMoreItems && getMorePages);
return users.ToList();
}
I originally was using the List container that is commented out but just switched to the HashSet to see whether duplicates were being added.
It's fairly simple, but for whatever reason it's not working. The Get<PagedResultSet<UserData>>() method simply wraps the HTTP request logic. We set the bookmark each time and send it on.
The User Management Tool indicates there are 39,695 users in the system. After running for just a couple of minutes and breaking on the UnionWith in the loop I'm showing that my set has 211,800 users.
What am I missing?
It appears that you’ve encountered a defect in this API. The next course of action is for you to have your institution’s Approved Support Contact open an Incident through the Desire2Learn Helpdesk. Please make mention in the Incident report that Sarah-Beth Bianchi is aware of the issue, and I will work with our Support team to direct this issue appropriately.
I have a custom SharePoint (2007) list (named testlist) to which I attached a test workflow (built with SharePoint Designer 2007 and named testwf), whose only task, defined in the 'Actions' section at 'Step 1', is to wait until April 2014.
When I add a new item to testlist, testwf starts and, when I switch to the grid view, the item shows the field "testwf" as running.
Now I need to access the workflow associated with the item and then "complete" this task via code by changing its status but, using the following code, item.Tasks is always empty (although I can see that the internal variable m_allTaskListTasks has 1 element).
using (SPSite site = new SPSite("http://mysp"))
{
site.AllowUnsafeUpdates = true;
SPWeb web = site.OpenWeb();
web.AllowUnsafeUpdates = true;
foreach (SPList list in web.Lists)
{
if (list.Title != "testlist") continue;
foreach (SPListItem item in list.Items)
{
item.Web.AllowUnsafeUpdates = true;
if (item.Tasks.Count > 0)
{
//do work
}
}
}
}
Maybe I'm missing something...
I use this code to access my workflowtasks:
Guid taskWorkflowInstanceID = new Guid(item["WorkflowInstanceID"].ToString());
SPWorkflow workflow = item.Workflows[taskWorkflowInstanceID];
// now you can access the workflows tasks
SPTask task = workflow.Tasks[item.UniqueId];
Cross-posted question.
@petauro, have you made any headway on this? I can corroborate @moontear's answer based on the following code that I have used with success in the past:
...
// get workflow tasks for SPListItem object item
if (item != null && item.Workflows != null && item.Workflows.Count > 0)
{
try
{
var workflows = site.WorkflowManager.GetItemActiveWorkflows(item);
foreach (SPWorkflow workflow in workflows)
{
// match on some identifiable attribute of your custom workflow
// the history list title is used below as an example
if (workflow.ParentAssociation.HistoryListTitle.Equals(Constants.WORKFLOW_HISTORY_LIST_TITLE))
{
var workflowTasks = workflow.Tasks;
if (workflowTasks != null && workflowTasks.Count > 0)
{
// do work on the tasks
}
}
}
}
catch
{
// handle error
}
}
...
While only slightly different from the code you posted in your latest comment, see if it helps.
Another minor point: are there multiple instances of lists titled "testlist" within your SPWeb? If not, why iterate over web.Lists? Just get the one list directly and avoid some superfluous CPU cycles: SPWeb.GetList()
You have to go differently about this. You need to get the workflow task list and retrieve your task from there and finish it.
First you would need to check whether a workflow is running on your item: if (item.Workflows.Count > 0). From there you could iterate through all the workflow instances on the list item and get the SPWorkflowAssociation and the associated task and history lists. Then you would only need to find the task you are looking for in the associated task list.