I have a list, rows, that holds 10 different records. I am looping over this list in a C# console app and inserting values into another list, but it only picks the first record and inserts it 10 times into the new list.
When I debug, the loop shows unique values on the right-hand side, but they are not being assigned to the variables on the left-hand side.
List<Job> jobList=new List<Job>();
foreach (var row in rows)
{
Job job = new Job();
job.Title = row.SelectSingleNode("//h2[@class='jobtitle']").ChildNodes[1].Attributes["title"].Value;
job.Summary = row.SelectSingleNode("//span[@class='summary']").InnerText;
jobList.Add(job);
}
Any idea what is happening?
I also tried forcing garbage collection, but there was no improvement:
job = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Here is the updated code after @Andrew's suggestion, but it didn't work. The right-hand side holds the updated values, but they are not being assigned to the left-hand side variables.
foreach (var row in rows)
{
try
{
var job = new Job();
var title = row.SelectSingleNode("//h2[@class='jobtitle']").ChildNodes[1].Attributes["title"].Value;
var company = row.SelectSingleNode("//span[@class='company']").InnerText.Replace("\n", "").Replace("\r", "");
var location = row.SelectSingleNode("//span[@class='location']").InnerText.Replace("\n", "").Replace("\r", "");
var summary = row.SelectSingleNode("//span[@class='summary']").InnerText.Replace("\n", "").Replace("\r", "");
job.Title = title;
job.Company = company;
job.Location = location;
job.Summary = summary;
jobList.Add(job);
job = null;
GC.Collect();
GC.WaitForPendingFinalizers();
counter++;
Status("Page# " + pageNumber.ToString() + " : Record# " + counter + " extracted");
}
catch (Exception)
{
AppendRecords(jobList);
jobList.Clear();
}
//save file
}
You don't tell us what the rows variable refers to, but I assume these are nodes in a single XmlDocument. The XPath expressions you are using to extract values from these nodes are incorrect, because they always navigate to the same node in the document, irrespective of the current row node.
Here's a simple example that demonstrates the problem:
static void Main(string[] args)
{
XmlDocument x = new XmlDocument();
x.LoadXml(@"<rows> <row><bla><h2>bob1</h2></bla></row> <row><bla><h2>bob2</h2></bla></row> </rows>");
var rows = x.GetElementsByTagName("row");
foreach (XmlNode row in rows)
{
var h2 = row.SelectSingleNode("//h2").ChildNodes[0].Value;
Console.WriteLine(h2);
}
}
The output from this will be
bob1
bob1
Not what you were expecting? Have a play with the example in Dot Net Fiddle. Take another look at your XPath expression. Your current expression //h2 is saying "give me all h2 elements in the document irrespective of the current node". Whereas .//h2 would give you the h2 elements that are descendants of the current row node, which is probably what you need.
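A minimal sketch of the difference, reusing the two-row document from the example above. The RelativeXPathDemo class and the ReadH2Values helper are names made up for illustration; only the leading dot in the XPath differs between the two calls.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class RelativeXPathDemo
{
    // Hypothetical helper: returns the h2 text selected from each <row>.
    public static List<string> ReadH2Values(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var values = new List<string>();
        foreach (XmlNode row in doc.GetElementsByTagName("row"))
        {
            // The expression is evaluated with `row` as the context node.
            values.Add(row.SelectSingleNode(xpath).InnerText);
        }
        return values;
    }

    public static void Main()
    {
        const string xml =
            @"<rows> <row><bla><h2>bob1</h2></bla></row> <row><bla><h2>bob2</h2></bla></row> </rows>";

        // "//h2" starts at the document root, so every row yields the first h2.
        Console.WriteLine(string.Join(", ", ReadH2Values(xml, "//h2")));   // bob1, bob1

        // ".//h2" starts at the current row, so each row yields its own h2.
        Console.WriteLine(string.Join(", ", ReadH2Values(xml, ".//h2")));  // bob1, bob2
    }
}
```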
Hey guys, I have a problem with Selenium. What I do is:
navigate to a page with support tickets, switch to the iframe that shows them, find the container of all the ticket items (XPath "//*[@id='task_table']/tbody"), and then select all the entries in the list of tickets (XPath "//*[contains(@id, 'row_task_')]").
Now the problem: I'm iterating through the list of 20 items I got back (I tried a foreach loop as well) to select subelements of those entries, for example the ticket number. This works for the first item, but after that it always returns the same values as for the first item. However, if I print the innerHTML or Text of the whole element, I can see that the correct element is selected and holds the corresponding values.
Can someone tell me why I'm always getting the values from the first element in the list?
private void grabTicketData()
{
var docUrl = @"https://it4you.xyz/task_list_do#someParameters";
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless");
IWebDriver driver = new ChromeDriver(chromeOptions); // chromeOptions
driver.Navigate().GoToUrl(docUrl);
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10); // new TimeSpan(10000) would mean 10,000 ticks, i.e. 1 ms
System.Threading.Thread.Sleep(3000);
driver.SwitchTo().Frame("gsft_main");
var webAppIframe = driver.FindElement(By.XPath("//*[@id='task_table']/tbody"));
var elements = webAppIframe.FindElements(By.XPath(@"//*[contains(@id, 'row_task_')]"));
var newLstTickets = new ObservableCollection<Ticket>();
for (int i = 0; i <= (elements.Count - 1); i++)
{
Debug.WriteLine(elements[i].Text);
//var itemInnerHtml = elements[i].GetAttribute("innerHTML");
var Id = elements[i].FindElement(By.XPath("//td[3]/a")).Text;
var Prio = elements[i].FindElement(By.XPath("//td[4]")).Text;
var Status = elements[i].FindElement(By.XPath("//td[5]")).Text;
var DelegatedTo = elements[i].FindElement(By.XPath("//td[7]")).Text;
var Subject = elements[i].FindElement(By.XPath("//td[8]")).Text;
var Type = elements[i].FindElement(By.XPath("//td[9]")).Text;
Debug.WriteLine("#####ID:" + Id + " ---- Prio:" + Prio + " -- Status:" + Status + " - DelegatedTo:" + DelegatedTo + " - Subject:" + Subject + " - Type:" + Type);
newLstTickets.Add(new Ticket(Id, Prio, Status, DelegatedTo, Subject, Type));
}
driver.Quit();
}
Thanks in advance! :)
Well, I got it working. I'm a bit confused about why it behaves this way: I had expected the search to run within the single element I had selected (and the element itself is indeed the right one), but when I navigate the DOM again by XPath, the search seems to run against the table I selected earlier rather than the selected table entry.
The solution is pretty straightforward: start the DOM navigation one step higher and select the containing element first with XPath.
var Id = elements[i].FindElement(By.XPath("//*[contains(@id, 'row_task_')][" + (i+1).ToString() + "]/td[3]/a")).Text;
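The index-based workaround is needed because an XPath that begins with // is evaluated from the document root, even when FindElement is called on a row element. Prefixing the expression with a dot keeps the search inside that row. A hedged sketch, assuming the same column layout as in the question and that the td cells are direct children of each row; the TicketRowReader class and its tuple result are invented for illustration, and this has not been run against the real page.

```csharp
using OpenQA.Selenium;

public static class TicketRowReader
{
    // Hypothetical helper: reads a few cells relative to one table row.
    public static (string Id, string Prio, string Status) Read(IWebElement row)
    {
        // "./td[n]" means "the n-th td child of this row",
        // whereas "//td[n]" would search the whole document again.
        var id = row.FindElement(By.XPath("./td[3]/a")).Text;
        var prio = row.FindElement(By.XPath("./td[4]")).Text;
        var status = row.FindElement(By.XPath("./td[5]")).Text;
        return (id, prio, status);
    }
}
```

Inside the loop this could be called as var (id, prio, status) = TicketRowReader.Read(elements[i]); and the remaining columns would follow the same pattern.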
I am having trouble figuring out how to open a link in a new tab using selenium Webdriver. I am getting stale exceptions in the loops because the pages are not correct after the first iteration. So my idea is to open the link in a new tab, do all the operations I want to do on that tab, and switch back to the old tab to continue the loop, but I am not too sure how to open these tabs and manage them.
string year = this.yearTextBox2.Text;
string semester = this.semesterTextBox2.Text;
int numCourses = (int)this.numEnrollments.Value;
int count = 0;
string URL = GetURL(year, semester, "index");
_driver.Navigate().GoToUrl(URL);
//var result = _driver.FindElement(By.XPath("//*[@id=\"uu-skip-target\"]/div[2]/div"));
var results = _driver.FindElements(By.CssSelector(".btn.btn-light.btn-block"));
// Loop through each department
foreach (var r in results)
{
// Make sure not to include the letter link
// Click on this department and get the list of all courses
r.Click();
var result2 = _driver.FindElement(By.Id("class-details"));
var results2 = result2.FindElements(By.XPath("./*[@class=\"class-info card mt-3\"]"));
var courseCount = 0;
// Loop through each course in the department
foreach (var r2 in results2)
{
// Stop the process once reached the amount of courses needed to be scraped
if (count >= numCourses)
break;
Course c = new Course();
c.year = year;
c.semester = semester;
var header = r2.FindElement(By.TagName("h3"));
if (header != null)
{
// Gets the course (CS 2420)
string courseNum = header.Text.Split('-')[0].Trim().ToUpper();
string[] depAndNum = courseNum.Split(' ');
// Separate department and number
c.department = depAndNum[0];
c.number = depAndNum[1];
// Get the course title
string text = header.Text.Split('-')[1].Trim();
c.title = text.Substring(4);
// Check if the course is a lecture/seminar, if not then continue.
var list = result2.FindElement(By.CssSelector(".row.breadcrumb-list.list-unstyled"));
if (CourseIsLecture(list.FindElements(By.TagName("li"))))
{
c.count = courseCount;
GetCourseInformation(r2, c);
}
else
{
courseCount++;
continue;
}
}
// Increment the course count on this department page
courseCount++;
// Increment total course count
count++;
}
}
You can perform a click while holding the Control key to force the link to open in a new tab. You can use the Actions API for this.
Actions action = new Actions(webDriver);
action.KeyDown(Keys.LeftControl).Click(r).KeyUp(Keys.LeftControl).Build().Perform();
However, I believe you might still get a stale reference exception when you come back to tab 0 and continue looping over the results collection. If that happens, you can retrieve the count first, convert your foreach loop to a while/for loop, look up the results collection again on every iteration, and then use results[i] to process that element further.
Another option could be to wrap your loop in a retry block, e.g. using the Polly framework, and in case of a stale reference look up the results collection again and retry the entire thing.
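The re-lookup idea can be sketched independently of Selenium. The helper below is hypothetical (FreshLoop.ForEachFresh is not part of any library): it calls a lookup delegate on every iteration and processes only the element at the current index, so no reference obtained before a navigation is reused.

```csharp
using System;
using System.Collections.Generic;

public static class FreshLoop
{
    // Hypothetical helper: re-runs `lookup` before each iteration and
    // hands the element at the current index to `body`.
    // Returns the number of elements processed.
    public static int ForEachFresh<T>(Func<IReadOnlyList<T>> lookup, Action<T> body)
    {
        int processed = 0;
        for (int i = 0; ; i++)
        {
            // Fetch a fresh collection so stale references are never touched.
            var current = lookup();
            if (i >= current.Count) break;

            body(current[i]);
            processed++;
        }
        return processed;
    }
}
```

With Selenium, the lookup would be something like () => _driver.FindElements(By.CssSelector(".btn.btn-light.btn-block")), and the body would click the element, scrape the page, and navigate back. Both are assumptions based on the code in the question.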
I am trying to identify the text nodes in HTML text that has a format like the samples below:
sample text 1 : <strong>[Hot Water][Steam][Electric]</strong> Preheating Coil
sample text 2 : <b><span>[Steam] [Natural Gas Fired] [Electric] [Steam to steam]</span></b><span> Humidifier</span><br>
using the code below:
public static string IdentifyHTMLTagsAndRemove(string htmlText)
{
_ = htmlText ?? throw new ArgumentNullException(nameof(htmlText));
var document = new HtmlDocument();
document.LoadHtml(htmlText);
var rootNode = document.DocumentNode;
// get first and last text nodes
var nonEmptyTextNodes = rootNode.SelectNodes("//text()[not(self::text())]") ?? new HtmlNodeCollection(null);
//if (nonEmptyTextNodes.Count == 0)
//{
// return rootNode.OuterHtml;
//}
if (nonEmptyTextNodes.Count > 0)
{
var firstTextNode = nonEmptyTextNodes[0];
var lastTextNode = nonEmptyTextNodes[^1];
// get all br nodes in html string,
var breakNodes = rootNode.SelectNodes("//br") ?? new HtmlNodeCollection(null);
var lastTextNodeLengthIndex = lastTextNode.OuterStartIndex + lastTextNode.OuterLength;
foreach (var breakNode in breakNodes)
{
if (breakNode == null)
continue;
// check index of br nodes against first and last text nodes
// and remove br nodes that sit outside text nodes
if (breakNode.OuterStartIndex <= firstTextNode.OuterStartIndex
|| breakNode.OuterStartIndex >= lastTextNodeLengthIndex)
{
breakNode.Remove();
}
}
}
return rootNode.OuterHtml;
}
But it consistently fails here:
var nonEmptyTextNodes = rootNode.SelectNodes("//text()[not(self::text())]") ?? new HtmlNodeCollection(null);
nonEmptyTextNodes always has a count of zero, and I am unsure what I am doing wrong in the code above.
Could anyone please point me in the right direction? Many thanks in advance.
In addition to Siebe's answer, I'd also like to point out an inefficiency in the code that trims start/end BR tags. If you look at the HtmlAgilityPack code for HtmlNode operations, you'll see that whenever nodes are removed, the SetChanged() method is called on the parent (and its parent, all the way up). The next time you check the start/end indexes of anything in the tree, they need to be recalculated. So this code could be made to run much faster if you instead just create a temporary list of all the nodes to be removed, then remove them after they've all been identified.
var lastTextNodeLengthIndex = lastTextNode.OuterStartIndex + lastTextNode.OuterLength;
var breakNodesToRemove = rootNode.SelectNodes("//br")?.Where(node => node.OuterStartIndex <= firstTextNode.OuterStartIndex || node.OuterStartIndex >= lastTextNodeLengthIndex).ToList();
breakNodesToRemove?.ForEach(a => a.Remove());
reference: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.cs
Not sure what you are trying to achieve with
//text()[not(self::text())]
It tries to select text() nodes that are not text() nodes, so nothing will be found. If you just use
//text()
it will select all text() nodes.
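If the goal is the non-empty text nodes, as the variable name nonEmptyTextNodes suggests, one option is a normalize-space() predicate, which filters out text that is empty or whitespace only. A small sketch using HtmlAgilityPack; the TextNodeFinder class and NonEmptyTexts method are illustrative names, not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TextNodeFinder
{
    // Hypothetical helper: returns the trimmed content of every text node
    // that contains something other than whitespace.
    public static List<string> NonEmptyTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when nothing matches, hence the fallback.
        var nodes = document.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText.Trim()).ToList();
    }
}
```

The first and last entries of the returned list correspond to the firstTextNode and lastTextNode that the original method compares the br positions against.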
I need to check a filter function on a table.
This filter is only on the first cell of each row and I'm trying to figure out how to get all those values...
I tried with something like
public bool CheckSearchResults(HtmlControl GridTable, string FilterTxt)
{
List<string> Elements = new List<string>();
foreach (HtmlCell cell in GridTable.GetChildren())
{
Elements.Add(cell.FilterProperties["title"]);
}
List<string> Results = Elements.FindAll(l => l.Contains(FilterTxt));
return Results.Count == Elements.Count;
}
but I get stuck at the foreach loop...
Maybe there's a simple way with LINQ, but I don't know it very well.
edit:
All the cells I need have the same custom HTML tag.
With this code I should get them all, but I don't know how to iterate:
HtmlDocument Document = this.UIPageWindow.UIPageDocument;
HtmlControl GridTable = this.UIPageWindow.UIPageDocument.UIPageGridTable;
HtmlCell Cells = new HtmlCell(GridTable);
Cells.FilterProperties["custom_control"] = "firstCellOfRow";
This is partly because there is no GetEnumerator function or query model for HtmlCell objects, which are part of the Microsoft.VisualStudio.TestTools.UITesting.HtmlControl library -.-
edit2:
i found this article and i tried this
public bool CheckSearchResults(string FilterTxt)
{
HtmlDocument Document = this.UIPageWindow.UIPageDocument;
HtmlControl GridTable = this.UIPageWindow.UIPageDocument.UIPageGridTable;
HtmlRow rows = new HtmlRow(GridTable);
rows.SearchProperties[HtmlRow.PropertyNames.Class] = "ui-widget-content jqgrow ui-row-ltr";
HtmlControl cells = new HtmlControl(rows);
cells.SearchProperties["custom_control"] = "firstCellOfRow";
UITestControlCollection collection = cells.FindMatchingControls();
List<string> Elements = new List<string>();
foreach (UITestControl elem in collection)
{
HtmlCell cell = (HtmlCell)elem;
Elements.Add(cell.GetProperty("Title").ToString());
}
List<string> Results = Elements.FindAll(l => l.Contains(FilterTxt));
return Results.Count == Elements.Count;
}
but i get an empty collection...
Try Cell.Title or Cell.GetProperty("Title"). SearchProperties and FilterProperties are only there for searching for a UI element. They come either from the UIMap or from code if you fill them out by hand. Otherwise your code should work.
Or you can use a LINQ query (?) like:
var FilteredElements =
from Cell in UIMap...GridTable.GetChildren()
where Cell.GetProperty("Title").ToString().Contains(FilterTxt)
select Cell;
You could also try to record a cell, add it to the UIMap, set its search or filter properties to match your filtering, then call UIMap...Cell.FindMatchingControls() and it should return all matching cells.
The problem now is that you are limiting your search to one row of the table. In HtmlControl cells = new HtmlControl(rows); the constructor parameter sets the container that limits the search, not the direct parent of the control. It should be GridTable if you want to search all cells in the table. The best solution would be to use the recorder to get a cell control, then modify its search and filter properties in the UIMap to match all the cells you are looking for. Though in my opinion you should stick with hand-coded filtering. Something like:
foreach(var row in GridTable.GetChildren())
{
foreach(var cell in row.GetChildren())
{
//filter cell here
}
}
Check with AccExplorer or the recorder if the hierarchy is right. You should also use debug to be sure if the loops are getting the right controls and see the properties of the cells so you will know if the filter function is right.
I solved it by scraping the page's HTML myself:
static public List<string> GetTdTitles(string htmlCode, string TdSearchPattern)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
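HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//td[@" + TdSearchPattern + "]");
List<string> Results = new List<string>();
// SelectNodes returns null when no td matches, so return the empty list in that case.
if (collection == null) return Results;
foreach (HtmlNode node in collection)
{
Results.Add(node.InnerText);
}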
return Results;
}
I'm freakin' hating those stupid coded ui test -.-
btw, thanks for the help
I'm having some difficulties using a lambda expression to parse an html table.
var cells = htmlDoc.DocumentNode
.SelectNodes("//table[@class='data stats']/tbody/tr")
.Select(node => new { playerRank = node.InnerText.Trim()})
.ToList();
foreach (var cell in cells)
{
Console.WriteLine("Rank: " + cell.playerRank);
Console.WriteLine();
}
I'd like to continue to use the syntax as
.Select(node => new { playerRank = node.InnerText.Trim()
but for the other categories of the table, such as player name, team, position, etc. I'm using XPath, so I am unsure if it's correct.
I'm having an issue finding out how to extract the link + player name from:
Steven Stamkos
The Xpath for it is:
//*[@id="fullPage"]/div[3]/table/tbody/tr[1]/td[2]/a
Can anyone help out?
EDIT* added HTML page.
http://www.nhl.com/ice/playerstats.htm?navid=nav-sts-indiv#
This should get you started:
var result = (from row in doc.DocumentNode.SelectNodes("//table[@class='data stats']/tbody/tr")
select new
{
PlayerName = row.ChildNodes[1].InnerText.Trim(),
Team = row.ChildNodes[2].InnerText.Trim(),
Position = row.ChildNodes[3].InnerText.Trim()
}).ToList();
The ChildNodes property contains all the cells per row. The index will determine which cell you get.
To get the url from the anchor tag contained in the player name cell:
var result = (from row in doc.DocumentNode.SelectNodes("//table[@class='data stats']/tbody/tr")
select new
{
PlayerName = row.ChildNodes[1].InnerText.Trim(),
PlayerUrl = row.ChildNodes[1].ChildNodes[0].Attributes["href"].Value,
Team = row.ChildNodes[2].InnerText.Trim(),
Position = row.ChildNodes[3].InnerText.Trim()
}).ToList();
The Attributes collection is a list of the attributes in an HTML element. We are simply grabbing the value of href.
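One caveat with index-based access: in HtmlAgilityPack, ChildNodes also contains the whitespace text nodes between cells, so the indexes can shift depending on how the markup is formatted. A hedged alternative is to address the cells with XPath relative to each row. The sketch below assumes the column order rank, player, team, and that the table has a tbody element; PlayerTableParser is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PlayerTableParser
{
    // Hypothetical helper: extracts name, link and team from each table row.
    public static List<(string Name, string Url, string Team)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no row matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='data stats']/tbody/tr");
        if (rows == null) return new List<(string, string, string)>();

        return rows.Select(row =>
        {
            // "./td[2]/a" is resolved relative to the current row,
            // and td[n] counts only td elements, not whitespace text nodes.
            var link = row.SelectSingleNode("./td[2]/a");
            var team = row.SelectSingleNode("./td[3]");
            return (
                Name: link?.InnerText.Trim() ?? "",
                Url: link?.GetAttributeValue("href", "") ?? "",
                Team: team?.InnerText.Trim() ?? "");
        }).ToList();
    }
}
```

Missing cells yield empty strings rather than a NullReferenceException, which is a design choice for the sketch and may not be what every caller wants.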