I am using HtmlAgilityPack to read and load an XML file. After the file is loaded, I want to insert its values into a database.
The XML looks like this:
<meeting>
<jobname></jobname>
<jobexperience></jobexperience>
</meeting>
I'm trying to accomplish this using XPath statements within a foreach loop as seen here:
DataTable dt = new DataTable();
//Add Data Columns here
dt.Columns.Add("JobName");
dt.Columns.Add("JobExperience");
// Create a string to read the XML tag "job"
string xPath_job = "//job";
string xPath_job_experience = "//jobexperience";
/* Use a ForEach loop to go through all 'meeting' tags and get the values
from the 'JobName' and 'JobExperience' tags */
foreach (HtmlNode planned_meeting in doc.DocumentNode.SelectNodes("//meeting"))
{
    DataRow dr = dt.NewRow();
    dr["JobName"] = planned_meeting.SelectSingleNode(xPath_job).InnerText;
    // Fill the JobExperience column
    dr["JobExperience"] = planned_meeting.SelectSingleNode(xPath_job_experience).InnerText;
    dt.Rows.Add(dr);
}
So the problem is that even though the foreach loop is going through every 'meeting' tag, it's getting the values from only the first 'meeting' tag.
Any help would be greatly appreciated!
The foreach loop does visit every meeting element, but the values come from the XPath expressions inside it. The XPath operator // selects matching elements anywhere in the whole document, e.g. //job selects all job elements in the document, no matter which node you call SelectSingleNode on.
So in your foreach loop you select all meeting elements in the whole document with
doc.DocumentNode.SelectNodes("//meeting")
and then, inside the loop, you select all //job and all //jobexperience elements in the whole document with
string xPath_job = "//job";
string xPath_job_experience = "//jobexperience";
SelectSingleNode returns the first element of each of those document-wide sets, so every iteration reads the same first job and jobexperience elements over and over again. Hence the impression that you only get the first element.
Change the code so that only the children of the current meeting element are selected, by removing the // operator:
string xPath_job = "job";
string xPath_job_experience = "jobexperience";
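A minimal sketch of the difference, using System.Xml's XmlDocument as a stand-in for HtmlAgilityPack (its XmlNode also offers SelectNodes, SelectSingleNode and InnerText, and // is absolute in both). The sample XML and the MeetingXPathDemo/ReadJobs names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class MeetingXPathDemo
{
    // Hypothetical sample data: two meetings with different job values.
    private const string SampleXml =
        "<meetings>" +
        "<meeting><job>Developer</job><jobexperience>3</jobexperience></meeting>" +
        "<meeting><job>Tester</job><jobexperience>5</jobexperience></meeting>" +
        "</meetings>";

    // Reads one job value per meeting, evaluating jobXPath with each meeting as context node.
    public static List<string> ReadJobs(string jobXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);

        var jobs = new List<string>();
        foreach (XmlNode meeting in doc.SelectNodes("//meeting"))
        {
            // "//job" restarts at the document root, so it always finds the first job;
            // "job" looks only at the children of the current meeting.
            XmlNode job = meeting.SelectSingleNode(jobXPath);
            jobs.Add(job == null ? string.Empty : job.InnerText);
        }
        return jobs;
    }
}
```

With this data, ReadJobs("//job") returns Developer twice, while ReadJobs("job") returns Developer and Tester. If the elements may be nested deeper inside meeting, .//job (note the leading dot) searches only the descendants of the current node.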
I'm trying to parse the main <table> (the last one in the DOM tree) on this website: "https://aips.um.si/PredmetiBP5/Main.asp?Mode=prg&Zavod=77&Jezik=&Nac=1&Nivo=P&Prg=1571&Let=1"
I'm using HtmlAgilityPack and writing C# in a WPF application in Visual Studio 17.
Right now I'm using this code:
iso = Encoding.GetEncoding("windows-1250");
web = new HtmlWeb()
{
AutoDetectEncoding = false,
OverrideEncoding = iso,
};
//http = https://aips.um.si/PredmetiBP5/Main.asp?Mode=prg&Zavod=77&Jezik=&Nac=1&Nivo=P&Prg=1571&Let=1
string http = formatLetnikLink(l.Attributes["onclick"].Value).ToString();
var htmlProgDoc = web.Load(http);
string s = htmlProgDoc.ParsedText;
htmlProgDoc.ParsedText correctly includes all the rows that are supposed to be in the last table (I added that line for debugging, in case the watch window was misbehaving).
I first tried to get all the tables on the website and realized that there are 6 <table></table> elements on it, even though visually you see only one. After debugging for a couple of hours, I realized that the main table is the last <table> in the DOM tree, and that the parser is not fully parsing all of its <tr> tags. That is the problem: I need all the tr tags.
var tables = htmlProgDoc.DocumentNode.SelectNodes("//table");
This returns 6 <table></table> elements, as expected, and every one of them is fully parsed, including all their rows and columns, except the last one. In the last one the parser only parses the first two rows and then appears to append a </table> by itself. I also tried the absolute XPath copied from Firefox, "/html/body/div/center[2]/font/font/font/table", instead of "//table". It found the correct table, but that table also contained only the first 2 rows:
var theTableINeed = tables.Last();
//contains the correct table which I need, but with only the first two rows
The HTML on that page is malformed. One possible workaround is to cut the markup of the last table out of the page source and parse it as a separate document:
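// WebClient is in System.Net; the page is windows-1250 encoded, as in the question
var client = new WebClient { Encoding = Encoding.GetEncoding("windows-1250") };
string html = client.DownloadString(url);
int lastTableOpen = html.LastIndexOf("<table", StringComparison.OrdinalIgnoreCase);
int lastTableClose = html.LastIndexOf("</table>", StringComparison.OrdinalIgnoreCase);
// Include the 8-character "</table>" closing tag in the extracted markup
string lastTable = html.Substring(lastTableOpen, lastTableClose - lastTableOpen + "</table>".Length);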
Then use HtmlAgilityPack:
var table = new HtmlDocument();
table.LoadHtml(lastTable);
foreach (var row in table.DocumentNode.SelectNodes("//table//tr"))
{
    // Print the text content of each row
    Console.WriteLine(row.InnerText);
}
But I don't know if there are problems in the table itself.
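The string-slicing step can be sketched as a small, testable helper. It assumes the last table contains no nested <table> (otherwise LastIndexOf("<table") would find the inner one) and that the page really closes it with a literal </table>. LastTableExtractor and ExtractLastTable are hypothetical names:

```csharp
using System;

public static class LastTableExtractor
{
    // Hypothetical helper: returns the markup from the last "<table" opening tag through the
    // last "</table>" closing tag, or null if either is missing or they are out of order.
    // Assumes the last table has no nested <table> inside it.
    public static string ExtractLastTable(string html)
    {
        const string closeTag = "</table>";
        int open = html.LastIndexOf("<table", StringComparison.OrdinalIgnoreCase);
        int close = html.LastIndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
        if (open < 0 || close < open)
            return null;
        return html.Substring(open, close - open + closeTag.Length);
    }
}
```

The returned string can then be passed to HtmlDocument.LoadHtml as in the answer.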
This is an extract of the XML I'm using:
<AccountNumber>
<RecordID>A9</RecordID>
<SegmentLength>14</SegmentLength>
<Number>6770307103</Number>
</AccountNumber>
<PaymentCounter>
<RecordID>B2</RecordID>
<SegmentLength>14</SegmentLength>
<History>99</History>
<Delinq30Day>00</Delinq30Day>
<Delinq60Day>00</Delinq60Day>
<Delinq90PlusDay>00</Delinq90PlusDay>
<DerogCnt>00</DerogCnt>
</PaymentCounter>
Here I'm trying to identify the number of child elements under AccountNumber and PaymentCounter.
I'm looking for this result:
AccountNumber count = 3
PaymentCounter count = 7
After I get the count, I will use the data to dynamically populate a table.
I tried this:
int count1 = xmlDocument.Descendants("AccountNumber").Count();
Count() in LINQ to XML gives me the number of times the AccountNumber element occurs in the XML, but not the number of its child elements.
Is there a way to do this?
Descendants("AccountNumber") returns the children, grandchildren, etc. with the name AccountNumber. You actually want the children of the AccountNumber element.
See MSDN
In your sample there is exactly one AccountNumber. Assuming this XML fragment is representative of the actual document you are processing, and that AccountNumber is a child of the root element, you probably want:
xmlDocument.Root.Element("AccountNumber").Elements().Count()
Yes, this is possible by using xmlDocument.Descendants("AccountNumber").First().Descendants().Count().
From MSDN
XDocument.Descendants
Returns a collection of the descendant elements for this document or element, in document order
If you do something like the following
var elements = xmlDocument.Descendants("AccountNumber");
you will get all the AccountNumber elements. You then have to iterate through them to get the child count of each one:
foreach (var xElement in elements)
{
    var acElt = xElement.Descendants().Count();
    // then do your construction here
}
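A self-contained sketch of the Elements()-based approach, assuming the fragment is wrapped in a single root element (for example a hypothetical <Report>). It counts only direct children; Descendants() would also count grandchildren, which happens to give the same numbers for this flat sample. SegmentCounter and CountChildren are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SegmentCounter
{
    // Returns (element name, number of direct child elements) for each child of the root,
    // e.g. ("AccountNumber", 3). Assumes the segments sit directly under the root element.
    public static List<(string Name, int Count)> CountChildren(string xml)
    {
        XElement root = XDocument.Parse(xml).Root
            ?? throw new ArgumentException("The XML has no root element.", nameof(xml));

        return root.Elements()
            .Select(segment => (Name: segment.Name.LocalName, Count: segment.Elements().Count()))
            .ToList();
    }
}
```

For the fragment in the question this yields AccountNumber with 3 and PaymentCounter with 7, which can then drive the dynamically populated table.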
I have a bunch of divs that I loop through by their class name, col-lg-3. Inside each of those divs I have elements at XPaths like the ones below, and I need to pick out the string value at td[4]:
.//*[@id='item-186951']/div[1]/table/tbody/tr[6]/td[4]
.//*[@id='itemPNS18-152951']/div[1]/table/tbody/tr[6]/td[4]
.//*[@id='itemXYZ-8152951']/div[1]/table/tbody/tr[6]/td[4]
.//*[@id='item11641551']/div[1]/table/tbody/tr[6]/td[4]
.//*[@id='itemAPS12641']/div[1]/table/tbody/tr[6]/td[4]
Only the [@id='{dynamically-changing-id}'] part changes from element to element; the rest is constant. How do I loop over this? At the moment I am trying the following, but (understandably) I only get the first item:
//get all divs on page
var elements = ...FindElements(By.ClassName("col-lg-3"));
//for each div found
foreach (var e in elements)
{
    // get text at xpath
    string str = e.FindElement(By.XPath(".//*[@id='item-186951']/div[1]/table/tbody/tr[6]/td[4]")).Text;
}
I'm using Selenium 2.53.
You can check that the id attribute starts with "item":
var elements = ThisPage.FindElements(By.XPath(".//*[starts-with(@id, 'item')]/div[1]/table/tbody/tr[6]/td[4]"));
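To check the starts-with() predicate outside the browser, here is a sketch that runs the same XPath with System.Xml's XmlDocument over a hypothetical, well-formed stand-in for the page (BuildSample generates it). It also scopes the search to each col-lg-3 div with a leading ".", mirroring the question's loop. Note that @class='col-lg-3' only matches when that is the element's entire class attribute, unlike Selenium's By.ClassName:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class DynamicIdXPathDemo
{
    // Hypothetical page stand-in: one col-lg-3 div per id, each holding
    // div[1]/table/tbody with 6 rows of 4 cells. Cell text is "<id>:r<row>c<col>".
    public static string BuildSample(IEnumerable<string> ids)
    {
        var sb = new StringBuilder("<body>");
        foreach (string id in ids)
        {
            sb.Append("<div class=\"col-lg-3\"><div id=\"").Append(id).Append("\"><div><table><tbody>");
            for (int r = 1; r <= 6; r++)
            {
                sb.Append("<tr>");
                for (int c = 1; c <= 4; c++)
                    sb.Append("<td>").Append(id).Append(":r").Append(r).Append('c').Append(c).Append("</td>");
                sb.Append("</tr>");
            }
            sb.Append("</tbody></table></div></div></div>");
        }
        return sb.Append("</body>").ToString();
    }

    // For each col-lg-3 div, reads tr[6]/td[4] under the descendant whose id starts with "item".
    public static List<string> ReadFourthCells(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        var values = new List<string>();
        foreach (XmlNode div in doc.SelectNodes("//div[@class='col-lg-3']"))
        {
            // The leading "." keeps the search inside the current div.
            XmlNode cell = div.SelectSingleNode(".//*[starts-with(@id, 'item')]/div[1]/table/tbody/tr[6]/td[4]");
            if (cell != null)
                values.Add(cell.InnerText);
        }
        return values;
    }
}
```

In Selenium, the per-div variant would be e.FindElement(By.XPath(".//*[starts-with(@id, 'item')]/div[1]/table/tbody/tr[6]/td[4]")).Text inside the existing foreach.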
New data that I add is appended to the last page of the table, but names can repeat, so I need to verify the data using its ID. I'm trying to figure out how to read the ID text values so that, after new data is added, I can verify the most recently added ID. Any ideas?
int i = 1;
bool found = false;
string ID;
try
{
    IWebElement LastPage = Driver.driver.FindElement(By.XPath("html/body/div[4]/div/div/div[2]/table/tbody/tr[8]/td[1]"));
    LastPage.Click();
    for (i = 1; i <= 10; i++)
    {
        ID = Driver.driver.FindElement(By.XPath("html/body/div[4]/div/div/div[2]/table/tbody/tr[" + i + "]/td[1]")).Text;
        // if (??) : compare ID with the newly added ID here
    }
}
catch (NoSuchElementException)
{
    // Thrown by FindElement when a row does not exist
}
As @viet-pham said, using last() is a good idea. You can also use a relative XPath like:
//table/tbody/tr[last()-1]/td
You don't need td[1] either: FindElement (unlike FindElements) returns only the first element found, which here is the first td of that row.
You don't need a for loop to get the last item; just use last() to select the final tr element.
In your case:
IWebElement lastElement = Driver.driver.FindElement(By.XPath("(html/body/div[4]/div/div/div[2]/table/tbody/tr)[last()]/td[1]"));
ID = lastElement.Text;
In addition, because the last row is the total, you must change to the row before it: [last()-1].
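A quick way to check what last() and last()-1 select is to evaluate the expression with System.Xml's XmlDocument against a small, hypothetical table whose final row is a total. The class and method names are illustrative only; in Selenium the same expression string would go into By.XPath:

```csharp
using System.Xml;

public static class LastRowDemo
{
    // Returns the text of td[1] in the row chosen by rowPredicate, e.g. "last()" or "last()-1".
    public static string FirstCellOfRow(string xhtml, string rowPredicate)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        // The parentheses apply the predicate to the whole set of rows,
        // rather than to the rows of each tbody separately.
        XmlNode cell = doc.SelectSingleNode("(//table/tbody/tr)[" + rowPredicate + "]/td[1]");
        return cell?.InnerText;
    }
}
```

For a table with rows 101, 102, 103 and a final Total row, "last()" returns Total, while "last()-1" returns 103, the most recently added ID.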
I need to check a filter function on a table.
The filter applies only to the first cell of each row, and I'm trying to figure out how to get all those values.
I tried something like:
public bool CheckSearchResults(HtmlControl GridTable, string FilterTxt)
{
    List<string> Elements = new List<string>();
    foreach (HtmlCell cell in GridTable.GetChildren())
    {
        Elements.Add(cell.FilterProperties["title"]);
    }
    List<string> Results = Elements.FindAll(l => l.Contains(FilterTxt));
    return Results.Count == Elements.Count;
}
but I get stuck at the foreach loop.
Maybe there's a simpler way with LINQ, but I don't know it very well.
Edit:
All the cells I need have the same custom HTML attribute.
With this code I should get them all, but I don't know how to iterate over them:
HtmlDocument Document = this.UIPageWindow.UIPageDocument;
HtmlControl GridTable = this.UIPageWindow.UIPageDocument.UIPageGridTable;
HtmlCell Cells = new HtmlCell(GridTable);
Cells.FilterProperties["custom_control"] = "firstCellOfRow";
This is also because HtmlCell objects, which are part of the Microsoft.VisualStudio.TestTools.UITesting.HtmlControls library, have no GetEnumerator method or query model.
Edit 2:
I found this article and tried this:
public bool CheckSearchResults(string FilterTxt)
{
    HtmlDocument Document = this.UIPageWindow.UIPageDocument;
    HtmlControl GridTable = this.UIPageWindow.UIPageDocument.UIPageGridTable;
    HtmlRow rows = new HtmlRow(GridTable);
    rows.SearchProperties[HtmlRow.PropertyNames.Class] = "ui-widget-content jqgrow ui-row-ltr";
    HtmlControl cells = new HtmlControl(rows);
    cells.SearchProperties["custom_control"] = "firstCellOfRow";
    UITestControlCollection collection = cells.FindMatchingControls();
    List<string> Elements = new List<string>();
    foreach (UITestControl elem in collection)
    {
        HtmlCell cell = (HtmlCell)elem;
        Elements.Add(cell.GetProperty("Title").ToString());
    }
    List<string> Results = Elements.FindAll(l => l.Contains(FilterTxt));
    return Results.Count == Elements.Count;
}
but I get an empty collection.
Try cell.Title or cell.GetProperty("Title"). SearchProperties and FilterProperties exist only for searching for a UI element; they come either from the UIMap or from code, if you fill them in by hand. Otherwise your code should work.
Or you could try a LINQ query like:
var FilteredElements =
from Cell in UIMap...GridTable.GetChildren()
where Cell.GetProperty("Title").ToString().Contains(FilterTxt)
select Cell;
You could also try to record a cell, add it to the UIMap, set its search or filter properties to match your filtering, then call UIMap...Cell.FindMatchingControls() and it should return all matching cells.
The problem now is that you are limiting your search to one row of the table. In HtmlControl cells = new HtmlControl(rows);, the constructor parameter sets a search-limit container, not the direct parent of the control. It should be GridTable if you want to search all cells in the table. The best solution would be to use the recorder to get a cell control, then modify its search and filter properties in the UIMap to match all the cells you are looking for. That said, in my opinion you should stick with hand-coded filtering, something like:
foreach (var row in GridTable.GetChildren())
{
    foreach (var cell in row.GetChildren())
    {
        // filter cell here
    }
}
Check with AccExplorer or the recorder whether the hierarchy is right. You should also step through in the debugger to make sure the loops are getting the right controls, and look at the cells' properties so you know whether the filter function is right.
I solved it myself by scraping the page's HTML:
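static public List<string> GetTdTitles(string htmlCode, string TdSearchPattern)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlCode);
    // Select every td matching the attribute test, e.g. TdSearchPattern = "custom_control='firstCellOfRow'"
    HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//td[@" + TdSearchPattern + "]");
    List<string> Results = new List<string>();
    // SelectNodes returns null (not an empty collection) when nothing matches
    if (collection == null)
        return Results;
    foreach (HtmlNode node in collection)
    {
        Results.Add(node.InnerText);
    }
    return Results;
}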
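For the LINQ part of the question, the check at the end of CheckSearchResults (every first-cell title contains the filter text) can be written with Enumerable.All. A minimal sketch, assuming the titles have already been collected, for example with GetTdTitles; FilterCheck and AllTitlesMatch are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterCheck
{
    // Hypothetical helper: true when every title contains the filter text.
    // string.Contains is case-sensitive here, as in the original FindAll lambda.
    public static bool AllTitlesMatch(IEnumerable<string> titles, string filterText) =>
        titles.All(title => title.Contains(filterText));
}
```

Like the original Results.Count == Elements.Count comparison, this returns true when no titles were found at all, so an empty result may deserve a separate assertion.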
I'm freakin' hating those stupid Coded UI tests -.-
Btw, thanks for the help.