I'm reading data from a table, but the values come back in a form that is hard to work with.
My HTML structure looks like this:
<table>
<tbody>
<tr>
<td>
<span>1</span>
<span>0</span>
<br>
<span>
<span>Good Luck</span>
<img src="/App_Themes/Resources/img/icon_tick.gif" width="3" height="7">
</span>
</td>
</tr>
<tr>
<td>
<b>Nowaday<br></b>
<p>hook<br>zp</p>
</td>
</tr>
</tbody>
</table>
But when I read the text, I get this:
10Good LuckNowadayhookzp
I was using this code:
ReadOnlyCollection<IWebElement> lstTable = browser.FindElements(By.XPath("table/tbody/tr"));
foreach (IWebElement val in lstTable)
{
ReadOnlyCollection<IWebElement> lstTDElement = val.FindElements(By.XPath("td"));
ReadOnlyCollection<IWebElement> lstSpecialEle =
val.FindElements(By.XPath("//td/span | //td/b | //td/p"));
}
This returns a huge number of elements (about 6000 for a single <tr>), and I don't know how to arrange them into the correct columns.
Each column can be empty or contain several values.
Currently lstTDElement contains two columns (ten in the real page),
and lstSpecialEle contains all the data I need.
I filter the elements with: [//td/span | //td/b | //td/p].
How can I map lstSpecialEle onto lstTDElement so that each value lands in the right column? With a foreach and a condition?
Edited:
Typically, what I get from lstTDElement is: 10Good LuckNowadayhookzp.
lstSpecialEle produces many entries that together contain all the values I need.
The problem is that I don't know how to arrange the entries from lstSpecialEle into a table.
My table has two <tr> tags, which here means two columns. How do I assign the values in lstSpecialEle to the correct columns?
It should be like:
Num Time
1 0 Good Luck Nowaday hookzp
As mentioned, the data is dynamic: the first or second <tr> may lack a <span>, or a <b>, and so on. (Tags can only be missing; no new kinds of tag are added.)
The problem is the leading // in your XPath: it searches from the document root, so it matches elements across the whole page rather than within the current row. Use .// instead, which searches only within the context element. This should give you just the elements you want rather than a huge list:
ReadOnlyCollection<IWebElement> lstTable = browser.FindElements(By.XPath("//table/tbody/tr"));
foreach (IWebElement val in lstTable)
{
ReadOnlyCollection<IWebElement> lstSpecialEle = val.FindElements(By.XPath(".//td/span | .//td/b | .//td/p"));
}
Edited1: If the list includes elements with empty text, you can filter them out and keep only the elements that actually contain text:
var FinalList = lstSpecialEle.Where(x => !string.IsNullOrWhiteSpace(x.Text)).ToList();
Edited2: If you want to merge each row's text into a single list of strings, try this:
List<string> FinalList = new List<string>();
foreach (IWebElement val in lstTable)
{
    ReadOnlyCollection<IWebElement> lstSpecialEle = val.FindElements(By.XPath(".//td/span | .//td/b | .//td/p"));
    // Keep only non-empty texts, then join them into one string per row
    var AllTextList = lstSpecialEle.Select(el => el.Text).Where(t => !string.IsNullOrWhiteSpace(t)).ToList();
    string AllText = String.Join(" ", AllTextList);
    FinalList.Add(AllText);
}
// Print each row; passing the list itself to WriteLine would only print its type name
foreach (string rowText in FinalList)
{
    Console.WriteLine(rowText);
}
Now FinalList contains one string per row.
Hope it helps...:)
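To keep each value in its column, one option is to query cell by cell instead of row by row, so that an empty cell still produces an entry at its column position. The following is only a sketch, assuming the Selenium WebDriver C# bindings and an IWebDriver like the question's browser; the TableReader class and ReadTable method are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using OpenQA.Selenium;

static class TableReader
{
    // Hypothetical helper: returns one list per <tr>, holding one string per <td>.
    public static List<List<string>> ReadTable(IWebDriver browser)
    {
        var rows = new List<List<string>>();
        foreach (IWebElement tr in browser.FindElements(By.XPath("//table/tbody/tr")))
        {
            var cells = new List<string>();
            // "./td" is relative to the current row, so cells come back in column order.
            foreach (IWebElement td in tr.FindElements(By.XPath("./td")))
            {
                // Direct children only; a nested <span> is covered by its parent's Text.
                var parts = td.FindElements(By.XPath("./span | ./b | ./p"))
                    .Select(e => e.Text.Trim())
                    .Where(t => t.Length > 0);
                // An empty cell still adds an (empty) entry, keeping column indexes stable.
                cells.Add(string.Join(" ", parts));
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

With this shape, rows[r][c] is the text of column c in row r. Each FindElements call is a round trip to the browser, so on a table with thousands of rows this may be slow.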
I have an HTML table in Blazor that is filled by looping over a list (columns reduced to show the relevant code):
@foreach (var line in lines)
{
    var linenumber = lines.IndexOf(line);
    <td>
        <DDItems @bind-Binder="line.item"></DDItems>
    </td>
    <td>
        <button type="button" @onclick="@(() => removeRow(linenumber))">Remove</button>
    </td>
}
@code {
    private List<TransactionSALine> lines = new List<TransactionSALine>();
    private async Task removeRow(int linenumber)
    {
        lines = lines.Where(u => lines.IndexOf(u) != linenumber).ToList();
    }
}
When I remove one of the rows from the table, item selected in other rows gets changed.
I have tried to use an array instead of the list but that didn't help.
I also tried rewriting the removeRow() method with LINQ and with RemoveAt(). The result is the same.
Before removing the row with Product "pencil":
After removing the product "pencil":
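Your snippet does not include the <tr> element. It has to be in there, at the first level inside the foreach(). You can help the rendering by specifying a key, which lets the renderer match each row to its data item when the list changes instead of matching rows by position:
@foreach (var line in lines)
{
    <tr @key="@line">
        ...
    </tr>
}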
I think this takes care of the disappearing Pen item, try it.
On a side note, you can simplify the removal a lot:
@onclick="@(() => lines.Remove(line))">
// private async Task removeRow(int linenumber)
// {
//     lines = lines.Where(u => lines.IndexOf(u) != linenumber).ToList();
// }
Let's say, I've this table:
+--------+----------+----
| Device | Serial # |
+--------+----------+----
| Cam1 | AB123 |
+--------+----------+----
Since I don't know in advance which columns will be displayed, I build the table by sending just a key/value pair for each cell.
This is how I get my data in C# code:
List<List<KeyValue>> myTable = deviceRepository.GetKeyValues(facilityId);
Once sent to the client side, the data in myTable has the following structure:
myTable = [
[ { key: "DeviceName", value: "Device"}, { key: "SerialNumber", value: "Serial #"}, ..],
[ { key: "DeviceName", value: "Cam1"}, { key: "SerialNumber", value: "AB123"}, ..],
...
]
In razor, I'd just have to loop through the list.
@foreach(var row in Model)
{
    <tr>
    @foreach(var cell in row)
    {
        <td>@cell.Value</td>
    }
    </tr>
}
In Angular, I don't see how to do that with directives.
<tr *ngFor="let myInnerList of myTable">
//I'd like to loop through the each inner list to build each table cell
</tr>
Thanks for helping
EDIT
Is it possible to get something like this? i.e if the column is the ID, display a checkbox so that the row can be selected.
@foreach(var cell in row)
{
    if(cell.Key == "Id")
    {
        <td><input type="checkbox" id="row_@(cell.Value)" /></td>
    }
    else
    {
        <td>@cell.Value</td>
    }
}
This way, the first cell for every row will display a checkbox.
I am not sure exactly what you are trying to show. You can write this, but it relies on every inner array listing its keys in the same order. If that is not the case, either add code to make it so or create a filter.
This is the equivalent of the C# code in your question:
<tr *ngFor="let row of myTable">
<td *ngFor="let col of row">
{{col.value}}
</td>
</tr>
I'm parsing site content using AngleSharp and I've run into an issue with anonymous blocks.
See the sample code:
var parser = new HtmlParser();
var document = parser.Parse(@"<body>
<div class='product'>
<a href='#'><img src='img1.jpg' alt=''></a>
Hello, world
<div class='comments-likes'>1</div>
</div>
<div class='product'>
<a href='#'><img src='img2.jpg' alt=''></a>
Yet another helloworld
<div class='comments-likes'>25</div>
</div>
</body>");
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
var productTitle = product.Text();
productTitle.Dump();
}
So, productTitle contains numbers from div.comments-likes, output is:
Hello, world 1
Yet another helloworld 25
I've tried something like product.FirstElementChild.NextElementSibling.Text(); but next sibling for link element is div.comments-likes, not anonymous block. It shows:
1
25
So, anonymous blocks are skipped. :(
The best workaround I've found is removing the interfering blocks first. For my example:
product.QuerySelector(".comments-likes").Remove();
var productTitle = product.Text().Trim();
Is there a better way to extract the text of an anonymous block?
Text is modeled as a text node, a node type alongside elements, comments, processing instructions, and so on. That is why the NextElementSibling you tried did not include the text in the result: as its name suggests, it returns elements only.
You can get the text nodes located directly within the product div by iterating over the div's ChildNodes and filtering by NodeType, for example:
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
var productTitle = product.ChildNodes
.First(o => o.NodeType == AngleSharp.Dom.NodeType.Text
&& o.TextContent.Trim() != "");
Console.WriteLine(productTitle.TextContent.Trim());
}
dotnetfiddle demo
Notice that newlines between elements are also text nodes, so we need to filter those out in the demo above.
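If the title may be split across several text nodes (for instance when an inline element sits in the middle of it), taking only the first non-empty text node would drop the rest. A possible variation, sketched here under the assumption that the product elements come from AngleSharp as in the question, joins all direct text nodes instead. The TextHelpers class and DirectText method are hypothetical names, not part of AngleSharp.

```csharp
using System.Linq;
using AngleSharp.Dom;

static class TextHelpers
{
    // Hypothetical helper: returns the text that sits directly inside the element,
    // ignoring text that belongs to child elements such as div.comments-likes.
    public static string DirectText(IElement element)
    {
        var parts = element.ChildNodes
            .Where(n => n.NodeType == NodeType.Text)   // text nodes only
            .Select(n => n.TextContent.Trim())
            .Where(t => t.Length > 0);                 // drop whitespace-only nodes
        return string.Join(" ", parts);
    }
}
```

Inside the loop this would be called as TextHelpers.DirectText(product), and unlike the Remove() workaround it leaves the document unmodified.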
I have been trying to scrape some data from a website. The source distinguishes table header rows from content rows by class name. Because I want all the table information, I collected the headers into one array and the contents into another. The problem comes when I write the arrays to a file: I can write a header, but the second array holds the contents of every table, and I cannot tell where the contents of the first table end.
Because HtmlAgilityPack returns every node that matches the selection, I get all the contents at once. First, here is the HTML to make it clear:
<tr class=tableHeader>
<th width=16%>Caught</th>
<th width=16%><p>Normal Range</p></th>
</tr>
<TR class=content><TD><i>Bluegill</i></TD>
<TD>trap net</TD>
<TD align=CENTER>4.05</TD>
<TD align=CENTER> 7.9 - 37.7</TD>
<TD align=CENTER>0.26</TD>
<TD align=CENTER> 0.1 - 0.2</TD>
</TR>
<TR class=content><TD><i></i></TD>
<TD>Gill net</TD>
<TD align=CENTER>1.50</TD>
<TD align=CENTER>N/A</TD>
<TD align=CENTER>0.07</TD>
<TD align=CENTER>N/A</TD>
</TR>
<tr class=tableHeader>
<th>0-5</th>
<th>6-8</th>
<th>9-11</th>
<th>12-14</th>
<th>15-19</th>
<th>20-24</th>
<th>25-29</th>
<th>30+</th>
<th>Total</th>
</tr>
<TR class=content><TD><i>bluegill</i></TD>
<TD align=CENTER>19</TD>
<TD align=CENTER>65</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>84</TD>
</TR>
Below is my code to save the headers and contents into array and try to display it exactly like in the website.
int count =0;
foreach (var trTag4Pale in trTags4Pale)
{
string trText4Pale = trTag4Pale.InnerText;
paleLake[count] = trText4Pale;
if (trTags4Small != null)
{
int counter = 0;
foreach (var trTag4Small in trTags4Small)
{
string trText4Small = trTag4Small.InnerText;
smallText[counter] = trText4Small;
counter++;
}
}
File.AppendAllText(path, paleLake[count] + Environment.NewLine + smallText[count] + Environment.NewLine);
}
As you can see, when I append the array contents to the file, I get the first header followed by the contents of all the tables. I only want the contents of the first table, and then to repeat the process for the second table and so on.
If I could select just the content rows between one tableHeader row and the next, each table's contents would end up in its own array. I don't know how to do this.
This might not be the best approach but I made it work somehow. It might be useful resource for somebody someday. So below is the code that worked for me. I append the data stored in the list into an excel sheet. As I have all the data I need for each tr tag with each class, I can manipulate the data I want:
var trTags4Header = document.DocumentNode.SelectNodes("//tr[@class='tableHeader']");
if (trTags4Header != null)
{
    // Create a list to store td values
    List<string> tableList1 = new List<string>();
    int row = 2;
    foreach (var item in trTags4Header)
    {
        // Take the following siblings only while they are rows with the class "content",
        // i.e. stop at the next header. SelectNodes returns null when nothing matches.
        var siblings = item.SelectNodes("following-sibling::*");
        if (siblings == null) continue;
        var found = siblings.TakeWhile(tag => tag.Name == "tr" && tag.GetAttributeValue("class", "") == "content");
        // Store the selected nodes in an array (these are the rows holding the td information I want).
        HtmlNode[] nextItem = found.ToArray();
        foreach (var node in nextItem)
        {
            // Get the individual td values within tr class='content'. Note .//td: it searches from the current node instead of the document root.
            var tdValues = node.SelectNodes(".//td");
            int column = 1;
            if (tdValues != null)
            {
                // Store each td value in the list; this gives me control over where the data goes. ws1 is the Excel worksheet I am writing to.
                foreach (var tdText in tdValues)
                {
                    tableList1.Add(tdText.InnerText);
                    ws1.Cells[row, column] = tdText.InnerText;
                    column++;
                }
            }
            row++;
        }
    }
    // Display the content in a listbox
    listBox1.DataSource = tableList1;
}
Please suggest a better solution if you come across this or leave your feedback. Thanks
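One possible refinement, offered only as a sketch: instead of flattening everything into one list, keep each header together with the content rows that follow it, so every table stays separate. This assumes HtmlAgilityPack, the class names from the HTML in the question (tableHeader and content), and that header and content rows are siblings under the same parent. The TableSplitter class and its Split method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class TableSplitter
{
    // Hypothetical helper: pairs each header row with the content rows that follow it,
    // stopping at the next header, so every table keeps its own rows.
    public static List<(List<string> Header, List<List<string>> Rows)> Split(HtmlDocument document)
    {
        var tables = new List<(List<string> Header, List<List<string>> Rows)>();
        var headers = document.DocumentNode.SelectNodes("//tr[@class='tableHeader']");
        if (headers == null) return tables; // SelectNodes returns null when nothing matches

        foreach (HtmlNode header in headers)
        {
            var headerCells = header.Elements("th").Select(CellText).ToList();
            var rows = new List<List<string>>();
            // Walk forward through the siblings until something other than a content row appears.
            for (HtmlNode node = header.NextSibling; node != null; node = node.NextSibling)
            {
                if (node.NodeType != HtmlNodeType.Element) continue; // skip whitespace text nodes
                if (node.Name != "tr" || node.GetAttributeValue("class", "") != "content") break;
                rows.Add(node.Elements("td").Select(CellText).ToList());
            }
            tables.Add((headerCells, rows));
        }
        return tables;
    }

    // Decode entities such as &nbsp; and trim surrounding whitespace.
    private static string CellText(HtmlNode cell) => HtmlEntity.DeEntitize(cell.InnerText).Trim();
}
```

Each entry of the result can then be written to its own worksheet or file section, which addresses the original problem of not knowing where one table's contents end.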
From this XHTML source:
<div class = "page">
<h1>UNIQUE NAME</h1>
<table>
<tbody>
<tr>
<td>DATA TO EXTRACT 1</td>
</tr>
<tr>
<td />
<td />
<td />
<td />
<td />
<td>DATA TO EXTRACT 2</td>
</tr>
</tbody>
</table>
etc...
There are multiple instances of UNIQUE NAME with a similar set of child elements.
I need to locate the UNIQUE NAME element and extract all values (DATA TO EXTRACT) within each of the child element tags. In addition, I need to keep a count of where each value is located. For example DATA TO EXTRACT 1 would be at tr 1, td 1. DATA TO EXTRACT 2 would be at tr 2, td 6.
I am new to LINQ to XML and I was wondering whether someone could point me in the right direction with regard to a strategy. I have managed to figure out how to get to the UNIQUE NAME element with the following code:
var choice1 = (from category in _data.Descendants("div")
where category.Element("h1").Value == "UNIQUE NAME"
select category).DescendantNodes();
This returns a set of the values, which I'm sure I could loop through but I'm sure there must be a more elegant way of achieving this goal.
Many thanks!
Here’s one way of doing it using LINQ:
var choice1 =
    from category in _data.Descendants("div")
    // The string cast returns null when a div has no <h1>, whereas .Value would throw
    where (string)category.Element("h1") == "UNIQUE NAME"
    from row in category.Descendants("tr").Select((element, index) => new { element, index })
    from col in row.element.Elements("td").Select((element, index) => new { element, index })
    where !string.IsNullOrEmpty(col.element.Value)
    select new
    {
        RowIndex = row.index + 1, // one-based index
        ColIndex = col.index + 1,
        Value = col.element.Value,
    };
An example of how to use your results:
foreach (var v in choice1)
Console.WriteLine(string.Format(
"RowIndex = {0}, ColIndex = {1}, Value = \"{2}\".",
v.RowIndex, v.ColIndex, v.Value));
…which would output:
RowIndex = 1, ColIndex = 1, Value = "DATA TO EXTRACT 1".
RowIndex = 2, ColIndex = 6, Value = "DATA TO EXTRACT 2".
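One caveat worth illustrating: real XHTML documents usually declare the XHTML namespace, and in that case Descendants("div") matches nothing because element names must be qualified. The sketch below assumes such a document (the sample markup, including the extra footer div, is made up for the demonstration) and uses only System.Xml.Linq.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumption: the document declares the XHTML namespace on its root element.
XNamespace xhtml = "http://www.w3.org/1999/xhtml";
XDocument data = XDocument.Parse(@"
<html xmlns='http://www.w3.org/1999/xhtml'><body>
  <div class='page'>
    <h1>UNIQUE NAME</h1>
    <table><tbody>
      <tr><td>DATA TO EXTRACT 1</td></tr>
      <tr><td /><td /><td /><td /><td /><td>DATA TO EXTRACT 2</td></tr>
    </tbody></table>
  </div>
  <div class='footer'>no heading here</div>
</body></html>");

var cells = (
    from div in data.Descendants(xhtml + "div")
    // The string cast yields null for a div without <h1> instead of throwing.
    where (string)div.Element(xhtml + "h1") == "UNIQUE NAME"
    from row in div.Descendants(xhtml + "tr").Select((element, index) => (element, index))
    from col in row.element.Elements(xhtml + "td").Select((element, index) => (element, index))
    where !string.IsNullOrEmpty(col.element.Value)
    select (Row: row.index + 1, Col: col.index + 1, Value: col.element.Value)
).ToList();

foreach (var cell in cells)
    Console.WriteLine($"tr {cell.Row}, td {cell.Col}: {cell.Value}");
```

If the source has no namespace declaration, the plain string names used in the answer above are sufficient.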