Retrieve element within html hierarchy - c#

I have this piece of html code. I want to get the text inside the <div> tag using WatiN. The C# code is below, but I'm pretty sure it could be done way better than my solution. Any suggestions?
HTML:
<table id="someId" cellspacing="0" border="1" style="border-collapse:collapse;" rules="all">
<tbody>
<tr>
<th scope="col"> </th>
</tr>
<tr>
<td>
<div>Some text</div>
</td>
</tr>
</tbody>
</table>
C#:
// Get the table ElementContainer
IElementContainer diagnosisElementContainer = (IElementContainer)_control.GetElementById("someId");
// Get the tbody element
IElementContainer tbodyElementContainer = (IElementContainer)diagnosisElementContainer.ChildrenWithTag("tbody");
// Get the <tr> children
ElementCollection trElementContainer = tbodyElementContainer.ChildrenWithTag("tr");
// Get the <td> child of the last <tr>
IElementContainer tdElementContainer = (IElementContainer)trElementContainer.ElementAt<Element>(trElementContainer.Count - 1);
// Get the <div> element inside the <td>
Element divElement = tdElementContainer.Divs[0];

Based on the HTML given, this is how I'd do it for IE:
IE myIE = new IE();
myIE.GoTo("[theurl]");
string theText = myIE.Table("someId").Divs[0].Text;
The above works with WatiN 2.1 on Windows 7 with IE9.

Scrape html located directly below div

I have some html and want to scrape some data from it.
The HTML is structured in the following way
<div class="someClass"><span class="someOtherClass">Text</span></div>
<table>
<tbody>
<tr>
<td>label</td>
<td>data</td>
</tr>
<tr>
<td>label</td>
<td>data</td>
</tr>
<tr>
<td>label</td>
<td>data</td>
</tr>
</tbody>
</table>
<div class="someClass"><span class="someOtherClass">Text</span></div>
<table>
<tbody>
<tr>
<td>label</td>
<td>data</td>
</tr>
<tr>
<td>label</td>
<td>data</td>
</tr>
<tr>
<td>label</td>
<td>data</td>
</tr>
</tbody>
</table>
<div class="someClass"><span class="someOtherClass">Text</span></div>
I need to be able to scrape the Text value located in the span where class="someOtherClass" (I've already implemented this portion)
I then need to be able to scrape the table directly below the div. Since the "parent" div doesn't actually contain the table, I'm having some issues implementing this.
"I need to be able to scrape the Text value located in the span"
You don't need regex. An XPath query is enough:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlstring);
// InnerText of every <span class="someOtherClass">
var text = doc.DocumentNode
    .SelectNodes("//span[@class='someOtherClass']")
    .Select(x => x.InnerText)
    .ToList();
"I then need to be able to scrape the table directly below the div."
Use a similar XPath:
var tables = doc.DocumentNode
    .SelectNodes("//span[@class='someOtherClass']/following::table").ToList();
foreach (var table in tables)
{
    // One list of cell texts per row
    var list = table.Descendants("tr")
        .Select(tr => tr.Descendants("td")
            .Select(td => td.InnerText).ToList())
        .ToList();
}
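One caveat with the query above: the following:: axis matches every table that comes after each span, so with several headings the same table is returned more than once. Below is a minimal sketch of one way to pair each heading with only the table immediately after its div, assuming the div and the table are siblings as in the sample HTML. The class and method names (HeadingTableScraper, Scrape) are made up for illustration and are not part of HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HeadingTableScraper
{
    // Hypothetical helper: returns each heading text together with the rows
    // of the table that immediately follows the heading's div.
    public static List<(string Heading, List<List<string>> Rows)> Scrape(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<(string Heading, List<List<string>> Rows)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var divs = doc.DocumentNode.SelectNodes("//div[@class='someClass']");
        if (divs == null) return result;

        foreach (var div in divs)
        {
            var span = div.SelectSingleNode("./span[@class='someOtherClass']");
            string heading = span == null ? "" : span.InnerText.Trim();

            // First following sibling element, kept only if it is a <table>.
            var table = div.SelectSingleNode("following-sibling::*[1][self::table]");
            var rows = table == null
                ? new List<List<string>>()
                : table.Descendants("tr")
                       .Select(tr => tr.Descendants("td")
                                       .Select(td => td.InnerText.Trim())
                                       .ToList())
                       .ToList();

            result.Add((heading, rows));
        }
        return result;
    }
}
```

A heading with no table directly after it (such as the last div in the sample) comes back with an empty row list rather than being dropped.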

ASP.Net Get HTML cell value for code behind by clicking

I'm not a professional in C# and ASP.Net so please have some patience with me.
I have the following problem.
I'm using ASP.NET Web Forms with C# to create a dashboard.
I have a generic HTML table (built from a SQL query) that is displayed on the page. Now I want to implement a feature: when the user clicks a cell, for example in the ID column, a details view should open as a Bootstrap modal.
For that I need the ID value in that cell. How can I get this value?
With that value I will run a new SQL query and show more specific information.
Here is my .aspx structure:
<table id="MyTable" class="table table-striped table-bordered table-condensed table-responsive">
<thead>
<tr>
<th>ID</th>
<th>Name</th>
<th>Typ</th>
<th>Something else</th>
<th>Date</th>
</tr>
</thead>
<tbody>
<%=Tabelle.GetTable.dataTable_all%>
</tbody>
</table>
<script type="text/javascript">
$(document).ready(function () {
$('#MyTable').DataTable();
});
</script>
The variable dataTable_all is a string, so it holds my table rows as HTML code.
The result for <tbody> is 366 rows long; here is an extract:
<tr>
<td>154789</td>
<td>Testproject X</td>
<td>Good</td>
<td>greencolored</td>
<td>01.01.2015</td>
</tr>
<tr>
<td>189365</td>
<td>Testproject B</td>
<td>Good</td>
<td>redcolored</td>
<td>08.01.2015</td>
</tr>
<tr>
<td>136471</td>
<td>Testproject Y</td>
<td>Bad</td>
<td>pinkcolored</td>
<td>15.04.2015</td>
</tr>
So how can I make a click on, for example, ID 136471 pass that value to a variable in my C# code?
Change to:
<tr data-id="154789">
<td>154789</td>
<td>Testproject X</td>
<td>Good</td>
<td>greencolored</td>
<td>01.01.2015</td>
</tr>
<tr data-id="189365">
<td>189365</td>
<td>Testproject B</td>
<td>Good</td>
<td>redcolored</td>
<td>08.01.2015</td>
</tr>
<tr data-id="136471">
<td>136471</td>
<td>Testproject Y</td>
<td>Bad</td>
<td>pinkcolored</td>
<td>15.04.2015</td>
</tr>
Then use:
$('tbody tr').click(function() {
alert($(this).data('id'));
});
Working demo
https://jsfiddle.net/jknysneo/
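Since the question says the rows are produced server-side as a string, the data-id attribute from the answer has to be emitted by whatever code builds that string. A minimal sketch of that step follows, assuming each row arrives as an array of cell values with the ID in the first position. TableRowBuilder and BuildRows are hypothetical names, not the asker's actual code. It covers only producing the markup. Getting the clicked ID back to the server still needs a separate request, which is not shown here.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class TableRowBuilder
{
    // Hypothetical helper: builds <tr> markup with a data-id attribute.
    // Assumption: cells[0] holds the ID for the row.
    public static string BuildRows(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder();
        foreach (var cells in rows)
        {
            // HTML-encode every value so data from the database cannot break the markup.
            sb.Append("<tr data-id=\"")
              .Append(WebUtility.HtmlEncode(cells[0]))
              .Append("\">");
            foreach (var cell in cells)
            {
                sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            }
            sb.Append("</tr>");
        }
        return sb.ToString();
    }
}
```

With markup like this, the click handler above can read the ID from the row instead of parsing the cell text.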

HtmlAgilityPack SelectNodes Syntax

I have the following HTML:
<tbody>
<tr>
<td class="metadata_name">Headquarters</td>
<td class="metadata_content">Princeton New Jersey, United States</td>
</tr>
<tr>
<td class="metadata_name">Industry</td>
<td class="metadata_content"><ul><li>Engineering Software</li><li>Software Development & Design</li><li>Software</li><li>Custom Software & Technical Consulting</li></ul></td>
</tr>
<tr>
<td class="metadata_name">Revenue</td>
<td class="metadata_content">$17.5 Million</td>
</tr>
<tr>
<td class="metadata_name">Employees</td>
<td class="metadata_content">201 to 500</td>
</tr>
<tr>
<td class="metadata_name">Links</td>
<td class="metadata_content"><ul><li>Company website</li></ul></td>
</tr>
</tbody>
I want to load the metadata_content value (e.g. "$17.5 Million") into a variable for the row whose metadata_name equals a given value (e.g. "Revenue").
I have tried combinations of code like this for a few hours...
orgHtml.DocumentNode.SelectNodes("//td[@class='metadata_name']")[0].InnerHtml;
But I'm not getting the right combination. If you know a SelectNodes syntax that gets me to the solution, I would appreciate it.
It seems what you're looking for is this:
var found = orgHtml.DocumentNode.SelectSingleNode(
    "//tr[td[@class = 'metadata_name'] = 'Revenue']/td[@class = 'metadata_content']");
if (found != null)
{
    string html = found.InnerHtml;
    // use html
}
Note that to get the text of an element, you should use found.InnerText, not found.InnerHtml, unless you specifically need its HTML content.
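If the lookup is needed for several names, the query above can be wrapped in a small helper. This is only a sketch: MetadataReader and GetContent are hypothetical names, and the name is spliced directly into the XPath string, so it assumes the name contains no single quote. It returns the decoded text (via HtmlEntity.DeEntitize) rather than the raw HTML.

```csharp
using HtmlAgilityPack;

public static class MetadataReader
{
    // Hypothetical helper: returns the text of the metadata_content cell in the
    // row whose metadata_name cell equals `name`, or null if there is no such row.
    public static string GetContent(HtmlDocument doc, string name)
    {
        // Assumption: `name` contains no single quote, since it is placed
        // inside a single-quoted XPath string literal.
        string xpath = "//tr[td[@class='metadata_name'] = '" + name + "']"
                     + "/td[@class='metadata_content']";

        var node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null) return null;

        // InnerText keeps entities such as &amp; as written; DeEntitize decodes them.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

For a cell that contains a list, such as the Industry row, InnerText concatenates the text of all the list items, so that case would need its own handling.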

Razor get value of input HTML bounded by foreach

I have the following HTML table; its rows are generated by a foreach loop.
Can I use Razor to get the value of each row into a List or an array in C#?
My Razor C# code (what I have tried so far):
@{
var l = new List<string>();
l.Add(@<input id="updated_value2" data-bind="value:value,visible:isEditing()" />);
}
Here's my table
<table class="table table-hover">
<tbody data-bind="foreach: $root.mapJsons(parameters())">
<tr class="data-hover">
<td>
<strong>
<span data-bind="text:key" />
</strong>
</td>
<td>
@*display label and input for dictionary<value> false DIS true APP*@
<input id="updated_value" data-bind="value:value,visible:isEditing()" />
<label id="display_value" data-bind="text:value,visible:!isEditing()" />
</td>
</tr>
</tbody>
<thead>
<tr>
<th style="width: 30%">
Name
</th>
<th style="width: 30%">
Value
</th>
<th></th>
</tr>
</thead>
</table>
Your foreach loop is being executed client-side (looks like a KnockoutJS binding?) rather than server-side, so any Razor code you embed in the table is only going to be called once as it's rendered by the server. So the answer is no, you cannot populate a server-side list with this particular foreach loop.

How to get the count of tables in an html file with C# and html-agility-pack

This is a newbie question so please provide working code.
How do I count the tables in an html file using C# and the html-agility-pack?
(I will need to get values from specific tables in an html file based on the count of tables. I will then perform some math on the values retrieved.)
Here is a sample file with three tables for your convenience:
<html>
<head>
<title>Tables</title>
</head>
<body>
<table border="1">
<tr>
<th>Name</th>
<th>Phone</th>
<th>City</th>
<th>Number</th>
</tr>
<tr>
<td>Scott</td>
<td>555-2345</td>
<td>Chicago</td>
<td>42</td>
</tr>
<tr>
<td>Bill</td>
<td>555-1243</td>
<td>Detroit</td>
<td>23</td>
</tr>
<tr>
<td>Ted</td>
<td>555-3567</td>
<td>Columbus</td>
<td>9</td>
</tr>
</table>
<p></p>
<table border="1">
<tr>
<th>Name</th>
<th>Year</th>
</tr>
<tr>
<td>Abraham</td>
<td>1865</td>
</tr>
<tr>
<td>Martin</td>
<td>1968</td>
</tr>
<tr>
<td>John</td>
<td>1963</td>
</tr>
</table>
<p></p>
<table border="1">
<tr>
<th>Animal</th>
<th>Location</th>
<th>Number</th>
</tr>
<tr>
<td>Tiger</td>
<td>Jungle</td>
<td>8</td>
</tr>
<tr>
<td>Hippo</td>
<td>River</td>
<td>4</td>
</tr>
<tr>
<td>Camel</td>
<td>Desert</td>
<td>3</td>
</tr>
</table>
</body>
</html>
If you would, please SHOW how to send the results to a new text file.
Thanks!
I think this can be a starting point
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var tables = doc.DocumentNode.Descendants("table");
int tablesCount = tables.Count();
foreach (var table in tables)
{
    var rows = table.Descendants("tr")
        .Select(tr => tr.Descendants("td").Select(td => td.InnerText).ToList())
        .ToList();
    foreach (var row in rows)
        Console.WriteLine(String.Join(",", row));
}
Something like this:
HtmlDocument doc = new HtmlDocument();
doc.Load(myTestFile);
// get all TABLE elements recursively
int count = doc.DocumentNode.SelectNodes("//table").Count;
// output to a text file
File.WriteAllText("output.txt", count.ToString());
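The two answers can be combined into one routine that counts the tables and writes both the count and each table's rows to a text file. The following is a sketch under a few assumptions: TableReport and Write are hypothetical names, the tables are not nested (as in the sample file), and header cells (th) are written the same way as data cells (td).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class TableReport
{
    // Hypothetical helper: writes the table count and every row of every table
    // to outputPath, one comma-separated line per row. Returns the table count.
    public static int Write(string htmlPath, string outputPath)
    {
        var doc = new HtmlDocument();
        doc.Load(htmlPath);

        // Materialize once so the tables are not enumerated twice.
        var tables = doc.DocumentNode.Descendants("table").ToList();
        var lines = new List<string> { "Tables: " + tables.Count };

        for (int i = 0; i < tables.Count; i++)
        {
            lines.Add("Table " + (i + 1));
            foreach (var tr in tables[i].Descendants("tr"))
            {
                // Include both header (th) and data (td) cells.
                var cells = tr.Descendants()
                              .Where(n => n.Name == "td" || n.Name == "th")
                              .Select(n => n.InnerText.Trim());
                lines.Add(string.Join(",", cells));
            }
        }

        File.WriteAllLines(outputPath, lines);
        return tables.Count;
    }
}
```

Calling TableReport.Write("tables.html", "output.txt") on the sample file should report three tables. Both file names here are placeholders.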
