Total XPath noob here, and it doesn't help that I have only a basic grounding in HTML/XML (infrastructure support is my domain). Please could you help me determine a good XPath for the highlighted value (2nd Line Engineer)? I managed to do it for "description", extracting the text value underneath by using:
//div[@class='description']
but I am unable to do so for the highlighted one. Also, how does one target the node below in a statement?
<li class="position" data-section="currentPositionDetails">
Some possible solutions:
//li[@data-section='currentPositionsDetails']
//li[@data-section='currentPositionsDetails']//*[@class='item-title']//text()
//li[@data-section='currentPositionsDetails']//*[@class='item-title']//span/text()
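One way to check expressions like these offline is a minimal Python sketch using only the standard library. The markup below is a hypothetical stand-in for the page (the real structure is not shown in the question), and the data-section value uses the spelling from the suggested expressions; it has to match the page's attribute exactly. ElementTree supports only a subset of XPath and has no text() step, so the sketch selects the element and reads its text in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the fragment described in the question.
SAMPLE = """
<ul>
  <li class="position" data-section="currentPositionsDetails">
    <h3 class="item-title"><span>2nd Line Engineer</span></h3>
    <div class="description">Infrastructure support</div>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Attribute tests use @name='value'; '//' means "any descendant".
title = root.find(
    ".//li[@data-section='currentPositionsDetails']//*[@class='item-title']"
)

# Collect the text of the element and everything nested inside it.
title_text = "".join(title.itertext()).strip()
print(title_text)
```

A full XPath 1.0 engine would accept the //text() forms directly; the sketch only illustrates the attribute predicates.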
I'm trying to fix my site to meet WCAG 2.0. This means that all links in my site must have a title. To do it right and not miss any <a> tags, I'm adding title to each link as the first attribute:
<a title="..."
But this site has a lot of links and I'm struggling to find all the links without a title. Can anyone help me with a regular expression that I could use to find all tags that start with <a but the next letter isn't 't'?
If someone has an answer on how to find a specific tag without a specific attribute, that would be even better! I'm working in Visual Studio 2015.
Generally, it is not a good idea to use regular expressions on a complex system like a DOM. However, in your (simple) example, you might get along with:
<a(?:(?!\btitle\b)[^>])*>
This ignores links with a title attribute, regardless of where the attribute appears. See a demo on regex101.com.
Remember that it will still fail on some inputs, where it does not match in the way you intended.
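To get a feel for where this pattern holds up and where it does not, here is a small Python sketch that runs it over a few hand-written tags. The sample tags are made up for illustration; the last two show a false negative (data-title contains the word "title") and a false positive (any tag starting with "<a", such as <abbr>).

```python
import re

# The pattern from the answer: "<a", then any run of characters other than ">"
# in which the whole word "title" never starts, then ">".
NO_TITLE = re.compile(r"<a(?:(?!\btitle\b)[^>])*>")


def flagged(tag: str) -> bool:
    """Return True when the pattern reports the tag as lacking a title."""
    return NO_TITLE.search(tag) is not None


checks = {
    '<a href="/x">': flagged('<a href="/x">'),                      # no title
    '<a href="/x" title="t">': flagged('<a href="/x" title="t">'),  # title last
    '<a title="t" href="/x">': flagged('<a title="t" href="/x">'),  # title first
    '<a data-title="t">': flagged('<a data-title="t">'),            # missed
    '<abbr>': flagged('<abbr>'),                                    # wrongly hit
}

for tag, hit in checks.items():
    print(f"{tag!r:32} flagged={hit}")
```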
How about this one?
(<a )(?!title)
Matches:
<a >
But not:
<a title="..."
Try it here.
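For the "specific tag without a specific attribute" part of the question, a parser avoids the regex pitfalls. The following is a sketch only, using Python's standard html.parser rather than anything built into Visual Studio; the class name UntitledLinkFinder and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class UntitledLinkFinder(HTMLParser):
    """Collects (line, href) for every <a> start tag that has no title attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lower-cased names.
        attributes = dict(attrs)
        if tag == "a" and "title" not in attributes:
            line, _column = self.getpos()
            self.missing.append((line, attributes.get("href")))


# Hypothetical page content.
PAGE = (
    '<p><a title="One" href="/one">1</a>\n'
    '<a href="/two">2</a>\n'
    '<a href="/three" data-title="x">3</a></p>'
)

finder = UntitledLinkFinder()
finder.feed(PAGE)
finder.close()

hrefs = [href for _line, href in finder.missing]
print(hrefs)
```

Because the check is on parsed attribute names, the position of title within the tag and look-alike names such as data-title do not matter.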
This is the XPath expression I tried to use with the HtmlAgilityPack C# parser.
//div[@id = 'sc1']/table/tbody/tr/td/span[@class='blacktxt']
I evaluated the XPath expression with a Firefox XPath add-on and successfully got the required items, but the C# code throws a null reference exception.
HtmlAgilityPack.HtmlNodeCollection node = htmldoc.DocumentNode.SelectNodes("//div[@id ='sc1']/table/tbody/tr/td/span[@class='blacktxt']");
MessageBox.Show(node.ToString());
The node is always null...
Please help me to find the way to get around this problem...
Thank you..
DOM Requires <tbody/> Tags to be Inserted
All common browser extensions for building XPath expressions work on the DOM. Unlike the HTML specs, the DOM specs require <tr/> elements to be inside <tbody/> elements, so browsers add such elements if they are missing. You can easily see the difference by comparing the HTML shown in Firebug (or similar developer tools working on the DOM) with the raw page source (fetched using wget or a similar tool that does not interpret the markup).
The Solution
Remove the /tbody axis step, and your XPath expression will probably work.
//div[@id = 'sc1']/table/tr/td/span[@class='blacktxt']
If you Need to Support Both HTML With and Without <tbody/> Tags
For a more general solution, you could replace the /tbody axis step with a descendant-or-self step //, but this could jump into "inner tables":
//div[@id = 'sc1']/table//tr/td/span[@class='blacktxt']
Better would be to use alternative XPath expressions:
//div[@id = 'sc1']/table/tr/td/span[@class='blacktxt'] | //div[@id = 'sc1']/table/tbody/tr/td/span[@class='blacktxt']
A cleaner, XPath 2.0-only solution would be
//div[@id = 'sc1']/table/(tbody, self::*)/tr/td/span[@class='blacktxt']
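The effect of the missing <tbody/> can be reproduced with a short Python sketch. It assumes a made-up, well-formed fragment in which the source has no <tbody/>, and it uses the standard library's ElementTree, which implements only a subset of XPath and has no | operator, so the two alternatives are queried separately and concatenated.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw source: the table has no <tbody>, as servers often send it.
RAW = """
<html><body><div id="sc1">
  <table><tr><td><span class="blacktxt">42</span></td></tr></table>
</div></body></html>
"""

WITH_TBODY = ".//div[@id='sc1']/table/tbody/tr/td/span[@class='blacktxt']"
WITHOUT_TBODY = ".//div[@id='sc1']/table/tr/td/span[@class='blacktxt']"

root = ET.fromstring(RAW)


def select_spans(tree):
    """Emulate 'pathA | pathB' by running both paths and joining the results."""
    return tree.findall(WITHOUT_TBODY) + tree.findall(WITH_TBODY)


browser_style = root.findall(WITH_TBODY)   # path copied from a browser tool
texts = [span.text for span in select_spans(root)]
print(len(browser_style), texts)
```

The path copied from the browser finds nothing in the raw source, while the combined query works whether or not <tbody/> is present.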
So I'm aware of how to select a node using htmlagilitypack:
HtmlNode.SelectNodes(".//div[@class='description']")
etc... but say I have a site set up in the following way:
This is Link 1
This is information i want to get to
This is Link 3
This is information i want to get to
This is Link 5
This is Link 6
etc...
Now, the snippet is short, but basically the links are asymmetric, and I only want to access links that have the text value
"This is information i want to get to"
(I'm not familiar enough with HTML to use proper terminology here, sorry). Is there a method in HtmlAgilityPack where I can check this text value?
Thank you!
Try using the text() function:
SelectNodes("//a[text()='This is information i want to get to']")
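The same idea, selecting links by their exact text, can be tried out in a few lines of Python. This is a sketch under assumptions: the href values and the wrapping div are invented, and ElementTree spells the text test as [.='...'] (available since Python 3.7) instead of text()=.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the question only shows the link texts.
DOC = """
<div>
  <a href="/1">This is Link 1</a>
  <a href="/2">This is information i want to get to</a>
  <a href="/3">This is Link 3</a>
  <a href="/4">This is information i want to get to</a>
  <a href="/5">This is Link 5</a>
</div>
"""

root = ET.fromstring(DOC)

# Keep only the <a> elements whose complete text equals the wanted string.
wanted = root.findall(".//a[.='This is information i want to get to']")
hrefs = [a.get("href") for a in wanted]
print(hrefs)
```

The comparison is exact and case-sensitive, so the string in the expression has to match the link text character for character.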
I have an HTML document, and I want to find the XPath to an element containing a certain string.
To elaborate a bit more:
My HTML document is created dynamically and I have no specific names for the <div>s. The divs I am interested in look (more or less) like this:
<div>Country: China</div>
<div>Type: Earphones</div>
I want to get the whole string "Country: China". In order to do so, I want to find the xpath to this div by searching for "Country:" in the HTML.
I hope I was specific enough... Thank you!
Here are a couple ways:
//div[contains(child::text(), "Country:")]
//div/child::text()[contains(., "Country:")]/parent::node()
If you want to try things out within a browser, try an in-browser XPath bookmarklet.
I have just started learning LINQ because I like the sound of it, and so far I think I'm doing okay at it.
I was wondering if LINQ could be used to find the following information in a file, a group at a time or something like that:
Control
Text
Location
Color
Font
Control Size
example:
Label
"this is text that will
appear on a label control at runtime"
23, 77
-93006781
Tahoma, 9.0, Bold
240, 75
The above info will be in a plain text file that will have more than one type of control and many different sizes, font properties etc. associated with each control listed. Is it possible in LINQ to parse the info in this text file and then convert it to an actual control?
I've done this using a regex, but regex is too much of a hassle to update/maintain.
thanks heaps
jase
Edit:
Since XML is for structured data, would Linq To XML be appropriate for this task? And would you please share with me any helpful/useful links that you may have? (Other than MSDN, because I am looking at that now. :))
Thank you all
If you are generating this data yourself, then I HIGHLY recommend you store this in an XML file. Then you can use XElement to parse this.
EDIT: This is exactly the type of thing that XML is designed for, structured data.
EDIT EDIT: In response to the second question, Linq to XML is exactly what you're looking for:
For an example, here are a couple of links to code I have written that parses XML using XElements. It also creates an XML document.
Example 1 - Loading and Saving: have a look under the FromXML() and ToXML() methods.
Example 2 - Parsing a large XML doc: have a look under the ParseXml method.
Hope these get you going :D
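To make the "store it as XML" suggestion concrete, here is a language-neutral sketch in Python; in C#, XElement would play the role that ElementTree plays here. The element and attribute names (controls, control, location, size and so on) and the ControlSpec class are assumptions made up for this example, not a format taken from the linked code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical XML layout for the control description from the question.
XML = """
<controls>
  <control type="Label">
    <text>this is text that will
appear on a label control at runtime</text>
    <location x="23" y="77"/>
    <color>-93006781</color>
    <font>Tahoma, 9.0, Bold</font>
    <size width="240" height="75"/>
  </control>
</controls>
"""


@dataclass
class ControlSpec:
    kind: str
    text: str
    location: tuple
    color: int
    font: str
    size: tuple


def load_controls(xml_text):
    """Turn each <control> element into a ControlSpec record."""
    specs = []
    for node in ET.fromstring(xml_text).findall("control"):
        loc = node.find("location")
        size = node.find("size")
        specs.append(ControlSpec(
            kind=node.get("type"),
            text=node.findtext("text"),
            location=(int(loc.get("x")), int(loc.get("y"))),
            color=int(node.findtext("color")),
            font=node.findtext("font"),
            size=(int(size.get("width")), int(size.get("height"))),
        ))
    return specs


controls = load_controls(XML)
print(controls[0].kind, controls[0].location, controls[0].size)
```

With every value in its own named element, multi-line text needs no special handling, which is the part that tends to make a regex-based parser hard to maintain.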
LINQ is good for filtering off rows, selecting relevant columns etc.
Even if you use LINQ for this, you will still need regex to select the relevant text and do the parsing.