I know how to select a node using HtmlAgilityPack:
HtmlNode.SelectNodes(".//div[@class='description']")
But say I have a site set up in the following way:
This is Link 1
This is information i want to get to
This is Link 3
This is information i want to get to
This is Link 5
This is Link 6
etc...
The snippet is short, but basically the links are asymmetric, and I only want to access the links that have the text value
"This is information i want to get to"
(I'm not familiar enough with HTML to use the proper terminology here, sorry). Is there a method in HtmlAgilityPack that lets me check this text value?
Thank you!
Try using the text() function:
SelectNodes("a[text()='This is information i want to get to']")
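As a minimal sketch of the text() predicate, the example below runs the same kind of XPath with the .NET base class library (System.Xml.XPath) rather than HtmlAgilityPack, so it works without extra packages. The markup, the class name LinkTextDemo and the method FindHrefsByText are made up for illustration. Note that the relative path "a" only matches direct children of the context node, so the sketch uses "//a" to search the whole document, and that the comparison is case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class LinkTextDemo
{
    // Hypothetical markup standing in for the page described in the question.
    public const string SampleMarkup = @"<div>
  <a href='/1'>This is Link 1</a>
  <a href='/2'>This is information i want to get to</a>
  <a href='/3'>This is Link 3</a>
  <a href='/4'>This is information i want to get to</a>
</div>";

    // Returns the href of every <a> whose text equals linkText exactly.
    public static List<string> FindHrefsByText(string markup, string linkText)
    {
        XDocument doc = XDocument.Parse(markup);
        // Assumption: linkText contains no single quote (XPath 1.0 has no escape for it).
        string xpath = "//a[text()='" + linkText + "']";
        return doc.XPathSelectElements(xpath)
                  .Select(a => (string)a.Attribute("href"))
                  .ToList();
    }
}
```

With HtmlAgilityPack, the same expression string should be usable in SelectNodes; remember that SelectNodes returns null when nothing matches.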
Hopefully you can help me, as I can find a solution neither on the web nor in my own head.
I am querying an issue-tracking system (Jira) via a web request. The system's answer is a JSON file in which the description of an issue is a string containing wiki markup. It is possible to show this string 1:1 to the user, but I would prefer to parse the string and show the user the rendered elements, such as tables or bulleted lists, instead of the raw markup.
I use C# and currently show the information in a RichTextBox, but I guess a RichTextBox is not the control you would choose for such a requirement.
For example, the following string is returned by Jira, and I would like it to be shown to the user as a "real" table and a list.
||criteria||status||
|concept 1|open|
|concept 2|open|
* topic 1
* topic 2
Hope you can help me
After a long search, the answer turns out to be simple.
Jira offers the conversion from wiki markup to HTML itself. When you query an issue via a URL, just add ?expand=renderedFields to the URL, as explained here: https://community.atlassian.com/t5/Answers-Developer-Questions/How-can-I-get-the-rendered-HTML-of-a-wiki-markup-field-in-JIRA/qaq-p/495779
You will receive the same answer as before plus the HTML rendering of the fields. With that, it is fairly simple to show the result in a WebBrowser control in the UI.
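A rough sketch of the two steps, building the request URL and pulling the rendered description out of the response, might look like this. The host name, the issue key and the sample JSON are invented for illustration, and the /rest/api/2/issue/ path is an assumption about the Jira REST API version in use; only the ?expand=renderedFields parameter comes from the answer above.

```csharp
using System;
using System.Text.Json;

public static class JiraRenderedFieldsDemo
{
    // Builds the issue URL with the expand parameter.
    // Assumption: the instance exposes issues under /rest/api/2/issue/.
    public static string BuildIssueUrl(string baseUrl, string issueKey)
    {
        return baseUrl.TrimEnd('/') + "/rest/api/2/issue/"
             + Uri.EscapeDataString(issueKey) + "?expand=renderedFields";
    }

    // Returns renderedFields.description from the response JSON,
    // or an empty string when the field is missing or not a string.
    public static string ExtractRenderedDescription(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (doc.RootElement.TryGetProperty("renderedFields", out JsonElement rendered) &&
            rendered.TryGetProperty("description", out JsonElement description) &&
            description.ValueKind == JsonValueKind.String)
        {
            return description.GetString() ?? "";
        }
        return "";
    }
}
```

The returned HTML string could then be assigned to the DocumentText property of a Windows Forms WebBrowser control.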
Total XPath noob here, and it doesn't help that I have only a basic grounding in HTML/XML (infrastructure support is my domain). Please could you help me determine a good XPath for the highlighted value (2nd Line Engineer)? I managed to do it for "description", extracting the text value underneath by using:
//div[@class='description']
but am unable to do so for the mentioned one. Also, how does one target the node below in a statement?
<li class="position" data-section="currentPositionDetails">
Some possible solutions:
//li[@data-section='currentPositionDetails']
//li[@data-section='currentPositionDetails']//*[@class='item-title']//text()
//li[@data-section='currentPositionDetails']//*[@class='item-title']//span/text()
Hi, I'm kind of new to Selenium, so please bear with me if the question is too basic.
I want to access a date picker element and choose a specific date.
I am trying to access the span element using both its class and the text inside it.
I get an invalid string error. Is the syntax below correct?
_driver.FindElement(By.CssSelector("span[class='xxx'][contains(text(),'xx')]"))
Looking at this cheatsheet
https://www.simple-talk.com/dotnet/.net-framework/xpath,-css,-dom-and-selenium-the-rosetta-stone/
I think it might be more like
_driver.FindElement(By.CssSelector("span.CCC:contains('TTT')"));
where CCC is your class name and TTT is the text you're looking for.
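One caveat: standard CSS selectors have no text-matching pseudo-class (:contains is a jQuery/Sizzle extension), so a browser-backed By.CssSelector call may well reject the selector above. An XPath that combines the class and the text is a possible alternative. The sketch below checks such an expression against made-up markup using System.Xml.XPath; the class name "day" and the helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class SpanXPathDemo
{
    // Builds an XPath matching <span> elements by exact class attribute and partial text.
    // Assumption: neither argument contains a single quote.
    public static string BuildSpanXPath(string className, string textFragment)
    {
        return "//span[@class='" + className + "'][contains(text(),'" + textFragment + "')]";
    }

    // Returns the text of every span in the markup that the XPath selects.
    public static List<string> MatchSpans(string markup, string className, string textFragment)
    {
        return XDocument.Parse(markup)
            .XPathSelectElements(BuildSpanXPath(className, textFragment))
            .Select(span => span.Value)
            .ToList();
    }
}
```

In Selenium, the string returned by BuildSpanXPath could be passed to By.XPath instead of By.CssSelector, for example _driver.FindElement(By.XPath(SpanXPathDemo.BuildSpanXPath("xxx", "xx"))).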
I'm developing a crawling engine. My program crawls websites through XPath with HtmlAgilityPack. I need to get some image src attributes directly. You can see my simple code below, which is not working correctly. Thanks in advance!
PS: Please ignore the " char problem; the XPath patterns are provided by a database.
Agility.DocumentNode.SelectSingleNode("//img[@id="product_photo"]/@src");
And this is the line I need to crawl (the *...* part shows the block to extract):
<img id="product_photo" src="*/images/thumb/4400/10280/st.jpg*">
Some pages provide the image in meta tags, so .Attributes["src"] won't work.
UPDATE: You can see my query and result here
You can't get the value of "src" or any other attribute just by using:
Agility.DocumentNode.SelectSingleNode(yourXpath);
Read the attribute from the returned node instead:
string s = Agility.DocumentNode.SelectSingleNode(yourXpath).Attributes["src"].Value;
This is because SelectSingleNode() in HtmlAgilityPack returns an element node, not the value of an attribute. So you must either read the attribute from the node, as above, or use a Regex after parsing to extract just the "src" from the node's outer HTML.
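Since the XPath patterns come from a database and the attribute name varies (src on an img, something else on a meta tag), one possible workaround is to split a trailing /@name step off the pattern, select the element with the rest, and then read that attribute by name. The sketch below illustrates the idea with System.Xml.XPath on invented markup; AttributeXPathDemo and its methods are hypothetical helpers, not part of HtmlAgilityPack, and the split assumes the pattern ends in a single "/@name" step.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class AttributeXPathDemo
{
    // Splits "//img[@id='x']/@src" into ("//img[@id='x']", "src").
    // AttributeName is empty when the pattern has no trailing /@name step.
    public static (string ElementPath, string AttributeName) SplitAttributeStep(string xpath)
    {
        int idx = xpath.LastIndexOf("/@", StringComparison.Ordinal);
        // A ']' after "/@" means the match sits inside a predicate, not at the end.
        if (idx < 0 || xpath.IndexOf(']', idx) >= 0) return (xpath, "");
        return (xpath.Substring(0, idx), xpath.Substring(idx + 2));
    }

    // Returns the attribute value (or the element text when no attribute step
    // is present), or an empty string when nothing matches.
    public static string ReadValue(string markup, string xpath)
    {
        var (elementPath, attributeName) = SplitAttributeStep(xpath);
        XElement element = XDocument.Parse(markup).XPathSelectElement(elementPath);
        if (element == null) return "";
        return attributeName.Length == 0
            ? element.Value
            : ((string)element.Attribute(attributeName) ?? "");
    }
}
```

With HtmlAgilityPack, the equivalent of the last step would be something like Agility.DocumentNode.SelectSingleNode(elementPath)?.GetAttributeValue(attributeName, "").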
I have an HTML document, and I want to find the XPath to an element containing a certain string.
To elaborate a bit more:
My HTML document is created dynamically and I have no specific names for the divs. The divs I am interested in look (more or less) like:
<div>Country: China</div>
<div>Type: Earphones</div>
I want to get the whole string "Country: China". In order to do so, I want to find the xpath to this div by searching for "Country:" in the HTML.
I hope I was specific enough... Thank you!
Here are a couple ways:
//div[contains(child::text(), "Country:")]
//div/child::text()[contains(., "Country:")]/parent::node()
If you want to try things out within a browser, try an in-browser XPath bookmarklet.
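For a quick check outside the browser, the first expression can also be run from C#. This is only a sketch on invented markup using System.Xml.XPath; DivTextDemo and FindDivsContaining are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class DivTextDemo
{
    // Returns the full text of every <div> whose first text node contains the fragment.
    // Assumption: fragment contains no single quote.
    public static List<string> FindDivsContaining(string markup, string fragment)
    {
        return XDocument.Parse(markup)
            .XPathSelectElements("//div[contains(text(), '" + fragment + "')]")
            .Select(div => div.Value)
            .ToList();
    }
}
```

Calling FindDivsContaining with the two divs from the question and the fragment "Country:" should yield the whole string "Country: China".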