Order HtmlNodes based on their position in the HTML page (C# / XPath)

Context:
I am parsing the result of a query on this service, but the HTML containing the result is a mess.
My goal is to build a key-value pair for each attribute and value shown in the result of this query.
So far I can only think of one way to solve it.
Logic for parsing:
Select all the attribute nodes
Select all the value nodes
Match the nodes by their index in each collection to build the key-value pairs
E.g.: Attribute[0] with Value[0] (for this service, that would be "CNPJ" and "12.272.084/0001-00").
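The index-matching step can be sketched with LINQ's Zip. This is a minimal sketch. It assumes both lists have already been reduced to their text (for example via InnerText) and are in page order. PairBuilder and Pair are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairBuilder
{
    // Pairs attributes[i] with values[i]. Zip on its own would silently stop at
    // the shorter list, so the length check surfaces a mismatch instead.
    public static List<KeyValuePair<string, string>> Pair(
        IList<string> attributes, IList<string> values)
    {
        if (attributes.Count != values.Count)
            throw new ArgumentException(
                $"Got {attributes.Count} attributes but {values.Count} values.");

        return attributes
            .Zip(values, (key, value) => new KeyValuePair<string, string>(key, value))
            .ToList();
    }
}
```

Failing loudly on a count mismatch is deliberate. If one value node is missed by the XPath, every later pair would otherwise be shifted by one.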
Problem:
Even though I managed to find an XPath expression that fetches all the attribute nodes:
attrNodes = htmlDoc.DocumentNode.SelectNodes("//td[@bgcolor='#f1f1b1']/*/font[@face='Verdana']");
I could not find one for the value nodes as well, because there are different types of nodes that look the same when rendered by the browser ("b" and "strong", for example).
Some nodes are even nested at different depths (a single tag in some places, two nested tags in others), which prevented me from using a wildcard ("*") in the XPath to cover them all.
My Goal:
Write XPath expressions that reach each different subset of value nodes
Put all the nodes in a single collection
Order the nodes in this collection by the position of each node in the HTML (nodes that appear first in the HTML come first in the list)
Any idea how I can achieve this?
HTML Sample:
You can either check it here,
or query the service yourself by typing 12272084000100 in the CNPJ textbox
and clicking "Pesquisar". After that, just click the text "Companhia Eletrica de Alagoas".
Thanks in advance

I just found a property on the HtmlNode class of the HtmlAgilityPack library that solved my problem.
According to the documentation of the HtmlNode class:
StreamPosition
Gets the stream position of this node in the document, relative to the start of the document.
Here is the output of my tests on a list of tables from the same HTML page (the tables were used only for testing):
// HtmlNodeCollection of Tables
tableNodes[0].StreamPosition
925
tableNodes[1].StreamPosition
1651
tableNodes[2].StreamPosition
2387
Ordering my list by StreamPosition solved my problem:
List<HtmlNode> OrderedList = valueNodes.OrderBy ( node => node.StreamPosition ).ToList<HtmlNode>();
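A fuller sketch of the same idea follows. It assumes the HtmlAgilityPack NuGet package. The approach is to run one XPath expression per kind of value node, merge the results, and sort by StreamPosition. ValueCollector and SelectInDocumentOrder are hypothetical names. By default, SelectNodes returns null rather than an empty collection when nothing matches, so that case is handled explicitly.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumed: HtmlAgilityPack NuGet package

public static class ValueCollector
{
    // Runs several XPath expressions, merges the matches into one list and
    // sorts it by StreamPosition, i.e. by where each node starts in the HTML.
    public static List<HtmlNode> SelectInDocumentOrder(
        HtmlDocument doc, params string[] xpaths)
    {
        return xpaths
            // SelectNodes returns null by default when nothing matches,
            // so fall back to an empty sequence in that case.
            .SelectMany(xp => (IEnumerable<HtmlNode>)doc.DocumentNode.SelectNodes(xp)
                              ?? Enumerable.Empty<HtmlNode>())
            .Distinct()                            // keep a node matched twice only once
            .OrderBy(node => node.StreamPosition)  // document order
            .ToList();
    }
}
```

The XPath strings are placeholders. Pass one expression per variant of value node, for example `SelectInDocumentOrder(doc, "//td/b", "//td/strong")`, and adapt the selectors to the real page.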

Related

C# Razor query - Using Umbraco V7, how can I check if the current page exists as a child item under any of the level 1 content tree nodes?

I have four home nodes for various cultures and host names, all of which are at level 1 in my content tree.
I'm looking to check whether the current page exists as a child item under any of those nodes and, if so, display links, specifically hreflang tags (not every child page exists under every node).
What I've tried so far is creating foreach loops over each home node and its child items; however, so far this only returns the children of the level 1 nodes.
IPublishedContent homeOne = Umbraco.TypedContent(1172);
IPublishedContent homeTwo = Umbraco.TypedContent(6093);
IPublishedContent homeThree = Umbraco.TypedContent(7886);
IPublishedContent homeFour = Umbraco.TypedContent(9679);
var pageUrl = CurrentPage.Url().ToString();
foreach (var item in homeOne.Children)
{
    var pagePath = item.Url.ToString();
    if (pagePath == pageUrl)
    {
        <link rel="alternate" href="@string.Format("https://www.domain.xx{0}", pagePath)" hreflang="x-default" />
        <link rel="alternate" href="@string.Format("https://www.domain.xx{0}", pagePath)" hreflang="en-gb" />
    }
}
Beneath the example above, I have three similar foreach loops for the other home nodes. Interestingly (and annoyingly), if I'm on a child page of homeOne, I only see the links generated for homeOne. However, if I go to a child page of homeTwo, I see the links generated for homeOne and homeTwo. Likewise, visiting homeThree returns links for homeOne and homeThree only.
Edit:
I should reiterate that these four root nodes have already been set up with cultures and host names and are all in the same language. I'm basically just trying to look at the current page and, if it exists as a child item under any or all of the four root nodes, show the relevant links.
Current tree example:
HomeOne
    Product1
    Product2
HomeTwo
    Product1
HomeThree
    Product1
    Product2
Hope someone can point me in the right direction
Thanks
CurrentPage should have a Path property that contains the IDs of every ancestor. I believe it's a string that you can split by ',' and then check whether the resulting array contains any of the home node IDs. Something like this: https://coderwall.com/p/gl7poa/active-navigation-in-umbraco
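Here is a minimal sketch of that check. It assumes Path holds comma-separated ancestor IDs such as "-1,1172,1250", as described above. PathHelper and IsUnderAnyRoot are hypothetical names, and the helper is plain C# with no Umbraco dependency.

```csharp
using System;
using System.Linq;

public static class PathHelper
{
    // Returns true when any of the given root IDs appears in a comma-separated
    // ancestor path (assumed format, e.g. "-1,1172,1250").
    public static bool IsUnderAnyRoot(string path, params int[] rootIds)
    {
        if (string.IsNullOrEmpty(path))
            return false;

        return path
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(id => int.Parse(id.Trim()))
            .Any(id => rootIds.Contains(id));
    }
}
```

In the view, it might be called as `PathHelper.IsUnderAnyRoot(CurrentPage.Path, 1172, 6093, 7886, 9679)`, using the home node IDs from the question.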
From your code example I can't really figure out what you're trying to do, though. If you are trying to run a classic multilingual setup, have a look at this tutorial - setting it up like this (with domains/hostnames per language) lets Umbraco take care of a lot of the hassle: https://our.umbraco.com/Documentation/Tutorials/Multilanguage-Setup/index-v7
The final solution I found for this was provided by an Umbraco community member. I made some tweaks to it, which are documented at the foot of the post below:
https://our.umbraco.com/forum/using-umbraco-and-getting-started/100089-query-children-of-multiple-root-nodes-and-return-values-for-hreflang-tags

How to get a second or third XML node when using an anonymous type?

I'm using an anonymous type to grab some XML data. All was going well until I ran across a section of XML where there can be 2 or 3 similar nodes. For example, the XML sample below has 3 separate phones. My code worked fine when there was only ONE element to grab at the end of the element path I led it to. How can I grab a specific one? Or all 3, for that matter? Handling XML is still new to me and there seem to be so many ways of doing it. Searching the web for my exact need didn't turn up anything. Thanks.
var nodes = from node in doc.Elements("ClaimsSvcRs").Elements("ClaimDownloadRs")
            select new
            {
                Phone1 = (string)node.Elements("Communications").Elements("PhoneInfo").Elements("PhoneNumber").FirstOrDefault(),
                Phone2 = (string)node.Elements("Communications").Elements("PhoneInfo").Elements("PhoneNumber").FirstOrDefault(),
            };
The XML code is:
<?xml version="1.0" encoding="UTF-8"?>
<TEST>
  <ClaimsSvcRs>
    <ClaimDownloadRs>
      <Communications>
        <PhoneInfo>
          <PhoneTypeCd>Phone</PhoneTypeCd>
          <CommunicationUseCd>Home</CommunicationUseCd>
          <PhoneNumber>+1-715-5553944</PhoneNumber>
        </PhoneInfo>
        <PhoneInfo>
          <PhoneTypeCd>Phone</PhoneTypeCd>
          <CommunicationUseCd>Business</CommunicationUseCd>
          <PhoneNumber>+1-715-5552519</PhoneNumber>
        </PhoneInfo>
        <PhoneInfo>
          <PhoneTypeCd>Phone</PhoneTypeCd>
          <CommunicationUseCd>Cell</CommunicationUseCd>
          <PhoneNumber>+1-715-5551212</PhoneNumber>
        </PhoneInfo>
      </Communications>
    </ClaimDownloadRs>
  </ClaimsSvcRs>
</TEST>
I haven't used XPath in a while, so I'll let someone else step in there, but there's a way to select a particular PhoneInfo element based on its subelements. So if you knew whether you wanted Home, Business, Cell or whatever, you'd be able to select that particular PhoneInfo element. Otherwise, if you just want Phone1, 2 and 3 and nulls are OK, use the LINQ Skip function: Phone2 = query.Skip(1).FirstOrDefault()
No worries. My thought was that XPath could be intermixed here, and it might be more elegant if your CommunicationUseCd values are deterministic. Then you could have Home = ..., Work = ..., etc., instead of Phone1 and Phone2.
The same could be accomplished by slipping a where clause into each of your query lines.
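Both suggestions can be sketched with LINQ to XML: Skip for positional access, and a where clause keyed on CommunicationUseCd. This is a sketch, and PhoneReader and its methods are hypothetical names. It uses Descendants because the root element in the sample XML is TEST, so doc.Elements("ClaimsSvcRs") on the document itself would not match anything.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class PhoneReader
{
    // PhoneNumber of the n-th PhoneInfo (0-based), or null if there is none.
    public static string PhoneAt(XDocument doc, int index) =>
        doc.Descendants("PhoneInfo")
           .Select(p => (string)p.Element("PhoneNumber"))
           .Skip(index)
           .FirstOrDefault();

    // PhoneNumber whose CommunicationUseCd equals useCd ("Home", "Business",
    // "Cell", ...), or null if there is none.
    public static string PhoneFor(XDocument doc, string useCd) =>
        doc.Descendants("PhoneInfo")
           .Where(p => (string)p.Element("CommunicationUseCd") == useCd)
           .Select(p => (string)p.Element("PhoneNumber"))
           .FirstOrDefault();
}
```

Casting an XElement to string returns null for a missing element instead of throwing. This matches the "nulls are OK" case above.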
If you're up for LINQ you can get all your elements in one go:
foreach(XElement phone in XDocument.Parse(xmlString).Descendants("PhoneInfo"))
{
Console.WriteLine(phone.Element("PhoneNumber").Value);
//etc
}
I find XDocument & LINQ a lot easier than XmlDocument & XPath, if you're okay with the alternative. There's more info on them here

Check all the children for XElement

I have an XElement object, which is my XML tree read from an XML file. Now I want to check all the nodes in this tree to get each one's first attribute name and value. Is there any simple way to go through all of the nodes (from the root to the leaves)? My XML file has many different and unusual nodes, which makes this harder. I thought about writing some recursion, but I hope there's an easier way to solve this.
Maybe take a look at XPath. An XPath expression like //*[@id=42] could do the job.
It selects all elements that have an "id" attribute with the value 42.
Using just //* returns all elements in the tree.
XPath:
http://msdn.microsoft.com/en-gb/library/ms950786.aspx
Syntax:
http://msdn.microsoft.com/en-us/library/ms256471.aspx
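In .NET, XPath can be run directly against an XElement through the XPathSelectElements extension method in System.Xml.XPath. The sketch below uses hypothetical class and method names. From an element, descendant-or-self::* is the relative form that covers the element itself and everything below it, whereas //* starts from the document root.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElements extension method

public static class XPathDemo
{
    // Elements (the root itself included) whose id attribute equals 42.
    public static List<XElement> WithId42(XElement root) =>
        root.XPathSelectElements("descendant-or-self::*[@id=42]").ToList();

    // Every element from root down to the leaves, in document order.
    public static List<XElement> AllElements(XElement root) =>
        root.XPathSelectElements("descendant-or-self::*").ToList();
}
```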
You can get all child elements using XElement.Elements().
Here's some code using recursion to get all elements at each level:
void GetElements(XElement element)
{
    var elements = element.Elements();
    foreach (XElement e in elements)
    {
        // some stuff here
        if (e.HasElements)   // Elements() is never null, so test HasElements instead
            GetElements(e);
    }
}
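Alternatively, LINQ to XML can do the walk without hand-written recursion. DescendantsAndSelf() returns the element and all of its descendant elements in document order. The sketch below collects each element's first attribute. FirstAttributes and Collect are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FirstAttributes
{
    // Visits every element from root to leaves and records the first attribute
    // of each element that has one; elements without attributes are skipped.
    public static List<(string Element, string Name, string Value)> Collect(XElement root) =>
        root.DescendantsAndSelf()
            .Where(e => e.FirstAttribute != null)
            .Select(e => (e.Name.LocalName,
                          e.FirstAttribute.Name.LocalName,
                          e.FirstAttribute.Value))
            .ToList();
}
```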

Select all links from a Html table using XPath (and HtmlAgilityPack)

What I am trying to achieve is to extract all links with an href attribute that starts with http://, https:// or /. These links lie within a table (tbody > tr > td etc.) with a certain class. I thought I could specify just the a element without the whole path to it, but it does not seem to work. I get a NullReferenceException at the line that selects the links:
var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
        //not working
    }
}
I don't know about any recommendations or best practices when it comes to XPath. Do I create overhead when I query the document two times?
Use:
//tbody/descendant::a[starts-with(@href,'https://')
or
starts-with(@href,'http://')
or
starts-with(@href,'/')
]
If you are using System.Xml rather than HtmlAgilityPack, you will still have a problem unless you correct your code to reflect the fact that the XmlNode.SelectNodes() instance method returns an XmlNodeList, not HtmlNode objects.
The problem is that you are selecting the table and then immediately trying to select the anchors as if they were direct children. There are tr and td tags in between.
So, if you change your XPath to the following, things should work:
"tbody/tr/td/a[starts-with(@href, 'https://')]"
This will not work if your anchors are wrapped in something else, so you could instead select all of the anchors under the current node (i.e. the table):
".//a[starts-with(@href, 'https://')]"
See this for more detail on xpath syntax.
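The difference between //a and .//a can be checked with System.Xml's XmlDocument, whose SelectNodes the answer above refers to. HtmlAgilityPack evaluates XPath through the same System.Xml.XPath engine, so the expressions carry over. One difference is that HtmlAgilityPack's SelectNodes returns null by default (rather than an empty list) when nothing matches, which is a likely source of the NullReferenceException in the question. The sketch below assumes well-formed markup, and LinkFinder and LinksInTable are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class LinkFinder
{
    // Matches the three prefixes from the question: http://, https:// and /.
    const string LinkFilter =
        "[starts-with(@href,'http://') or starts-with(@href,'https://') or starts-with(@href,'/')]";

    public static List<string> LinksInTable(XmlDocument doc, string tableClass)
    {
        XmlNode table = doc.SelectSingleNode($"//table[@class='{tableClass}']");
        if (table == null)                  // SelectSingleNode returns null on no match
            return new List<string>();

        // The leading "." keeps the search inside this table;
        // a bare "//a" would search the whole document.
        return table.SelectNodes(".//a" + LinkFilter)
                    .Cast<XmlNode>()
                    .Select(a => a.Attributes["href"].Value)
                    .ToList();
    }
}
```

On the overhead question: selecting the table first and then searching relative to it runs two queries. However, the second one only walks the table's subtree.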

XML ancestors wanted

In C#, I need to get
currentNode.ParentNode.ParentNode.ParentNode.FirstChild.LastChild.LastChild
I am using this to generate an MLM tree. Some of the labels that represent individual nodes overlap at the fourth level, so I was trying to get those nodes and separate them.
I am new to XML, I hope my question is clear.
If you have the current node you want to work on, XmlNode defines ParentNode, FirstChild and LastChild properties that you can use to do this.
Consider using XQuery or XPath in order to perform queries on your XML tree.
There's a nice tutorial here, showing all the common options.
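The sketch below compares the property chain with an equivalent XPath expression, using System.Xml. Navigation, ByProperties and ByXPath are hypothetical names. The ?. operators make the chain return null instead of throwing when a step is missing. The XPath uses node(), so that, like FirstChild and LastChild, it counts text and comment nodes as well as elements.

```csharp
using System.Xml;

public static class Navigation
{
    // Property chain from the question; returns null if any step is missing.
    public static XmlNode ByProperties(XmlNode current) =>
        current?.ParentNode?.ParentNode?.ParentNode?.FirstChild?.LastChild?.LastChild;

    // Same path in XPath: up three levels, then first child,
    // then last child twice. Returns null if nothing matches.
    public static XmlNode ByXPath(XmlNode current) =>
        current.SelectSingleNode("../../../node()[1]/node()[last()]/node()[last()]");
}
```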
