Check all the children for XElement - c#

I have an XElement object holding an XML tree read from a file. I want to visit every node in this tree and get the name and value of its first attribute. Is there a simple way to go through all of the nodes, from the root down to the leaves? My XML file contains many different and unusual nodes, which makes this harder. I thought about writing a recursive function, but I hope there is an easier way.

Maybe take a look at XPath. An XPath expression like //*[@id=42] could do the job.
It means: get all nodes that have an "id" attribute with the value 42.
You can also use just //*, which returns all element nodes in the tree.
XPath:
http://msdn.microsoft.com/en-gb/library/ms950786.aspx
Syntax:
http://msdn.microsoft.com/en-us/library/ms256471.aspx
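As a rough sketch of how the XPath approach could look with LINQ to XML: the System.Xml.XPath namespace adds an XPathSelectElements extension method. The class name, method name, and sample XML below are made up for illustration and are not from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings the XPathSelectElements extension into scope

public static class XPathDemo
{
    // Hypothetical sample document, not taken from the question.
    public const string SampleXml =
        "<root id=\"1\"><a id=\"42\"><b/></a><c id=\"42\"/></root>";

    // Counts the elements matched by an XPath expression.
    public static int CountMatches(string xml, string xpath)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.XPathSelectElements(xpath).Count();
    }
}
```

With this sample, "//*" matches every element in the document, while "//*[@id=42]" matches only the elements whose id attribute equals 42.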

You can get all child elements using XElement.Elements().
Here's some code that uses recursion to visit the elements at every level:
void GetElements(XElement element)
{
    // Elements() never returns null; it yields an empty sequence for a leaf.
    foreach (XElement e in element.Elements())
    {
        // some stuff here
        GetElements(e); // recurse into e's children
    }
}
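If you would rather avoid writing the recursion yourself, DescendantsAndSelf() walks the whole tree in document order. Here is a minimal sketch that collects the first attribute of each element, assuming elements without attributes should simply be skipped. The class name, method name, and output format are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FirstAttributes
{
    // Returns "element: attribute=value" for every element that has at least one attribute.
    public static List<string> Describe(XElement root)
    {
        return root.DescendantsAndSelf()                 // root plus all nested elements
                   .Where(e => e.FirstAttribute != null) // skip elements without attributes
                   .Select(e => $"{e.Name.LocalName}: {e.FirstAttribute.Name.LocalName}={e.FirstAttribute.Value}")
                   .ToList();
    }
}
```

Note that namespace declarations (xmlns) are also exposed as attributes, so you may want to filter them out with XAttribute.IsNamespaceDeclaration.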

Related

How do I delete XML nodes in VB.Net based on attribute values when I don't really know what the path will be?

I've been playing with LINQ in VB.Net and some other things in an attempt to delete XML nodes based on attribute values. Basically, if any node in my XML documents has an attribute of a particular value, "cats" for example, I want to delete it.
The catch is I won't really know exactly what the XML structures will look like, so I can't give a path. Also, I know some of the attributes that may contain "cats", but I don't want to hard code them if possible.
So, in other words, I don't have a set XML structure, and I want to delete ANY node that has "cats" as an attribute value, like Caption = "cats" or Title = "cats", anywhere in the node. If it has "cats", nuke it.
Is this at all possible? Or do I just need to give up on this project?
BTW, I'm trying to write the solution in VB.Net, but I am quite capable of reading and converting C# if someone happens to know how to accomplish this but can only give C# code.
Thanks a ton for any help!
You can do this using:
XDocument.Descendants() to iterate through all elements in your document.
XElement.Attributes() to loop through all attributes of an element, to see if any have a value of "cats".
Extensions.Remove() to remove all elements that have an attribute value that matches.
In C# this becomes:
var doc = XDocument.Parse(xmlString);
var attributeValue = "cats";
doc.Descendants().Where(e => e.Attributes().Any(a => (string)a == attributeValue)).Remove();
And in VB.NET:
Dim doc = XDocument.Parse(xmlString)
Dim attributeValue = "cats"
doc.Descendants().Where(Function(e) e.Attributes().Any(Function(a) CStr(a) = attributeValue)).Remove()
Example fiddle.

How to get a second or third XML node when using an anonymous type?

I'm using an anonymous type to grab some XML data. All was going well until I ran across a section of XML where there can be 2 or 3 similar nodes. In the XML sample below there are 3 separate "Phones". My code worked fine when there was only ONE element to grab at the end of the "element path" I gave it. How can I grab a specific one? Or all 3, for that matter? Handling XML is still new to me and there seem to be so many ways of doing it. Searching the web for my exact need didn't prove successful. Thanks.
var nodes = from node in doc.Elements("ClaimsSvcRs").Elements("ClaimDownloadRs")
select new
{
Phone1 = (string)node.Elements("Communications").Elements("PhoneInfo").Elements("PhoneNumber").FirstOrDefault(),
Phone2 = (string)node.Elements("Communications").Elements("PhoneInfo").Elements("PhoneNumber").FirstOrDefault(),
};
The XML Code is
<?xml version="1.0" encoding="UTF-8"?>
<TEST>
<ClaimsSvcRs>
<ClaimDownloadRs>
<Communications>
<PhoneInfo>
<PhoneTypeCd>Phone</PhoneTypeCd>
<CommunicationUseCd>Home</CommunicationUseCd>
<PhoneNumber>+1-715-5553944</PhoneNumber>
</PhoneInfo>
<PhoneInfo>
<PhoneTypeCd>Phone</PhoneTypeCd>
<CommunicationUseCd>Business</CommunicationUseCd>
<PhoneNumber>+1-715-5552519</PhoneNumber>
</PhoneInfo>
<PhoneInfo>
<PhoneTypeCd>Phone</PhoneTypeCd>
<CommunicationUseCd>Cell</CommunicationUseCd>
<PhoneNumber>+1-715-5551212</PhoneNumber>
</PhoneInfo>
</Communications>
</ClaimDownloadRs>
</ClaimsSvcRs>
</TEST>
I haven't used XPath in a while, so I'll let someone else step in there, but there's a way to select a particular PhoneInfo element based on its subelements. So if you knew whether you wanted Home or Business or Cell or whatever, you'd be able to select that particular PhoneInfo element. Otherwise, if you wanted simple Phone1, Phone2, Phone3 and nulls were OK, use the LINQ Skip method: Phone2 = query.Skip(1).FirstOrDefault()
lol no worries ;) XPath can be intermixed in here, was my thought, and might be more elegant if your CommunicationUseCd fields were deterministic. Then you could have Home = ... and Work = ..., etc., instead of Phone1 and Phone2.
The same could be accomplished by slipping a where clause into each of your query lines.
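To make the two suggestions above concrete, here is a small sketch of both: picking a phone by its CommunicationUseCd with a Where filter, and picking one by position with Skip. The class and method names are hypothetical, and it uses Descendants("PhoneInfo") rather than the full element path from the question, which assumes PhoneInfo elements only appear in that one place.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PhoneLookup
{
    // Phone number whose CommunicationUseCd matches (e.g. "Home"), or null if none does.
    public static string ByUse(XContainer doc, string use) =>
        (string)doc.Descendants("PhoneInfo")
                   .Where(p => (string)p.Element("CommunicationUseCd") == use)
                   .Elements("PhoneNumber")
                   .FirstOrDefault();

    // Phone number at a zero-based position, or null if there are fewer entries.
    public static string ByIndex(XContainer doc, int index) =>
        (string)doc.Descendants("PhoneInfo")
                   .Elements("PhoneNumber")
                   .Skip(index)
                   .FirstOrDefault();
}
```

Inside the anonymous type this could be used as Home = PhoneLookup.ByUse(node, "Home") or Phone2 = PhoneLookup.ByIndex(node, 1). Casting a null XElement to string gives null, so missing phones don't throw.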
If you're up for LINQ you can get all your elements in one go:
foreach(XElement phone in XDocument.Parse(xmlString).Descendants("PhoneInfo"))
{
Console.WriteLine(phone.Element("PhoneNumber").Value);
//etc
}
I find XDocument & LINQ a lot easier than XmlDocument & XPath, if you're okay with the alternative. There's more info on them here

Remove an XElement from another XDocument

I need to remove an XElement from an XDocument.
The problem is that I can't just call .Remove(), because the XElement I have belongs to a different XDocument.
Performance is very important.
Scenario: I have an XDocument docSource and I copy it to an XDocument doc. I select a node of docSource and want to delete the corresponding node in doc.
So far I'm using this workaround (which may also delete some wrong nodes if they have the same parent name, but this doesn't matter so far):
private static XNode actualNode;
private static void RemoveNode(XDocument doc)
{
doc.Root.Descendants(((XElement)actualNode).Name.LocalName)
.Where(e => actualNode.Parent.Name.LocalName.Equals(e.Parent.Name.LocalName))
.Remove();
}
Is there a better way to do this? And especially a faster way?
My XDocument has like 1000 lines.
Well, a better way of doing the existing name-based approach would be:
doc.Root.Descendants(actualNode.Parent.Name)
        .Elements(((XElement)actualNode).Name) // cast needed: actualNode is declared as XNode
        .Remove();
Aside from anything else, that's simpler - and doesn't use just the local name. (If the elements are actually in different namespaces, you should take account of that separately IMO.)
But this is still just using "element name and parent name" as a way of identifying an element. Do you have anything else which will identify the element more reliably? Some kind of attribute? I'm assuming you actually have some idea of what kind of element you'll be finding.
"My XDocument has like 1000 lines."
Then it should be blink-of-an-eye quick anyway. Do you actually have any indication that this is causing a performance problem?
Another thing to consider:
"Scenario: I have an XDocument docSource and I copy this to XDocument doc. I select a Node of docSource and want to delete this Node in my doc."
Is there any reason you don't just avoid copying the node to start with?
As you rightly said, if you rely only on Parent.Name.LocalName you may end up deleting the wrong child nodes when several parents have the same name.
If you check for repeated parent nodes before deleting the child nodes, you will be able to overcome this issue.
You should be able to achieve accuracy by loading the nodes into an array or list. Then you can find the position of the exact parent node. But I am afraid it will not improve performance.
For example, say you have 3 parent nodes named 'XZY'.
The user selects the second parent node, so your parent index will be 1 (assuming the index starts at 0).
So you should only delete the children under parent index 1.
Hope this helps.
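One way to generalise the index idea is to record the position of the selected element at every level and replay those positions in the copy. This is only a sketch, and it assumes the copy still has exactly the same element structure as the source. NodeMirror, IndexPath, and Follow are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NodeMirror
{
    // Position of the element, and of each of its ancestors, among the
    // child elements of its parent, listed from the root downwards.
    public static List<int> IndexPath(XElement element)
    {
        var path = new List<int>();
        for (XElement e = element; e.Parent != null; e = e.Parent)
            path.Add(e.ElementsBeforeSelf().Count());
        path.Reverse();
        return path;
    }

    // Follows the same positions in another document with identical structure.
    public static XElement Follow(XDocument doc, IEnumerable<int> path)
    {
        XElement current = doc.Root;
        foreach (int index in path)
            current = current.Elements().ElementAt(index);
        return current;
    }
}
```

Usage would be something like NodeMirror.Follow(doc, NodeMirror.IndexPath(selectedElement)).Remove();. ElementAt throws if the structures have diverged, which at least makes a mismatch visible instead of silently deleting the wrong node.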

Order HtmlNodes Based on their position on the HTML Page (C# / XPath)

Context:
I am parsing the result of a query on this service, but the HTML of the result is a mess.
My goal is to build a key-value pair for each "attribute and value" shown in the result of this query.
So far only one way of solving it has come to mind.
Logic for Parsing:
Select all the Attribute nodes
Select all the value nodes
Match their "indexes" on each collection built to build the Key Value Pairs
E.g: Attribute[0] with Value[0] -> (In this service, that would be "CNPJ" and "12.272.084/0001-00").
Problem:
Even though I managed to find an XPath expression that fetches all the attribute nodes:
attrNodes = htmlDoc.DocumentNode.SelectNodes("//td[@bgcolor='#f1f1b1']/*/font[@face='Verdana']");
I could not find one for the value nodes as well, since there are different types of nodes that look the same when rendered as HTML ("b" and "strong", for example).
There are even nodes with different hierarchies (a single tag versus two nested tags, for example), which prevented me from using wildcards ("*") in the XPath to solve it.
My Goal:
Write XPaths to reach each different subset of nodes with values
Put all the nodes in a single collection
Order the nodes of this collection based on the position of each node in the HTML (nodes that appear first in the HTML will be at the beginning of the list)
Any idea how I can achieve my goal?
HTML Sample:
You can either check it here
or query the service yourself by typing 12272084000100 into the CNPJ textbox
and clicking on "Pesquisar". After that, you just have to click on the text "Companhia Eletrica de Alagoas".
Thanks in advance
I just found a property on the HtmlNode class of the HtmlAgilityPack library that solved my problem.
According to this documentation about the HtmlNode Class:
StreamPosition
Gets the stream position of this node in the document, relative to the start of the document.
Here is the output of my tests using a list of tables found in this very same Html Page (tables used for testing purposes)
// HtmlNodeCollection of Tables
tableNodes[0].StreamPosition
925
tableNodes[1].StreamPosition
1651
tableNodes[2].StreamPosition
2387
Ordering my list by this StreamPosition solved my problem.
List<HtmlNode> OrderedList = valueNodes.OrderBy ( node => node.StreamPosition ).ToList<HtmlNode>();

XML ancestors wanted

In C#, I need to get
currentnode.parentnode.parentnode.parentnode.firstchild.lastchild.lastchild
I am using this to generate an MLM tree. Some of the labels, which represent individual nodes, overload at the fourth level, so I am trying to get those nodes and separate them.
I am new to XML, I hope my question is clear.
If you have the current node you want to work on, then XmlNode defines ParentNode, FirstChild, and LastChild properties that you can use to do this.
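As a minimal sketch, the chain from the question maps directly onto those XmlNode properties. The null-conditional operator is my addition so that a missing level yields null instead of a NullReferenceException. TreeWalk and Target are hypothetical names.

```csharp
using System;
using System.Xml;

public static class TreeWalk
{
    // Three levels up, then the first child, then the last child twice.
    // Returns null if any step along the way does not exist.
    public static XmlNode Target(XmlNode current) =>
        current?.ParentNode?.ParentNode?.ParentNode?.FirstChild?.LastChild?.LastChild;
}
```

Keep in mind that FirstChild and LastChild return any kind of node, so text or comment nodes can show up if the document contains them.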
Consider using XQuery or XPath in order to perform queries on your XML tree.
There's a nice tutorial here, showing all the common options.
