We have C# code that walks various XML documents that we create. Often we need to get a known child element (it can be the only child or there could be other siblings). I have a function that given a parent and the child name will return the child element:
public static XmlElement GetChildElement(XmlElement parentElement, string childName)
{
    return parentElement.GetElementsByTagName(childName).Cast<XmlElement>().FirstOrDefault();
}
This works fine, but the other day I wondered if it could be done more cleanly and easily with XPath or LINQ to XML. Most of the XPath examples I have found seem to want to know the entire structure of the document, and I want a generic function that just knows about the parent and child. LINQ to XML seems more promising, but I haven't found an example matching what I am looking for.
Well LINQ to XML makes this very easy - you just use the XContainer.Element method:
XElement child = parent.Element(elementName);
This will give you the first element if there are any, or null otherwise.
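As a small illustration of that behaviour (the sample XML and the class and method names are made up for this sketch), Element only looks at direct children, whereas Descendants searches the whole subtree in document order:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ChildLookupSketch
{
    // Hypothetical sample: one <item> nested under <note>, one <item> as a direct child.
    public const string SampleXml =
        "<order><note><item>nested</item></note><item>direct</item></order>";

    // First direct child with the given name, or null if there is none.
    public static XElement FirstChild(XElement parent, string name)
    {
        return parent.Element(name);
    }

    // First element with the given name anywhere below parent, or null.
    public static XElement FirstDescendant(XElement parent, string name)
    {
        return parent.Descendants(name).FirstOrDefault();
    }
}
```

With `XElement.Parse(ChildLookupSketch.SampleXml)` as the parent, FirstChild finds the "direct" item, FirstDescendant finds the "nested" one, and both return null for a name that does not occur.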
Given what you already have, you can just do this:
public static XmlElement GetChildElement(XmlElement parentElement, string childName)
{
    return parentElement[childName];
}
This will return the first matching child element, or null if there is none. Heck, I'm not sure there's even much sense using a convenience method for this, but the above modification will work if you already have references to this method.
One thing to note here is that the code you provided doesn't return the first matching child element; it returns the first matching descendant element. If that is in fact what you want, you can do this (the leading "." keeps the search relative to parentElement; a bare "//" would search from the document root):
public static XmlElement GetChildElement(XmlElement parentElement, string childName)
{
    return parentElement.SelectSingleNode(".//" + childName) as XmlElement;
}
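To make the child versus descendant distinction concrete, here is a minimal sketch; the class and method names are invented for illustration. The indexer only sees direct children, while a relative XPath starting with ".//" searches the subtree below the given element.

```csharp
using System.Xml;

public static class XmlElementLookupSketch
{
    // Direct children only: the indexer returns the first child element
    // with this name, or null if there is none.
    public static XmlElement FirstChild(XmlElement parent, string name)
    {
        return parent[name];
    }

    // Whole subtree below parent: the leading "." keeps the XPath relative
    // to parent instead of starting at the document root.
    public static XmlElement FirstDescendant(XmlElement parent, string name)
    {
        return parent.SelectSingleNode(".//" + name) as XmlElement;
    }
}
```

For a document such as `<root><a><x>in a</x></a><b><c><x>in c</x></c></b></root>` and the `b` element as parent, FirstChild(b, "x") is null and FirstDescendant(b, "x") is the element inside `c`, whereas `b.SelectSingleNode("//x")` would return the one inside `a`.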
XmlNode.SelectSingleNode is the method you are looking for if you can't use XElement:
var result = parentElement.SelectSingleNode(
    string.Format("*[local-name()='{0}']", nameWithoutPrefix));
Note that my sample cheats with namespaces (it accepts any namespace); check whether you need to handle namespaces correctly in your case.
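If the namespace does matter, one possible namespace-aware variant uses XmlNamespaceManager. This is only a sketch: the helper name and the "ns" prefix are arbitrary choices, not something from the original answer.

```csharp
using System.Xml;

public static class NamespaceLookupSketch
{
    // Returns the first direct child with the given local name in the
    // given namespace, or null if there is none.
    public static XmlElement FirstChildInNamespace(
        XmlElement parent, string localName, string namespaceUri)
    {
        var manager = new XmlNamespaceManager(parent.OwnerDocument.NameTable);
        // "ns" is an arbitrary prefix that only exists inside the XPath expression;
        // it does not have to match the prefix used in the document.
        manager.AddNamespace("ns", namespaceUri);
        return parent.SelectSingleNode("ns:" + localName, manager) as XmlElement;
    }
}
```

For direct children, the two-argument indexer `parent[localName, namespaceUri]` should give the same result without any XPath.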
Background:
Using Roslyn with C#, I am trying to expand auto-implemented properties, so that the accessor bodies can have code injected by later processing. I am using StackExchange.Precompilation as the compiler hook, so these syntax transformations occur in the build pipeline, not as part of an analyzer or refactoring.
I want to turn this:
[SpecialAttribute]
int AutoImplemented { get; set; }
into this:
[SpecialAttribute]
int AutoImplemented {
    get { return _autoImplemented; }
    set { _autoImplemented = value; }
}
private int _autoImplemented;
The problem:
I have been able to get simple transformations working, but I'm stuck on auto-properties, and a few others that are similar in some ways. The trouble I'm having is in using the SyntaxNodeExtensions.ReplaceNode and SyntaxNodeExtensions.ReplaceNodes extension methods correctly when replacing more than one node in a tree.
I am using a class extending CSharpSyntaxRewriter for the transformations. I'll just share the relevant members of that class here. This class visits each class and struct declaration, and then replaces any property declarations that are marked with SpecialAttribute.
private readonly SemanticModel model;
public override SyntaxNode VisitClassDeclaration(ClassDeclarationSyntax node) {
    if (node == null) throw new ArgumentNullException(nameof(node));
    node = VisitMembers(node);
    return base.VisitClassDeclaration(node);
}
public override SyntaxNode VisitStructDeclaration(StructDeclarationSyntax node) {
    if (node == null) throw new ArgumentNullException(nameof(node));
    node = VisitMembers(node);
    return base.VisitStructDeclaration(node);
}
private TNode VisitMembers<TNode>(TNode node)
    where TNode : SyntaxNode {
    IEnumerable<PropertyDeclarationSyntax> markedProperties =
        node.DescendantNodes()
            .OfType<PropertyDeclarationSyntax>()
            .Where(prop => prop.HasAttribute<SpecialAttribute>(model));
    foreach (var prop in markedProperties) {
        SyntaxList<SyntaxNode> expanded = ExpandProperty(prop);
        //If I set a breakpoint here, I can see that 'expanded' will hold the correct value.
        //ReplaceNode appears to not be replacing anything
        node = node.ReplaceNode(prop, expanded);
    }
    return node;
}
private SyntaxList<SyntaxNode> ExpandProperty(PropertyDeclarationSyntax node) {
    //Generates list of new syntax elements from original.
    //This method will produce correct output.
}
HasAttribute<TAttribute> is an extension method I defined for PropertyDeclarationSyntax that checks if that property has an attribute of the given type. This method works correctly.
I believe I am just not using ReplaceNode correctly. There are three related methods:
TRoot ReplaceNode<TRoot>(
    TRoot root,
    SyntaxNode oldNode,
    SyntaxNode newNode);
TRoot ReplaceNode<TRoot>(
    TRoot root,
    SyntaxNode oldNode,
    IEnumerable<SyntaxNode> newNodes);
TRoot ReplaceNodes<TRoot, TNode>(
    TRoot root,
    IEnumerable<TNode> nodes,
    Func<TNode, TNode, SyntaxNode> computeReplacementNode);
I am using the second one, because I need to replace each property node with both field and property nodes. I need to do this with many nodes, but there is no overload of ReplaceNodes that allows one-to-many node replacement. The only way I found around having that overload was using a foreach loop, which seems very 'imperative' and against the functional feel of the Roslyn API.
Is there a better way to perform batch transformations like this?
Update:
I found a great blog series on Roslyn and dealing with its immutability. I haven't found the exact answer yet, but it looks like a good place to start.
https://joshvarty.wordpress.com/learn-roslyn-now/
Update:
So here is where I'm really confused. I know that the Roslyn API is all based on immutable data structures, and the problem here is in a subtlety of how the copying of structures is used to mimic mutability. I think the problem is that every time I replace a node in my tree, I then have a new tree, and so when I call ReplaceNode that tree supposedly doesn't contain my original node that I want to replace.
It is my understanding that the way trees are copied in Roslyn is that, when you replace a node in a tree you actually create a new tree that references all the same nodes of the original tree, except the node you replaced and all nodes directly above that one. The nodes below the replaced node may be removed if the replacement node no longer references them, or new references may be added, but all the old references still point to the same node instances as before. I am pretty sure this is exactly what Anders Hejlsberg describes in this interview on Roslyn (20 to 23 min in).
So shouldn't my new node instance still contain the same prop instances found in my original sequence?
Hacky solution for special cases:
I was finally able to get this particular problem of transforming property declarations to work by relying on property identifiers, which will not change in any tree transformations. However, I would still like a general solution for replacing multiple nodes with multiple nodes each. This solution is really working around the API not through it.
Here is the special case solution:
private TNode VisitMembers<TNode>(TNode node)
    where TNode : SyntaxNode {
    //Collect the names up front; identifiers survive tree transformations.
    List<string> markedPropertyNames =
        node.DescendantNodes()
            .OfType<PropertyDeclarationSyntax>()
            .Where(prop => prop.HasAttribute<SpecialAttribute>(model))
            .Select(prop => prop.Identifier.ValueText)
            .ToList();
    foreach (string propName in markedPropertyNames) {
        //Look the property up again in the current tree.
        var oldProp = node.DescendantNodes()
            .OfType<PropertyDeclarationSyntax>()
            .Single(p => p.Identifier.ValueText == propName);
        SyntaxList<SyntaxNode> newProp = ExpandProperty(oldProp);
        node = node.ReplaceNode(oldProp, newProp);
    }
    return node;
}
Another similar problem I am working on is modifying all return statements in a method to insert postcondition checks. That case obviously cannot rely on any kind of unique identifier the way a property declaration can.
When you do this:
foreach (var prop in markedProperties) {
    SyntaxList<SyntaxNode> expanded = ExpandProperty(prop);
    //If I set a breakpoint here, I can see that 'expanded' will hold the correct value.
    //ReplaceNode appears to not be replacing anything
    node = node.ReplaceNode(prop, expanded);
}
after the first replacement, node (your class, for example) no longer contains the original property nodes. The remaining properties in markedProperties still belong to the old tree, so later ReplaceNode calls find nothing to replace.
In Roslyn, everything is immutable, so the first replacement should work for you, and then you have a new tree/node.
To make it work, you can consider one of the following:
Build the result in your rewriter class without changing the original tree, and when you finish, replace everything at once. In your case, that means replacing the class node in one go. I think it's a good option when you want to replace statements (I used it when I wrote code to convert LINQ query (comprehension) syntax to fluent syntax), but for a whole class it may not be optimal.
Use SyntaxAnnotation / TrackNodes to find a node after the tree has changed. With these options you can change the tree as you want and still keep track of the old nodes in the new tree.
Use DocumentEditor; it lets you make multiple changes to a document and then returns a new Document.
If you need an example for one of them, let me know.
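As a rough sketch of the TrackNodes option, assuming the Microsoft.CodeAnalysis.CSharp package is referenced: the class name, the method name and the expand callback (standing in for ExpandProperty) are hypothetical. The targets are chosen on the original tree, where the SemanticModel is still valid, and each one is looked up again in the current tree before it is replaced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TrackedReplaceSketch
{
    // Replaces each target property with the nodes produced by expand,
    // one at a time, re-locating every target in the current tree first.
    public static TNode ReplaceEach<TNode>(
        TNode root,
        IReadOnlyList<PropertyDeclarationSyntax> targets,
        Func<PropertyDeclarationSyntax, IEnumerable<SyntaxNode>> expand)
        where TNode : SyntaxNode
    {
        // TrackNodes returns a copy of root in which the targets are annotated.
        TNode current = root.TrackNodes(targets);
        foreach (PropertyDeclarationSyntax original in targets)
        {
            // Find the node in the current tree that corresponds to the original.
            PropertyDeclarationSyntax live = current.GetCurrentNode(original);
            if (live == null) continue; // removed by an earlier replacement
            current = current.ReplaceNode(live, expand(live));
        }
        return current;
    }
}
```

Inside VisitMembers this could be called as `ReplaceEach(node, markedProperties.ToList(), prop => ExpandProperty(prop))`, which keeps the one-to-many ReplaceNode overload but no longer depends on the property nodes of the old tree.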
I have a question about LINQ. Suppose I have an XML response (which I get from some server). The response changes each time: the root element is always the same, but some descendant nodes may or may not exist. I need to build a LINQ query over this XML that does not throw exceptions when some element or child node is missing. In the end, it should take whatever XML comes back in the server response and build the query from it.
Regards,
Raj.
I had this in the past: XML documents coming in with a totally different structure each time.
So I built something that first analyzed the structure and stored it in database tables, so that I could also track the deltas and keep some history (and diffs) of it, since manual intervention is sometimes needed (you can't predict everything).
After the analysis phase, a query can be run against the new structure based on that structure analysis.
So I took a two-step approach; maybe this is also applicable for you.
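The analysis step could look roughly like the sketch below. This is not the implementation described above (which used database tables); it is a simplified in-memory version with invented names that records every element path in a document and reports which paths are new compared with a previously recorded set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlStructureSketch
{
    // Collects the distinct element paths (e.g. "/root/a/b") found below root.
    public static SortedSet<string> ElementPaths(XElement root)
    {
        var paths = new SortedSet<string>(StringComparer.Ordinal);
        Collect(root, "", paths);
        return paths;
    }

    private static void Collect(XElement element, string parentPath, ISet<string> paths)
    {
        // Namespaces are ignored here; only local names are recorded.
        string path = parentPath + "/" + element.Name.LocalName;
        paths.Add(path);
        foreach (XElement child in element.Elements())
        {
            Collect(child, path, paths);
        }
    }

    // Paths present in the current structure but not in the known one.
    public static List<string> AddedPaths(IEnumerable<string> known, IEnumerable<string> current)
    {
        return current.Except(known).ToList();
    }
}
```

A query builder could then consult the recorded paths before asking for an element, instead of discovering a missing node through an exception.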
First, you need to convert your XML to a dynamic structure. The first library I found (haven't tried though): https://www.nuget.org/packages/netfx-System.Xml.DynamicXml/
Then, you can add Linq on top of your dynamic XML root.
However, performance-wise this is a really bad approach. I once played with dynamic XML and it was extremely slow. Instead of making it dynamic, you could create extension methods on XElement (if you are loading the XML into an XDocument) that return an empty node instead of throwing an exception, e.g.:
using System.Xml.Linq;

public static class XElementExtensions
{
    public static XElement SafeGetChild(this XElement node, string childName)
    {
        // Element returns null when there is no such child;
        // fall back to an empty, detached element with the same name.
        return node.Element(childName) ?? new XElement(childName);
    }
}
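A different way to get the same "no exception on a missing node" behaviour, swapped in here as an alternative rather than as what the answer proposes, is to combine the null-conditional operator with the explicit string conversion that XElement defines. The helper name is made up for this sketch.

```csharp
using System.Xml.Linq;

public static class SafeNavigationSketch
{
    // Returns the text at root/first/second, or null if any step is missing.
    public static string ValueAt(XElement root, string first, string second)
    {
        // ?. stops at the first missing element, and casting a null
        // XElement to string yields null instead of throwing.
        return (string)root.Element(first)?.Element(second);
    }
}
```

The trade-off is that callers get null rather than an empty element, so they need to handle null, but no placeholder elements are created.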
I implemented a tree structure in C# where a node looks like the following:
public class Node
{
    public int ID { get; set; }
    public string Name { get; set; }
    public Node Parent { get; set; }
    public IList<Node> Children { get; set; }
    public IList<Object> Items { get; set; }
    public IEnumerable<Node> Ancestors { get { return this.GetAncestors(); } }
}
I want to improve my structure, but I am not sure what this kind of tree is called. It's not a binary tree, since the number of children varies and can be more than two. I use recursion for almost every operation, from getting a node by Name, ID or reference to removing nodes. In my case, when a node is removed, I add both its Items and its Children to the Parent node.
I did it from scratch and I am sure someone has done it better, so could you please help me figure out the name of this tree structure so I can search for improvements?
k-ary tree is probably the closest to what you're looking for. This typically refers to a tree where each node has at most k children (for some k, e.g. a binary tree is a 2-ary tree).
If you're looking for the case where the number of children per node is unbounded, I don't believe that has a specific name; it's just called a tree (although I imagine some resources might call that a k-ary tree as well).
An obvious place for improvement I see here is to use generics for your structure (you should replace IList<Object> with a generic data type, and rename Items to Data ... probably).
Without knowing what you want to do, I can't say whether IList<Object> is a good idea - an alternative might be to have a class with members with specific types instead, or IList<SomeOtherType>.
Having each node store a reference to its parent is not that typical, but if there's a need for it, it can be done.
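A minimal sketch of the generic variant suggested above might look like this. The type and member names are illustrative, and it keeps the parent reference only because the question's structure has one.

```csharp
using System.Collections.Generic;

public class TreeNode<T>
{
    private readonly List<TreeNode<T>> children = new List<TreeNode<T>>();

    public TreeNode(T data)
    {
        Data = data;
    }

    public T Data { get; }
    public TreeNode<T> Parent { get; private set; }
    public IReadOnlyList<TreeNode<T>> Children => children;

    // Adds a child and keeps its Parent reference in sync.
    public TreeNode<T> AddChild(T data)
    {
        var child = new TreeNode<T>(data) { Parent = this };
        children.Add(child);
        return child;
    }

    // Walks up the Parent chain, nearest ancestor first.
    public IEnumerable<TreeNode<T>> Ancestors()
    {
        for (TreeNode<T> node = Parent; node != null; node = node.Parent)
        {
            yield return node;
        }
    }
}
```

Exposing Children as a read-only list means the Parent references can only change through AddChild, which avoids a child ending up with a stale parent.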
In a few places these structures are also called n-ary trees. If you want examples, you can search for tries and B-trees.
I think a trie comes closest to what you are trying to structure
Given a very simple structure such as this:
public class TreeNode
{
    public int ID { get; set; }
    public List<TreeNode> Children { get; set; }
}
TreeNode may have other properties.
And when used in the following manner:
var tree = new List<TreeNode>(); //no root node
I perform add/update/remove operations on the tree based on certain criteria, for example removing a node based on one or more of the other properties mentioned above. I'd like to compare the tree graph before and after the changes and then verify, via unit tests, some of the following:
Tree remains unchanged
Specified nodes are removed
Specified nodes are added
Specified nodes are updated
The three above, whilst also verifying that the rest of the tree is unchanged.
Ideally, I'd throw an exception listing the nodes that were not found, not expected, etc. However, at this stage I'd be happy with a true/false result from my check.
Are there any known patterns, algorithms or existing projects that would help with this?
I am happy for pseudo-code or examples in other languages as long as they don't rely on features I can't replicate in .NET.
My tree is unlikely to get to more than 7 or 8 levels deep and no more than a hundred nodes in total as it will be test data so brute force looping is fine and performance isn't a consideration at this time.
I'm really looking for tips, tricks, advice, code on how to approach this.
TIA
When I wrote unit tests for tree structures, I simply built an ad-hoc tree with a known structure, executed operations on it, and verified that the changes were exactly the ones I expected. It is a very simple but usable method if you create good test cases.
Beyond my own experience, you could consider recursive comparison methods for tree nodes that return a list of the child nodes that differ. The basic idea is to maintain two equal trees, perform the operation on one of them, and then check what changed.
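A recursive comparison along those lines could be sketched as follows. It assumes IDs are unique among siblings and only compares IDs; checking the other properties would be an extra step. The TreeNode class repeats the one from the question (with an initialised Children list), and the message format is made up.

```csharp
using System.Collections.Generic;
using System.Linq;

public class TreeNode
{
    public int ID { get; set; }
    public List<TreeNode> Children { get; set; } = new List<TreeNode>();
}

public static class TreeDiffSketch
{
    // Compares two lists of sibling nodes by ID, recursing into matching nodes.
    // Returns one message per difference; an empty list means the trees match.
    public static List<string> Diff(
        IEnumerable<TreeNode> expected, IEnumerable<TreeNode> actual, string path = "")
    {
        var differences = new List<string>();
        Dictionary<int, TreeNode> actualById = actual.ToDictionary(n => n.ID);
        foreach (TreeNode e in expected)
        {
            string nodePath = path + "/" + e.ID;
            if (actualById.TryGetValue(e.ID, out TreeNode a))
            {
                actualById.Remove(e.ID);
                differences.AddRange(Diff(e.Children, a.Children, nodePath));
            }
            else
            {
                differences.Add("missing " + nodePath);
            }
        }
        // Whatever is left in the dictionary was not expected at this level.
        foreach (TreeNode extra in actualById.Values.OrderBy(n => n.ID))
        {
            differences.Add("unexpected " + path + "/" + extra.ID);
        }
        return differences;
    }
}
```

In a unit test, asserting that the returned list is empty gives the true/false check, and joining the messages into the assertion text gives the list of nodes that were not found or not expected.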
If you don't have any UI that shows the tree, I'd also recommend visualizing it using http://www.graphviz.org/. You can generate pictures of your tree before and after an operation, so you will see how the whole structure changed (not usable for unit tests, but helpful anyway).
And one last thing: I suggest having a root node, as it will simplify your recursive algorithms. If you don't have a root because of some UI requirement or similar, you can modify that part to simply ignore the root.
You can also have a function that gets the string representation of the tree, and simply compare two string representations instead of comparing two trees.
I did that earlier this week.
Example function (Swift):
public var description: String {
    var s = "\(value)"
    if !children.isEmpty {
        s += " {" + children.map { "\($0.description)"}.joined(separator: ", ") + "}"
    }
    return s
}
You can test it like this:
XCTAssert ( tree.description == "beverages {hot {tea {black, green, chai}, coffee, cocoa}, cold {soda {ginger ale, bitter lemon}, milk}}");
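Since the question is about .NET, a rough C# counterpart of the Swift function might look like the sketch below. It prints the ID because that is the only property the question's TreeNode shows; the class and method names are otherwise invented.

```csharp
using System.Collections.Generic;
using System.Linq;

public class TreeNode
{
    public int ID { get; set; }
    public List<TreeNode> Children { get; set; } = new List<TreeNode>();
}

public static class TreeTextSketch
{
    // Renders a node as "ID" or "ID {child, child}", recursing into children.
    public static string Describe(TreeNode node)
    {
        if (node.Children.Count == 0)
        {
            return node.ID.ToString();
        }
        return node.ID + " {" + string.Join(", ", node.Children.Select(n => Describe(n))) + "}";
    }
}
```

A tree with root 1, children 2 and 3, and 4 under 2 renders as `1 {2 {4}, 3}`. Note that the comparison is sensitive to child order, so sort the children first if order should not matter.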
Sometimes I'd like to know the reasoning of certain API changes. Since Google hasn't helped me with this question, maybe StackOverflow can. Why did Microsoft choose to remove the GetAttribute helper method on XML elements? In the System.Xml world there was XmlElement.GetAttribute("x") like getAttribute in MSXML before it, both of which return either the attribute value or an empty string when missing. With XElement there's SetAttributeValue but GetAttributeValue wasn't implemented.
Certainly it's not too much work to modify logic to test and use the XElement.Attribute("x").Value property but it's not as convenient and providing the utility function one way (SetAttributeValue) but not the other seems weird. Does anyone out there know the reasons behind the decision so that I can rest easily and maybe learn something from it?
You are supposed to get attribute value like this:
var value = (TYPE) element.Attribute("x");
UPDATE:
Examples:
var value = (string) element.Attribute("x");
var value = (int) element.Attribute("x");
etc.
See this article: http://www.hanselman.com/blog/ImprovingLINQCodeSmellWithExplicitAndImplicitConversionOperators.aspx. Same thing works for attributes.
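To show how those casts behave when the attribute is missing, here is a short sketch with made-up helper names. Casting a null XAttribute to string or to a nullable type gives null, whereas casting it to a non-nullable type such as int throws an ArgumentNullException.

```csharp
using System.Xml.Linq;

public static class AttributeCastSketch
{
    // Returns the attribute's text, or null when the attribute is absent.
    public static string TextOrNull(XElement element, XName name)
    {
        return (string)element.Attribute(name);
    }

    // Returns the attribute parsed as an int, or null when the attribute is absent.
    // A present but non-numeric value still throws a FormatException.
    public static int? IntOrNull(XElement element, XName name)
    {
        return (int?)element.Attribute(name);
    }
}
```

To get the old GetAttribute behaviour of an empty string for a missing attribute, `(string)element.Attribute("x") ?? ""` should be enough.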
Not sure exactly the reason, but with C# extension methods, you can solve the problem yourself.
public static string GetAttributeValue(this XElement element, XName name)
{
    var attribute = element.Attribute(name);
    return attribute != null ? attribute.Value : null;
}
Allows:
element.GetAttributeValue("myAttributeName");