Possible NullReferenceException - c#

For example, this snippet of code:
private void LoadComments(Member member, XElement commentsXml)
{
member.Comments = (from comment in commentsXml.Descendants("comment")
select new Comment()
{
ID = comment.Element("id").Value,
Text = comment.Element("text").Value,
Date = comment.Element("date").Value,
Owner = comment.Element("user").Element("name").Value
}).ToList();
}
ReSharper warns me of a possible NullReferenceException on the comment.Element lines, and sure enough, the exception was thrown.
Any suggestions on how to avoid this? If an element is missing, can I just return an empty string instead?

I prefer to use an extension class for this:
public static class XElementExtension
{
// Returns the element's text, or null if the element itself is missing
public static string GetValue(this XElement input)
{
if (input == null)
return null;
return input.Value;
}
// Returns the named child, or null if the parent element is missing
public static XElement GetSubElement(this XElement element, string id)
{
if (element == null)
return null;
return element.Element(id);
}
}
and use it like this:
ID = comment.Element("id").GetValue()
or
Owner = comment.Element("user").GetSubElement("name").GetValue()
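A self-contained version of this extension class (returning input.Value directly) can be checked against a made-up comment that has no <user> element. Calling the extension methods on a null XElement is fine, because an extension method call is just a static call with the receiver as its first argument:

```csharp
using System.Xml.Linq;

public static class XElementExtension
{
    // Returns the element's text, or null when the element itself is missing.
    public static string GetValue(this XElement input)
    {
        return input == null ? null : input.Value;
    }

    // Returns the named child, or null when the parent element is missing.
    public static XElement GetSubElement(this XElement element, string id)
    {
        return element == null ? null : element.Element(id);
    }
}
```

With a comment like <comment><id>7</id></comment>, comment.Element("id").GetValue() gives "7", while comment.Element("user").GetSubElement("name").GetValue() gives null instead of throwing.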
There are also other approaches, such as chaining null checks with the Maybe monad:
http://www.codeproject.com/KB/cs/maybemonads.aspx
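The linked article chains null checks through small extension methods. A minimal sketch of that idea follows; the helper name With and its signature are illustrative here, not copied from the article:

```csharp
using System;

public static class MaybeExtensions
{
    // Applies the selector only when the input is non-null; otherwise propagates null,
    // so a chain of With calls stops quietly at the first missing link.
    public static TResult With<TInput, TResult>(this TInput input, Func<TInput, TResult> selector)
        where TInput : class
        where TResult : class
    {
        return input == null ? null : selector(input);
    }
}
```

With it, the nested owner lookup becomes comment.With(c => c.Element("user")).With(u => u.Element("name")).With(n => n.Value) ?? "", which yields an empty string when either element is missing.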

A few extensions could help:
Example:
public static class XElementExtensions
{
public static XElement GetElement(this XElement element, XName elementName)
{
if (element != null)
{
XElement child = element.Element(elementName);
if (child != null)
{
return child;
}
}
return null;
}
public static String GetElementValue(this XElement element, XName elementName)
{
if (element != null)
{
XElement child = element.Element(elementName);
if (child != null)
{
return child.Value;
}
}
return null;
}
}
Usage:
private void LoadComments(Member member, XElement commentsXml)
{
member.Comments = (from comment in commentsXml.Descendants("comment")
select new Comment()
{
ID = comment.GetElementValue("id"),
Text = comment.GetElementValue("text"),
Date = comment.GetElementValue("date"),
Owner = comment.GetElement("user").GetElementValue("name")
}).ToList();
}

The only way I can think of is to check each element reference, like this:
private void LoadComments(Member member, XElement commentsXml)
{
member.Comments = (from comment in commentsXml.Descendants("comment")
select new Comment()
{
ID = (comment.Element("id")==null)?"":comment.Element("id").Value,
Text = (comment.Element("text")==null)?"":comment.Element("text").Value,
Date = (comment.Element("date")==null)?"":comment.Element("date").Value,
Owner = (comment.Element("user")==null)?"":comment.Element("user").Element("name").Value
}).ToList();
}
The last line is a little weak: there are two nodes that really should be checked, and nesting the checks would look a little unpleasant, but it's the only way to be sure.
EDIT ----
As an extension method:
public static class MyExtensions
{
public static string ValueSafe(this XElement target)
{
if (target==null)
{ return ""; }
else
{ return target.Value; }
}
}
You can then replace .Value with .ValueSafe(). That said, I haven't had an opportunity to test it.
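Since this is untested, here is a quick self-contained check of the same extension (ValueSafe is a method, so it is called as ValueSafe()):

```csharp
using System.Xml.Linq;

public static class MyExtensions
{
    // Returns "" instead of throwing when the element is missing.
    public static string ValueSafe(this XElement target)
    {
        return target == null ? "" : target.Value;
    }
}
```

One caveat: comment.Element("user").Element("name").ValueSafe() still throws when <user> is missing, because Element("name") is an instance call on null that runs before ValueSafe ever sees it.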

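It is possible: you would need to wrap each comment.Element(...) call in a function. Pass the element itself rather than its .Value, since it is the .Value access on a missing element that throws.
I like to use:
public string UnNull(XElement v)
{
if (v == null) return "";
return v.Value; // not v.ToString(), which would return the element's markup
}
and call it as ID = UnNull(comment.Element("id")).
As for the last line, you need to be extra careful there to make sure comment.Element("user") is not null, and handle that case if it is.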
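A different technique, not used in the answers above: LINQ to XML defines an explicit conversion from XElement to string that returns null for a null element, and Elements() on a sequence simply yields nothing when there is no match. Together they cover the nested user/name case without a helper. A sketch, with a hypothetical Comment class standing in for the question's:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the question's Comment type.
public class Comment
{
    public string ID, Text, Date, Owner;
}

public static class CommentLoader
{
    public static List<Comment> Load(XElement commentsXml)
    {
        return (from comment in commentsXml.Descendants("comment")
                select new Comment
                {
                    // (string) on a missing element gives null, so ?? "" turns it into an empty string
                    ID = (string)comment.Element("id") ?? "",
                    Text = (string)comment.Element("text") ?? "",
                    Date = (string)comment.Element("date") ?? "",
                    // Empty sequence if either <user> or <name> is missing
                    Owner = (string)comment.Elements("user").Elements("name").FirstOrDefault() ?? ""
                }).ToList();
    }
}
```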

Related

Printing Tree and skipping certain values

Here I have the following code and input that print a tree structure. My question is: how can I make it so that the nodes and leaves with the value "Unavailable" are skipped when printing?
using System;
using System.Collections.Generic;
namespace Tree
{
public class TreeNode<T>
{
private T value;
private bool hasParent;
private List<TreeNode<T>> children;
public TreeNode(T value)
{
if (value == null)
{
throw new ArgumentNullException("Cannot insert null value");
}
this.value = value;
this.children = new List<TreeNode<T>>();
}
public T Value
{
get
{
return this.value;
}
set
{
this.value = value;
}
}
public int ChildrenCount
{
get
{
return this.children.Count;
}
}
public void AddChild(TreeNode<T> child)
{
if (child == null)
{
throw new ArgumentNullException("Cannot insert null value");
}
if (child.hasParent)
{
throw new ArgumentException("The node already has a parent");
}
child.hasParent = true;
this.children.Add(child);
}
public TreeNode<T> GetChild(int index)
{
return this.children[index];
}
}
public class Tree<T>
{
private TreeNode<T> root;
public Tree(T value)
{
if (value == null)
{
throw new ArgumentNullException("Cannot insert null value");
}
this.root = new TreeNode<T>(value);
}
public Tree(T value, params Tree<T>[] children) : this(value)
{
foreach (Tree<T> child in children)
{
this.root.AddChild(child.root);
}
}
public TreeNode<T> Root
{
get
{
return this.root;
}
}
private void PrintDFS(TreeNode<T> root, string spaces)
{
if (root == null) // check the parameter, not the this.root field
{
return;
}
Console.WriteLine(spaces + root.Value);
TreeNode<T> child = null;
for (int i = 0; i < root.ChildrenCount; i++)
{
child = root.GetChild(i);
PrintDFS(child, spaces + " ");
}
}
public void TraverseDFS()
{
this.PrintDFS(this.root, string.Empty);
}
}
public static class TreeExample
{
static void Main()
{
Tree<string> tree =
new Tree<string>("John",
new Tree<string>("Jasmine",
new Tree<string>("Jay"),
new Tree<string>("Unavailable")),
new Tree<string>("Unavailable",
new Tree<string>("Jack"),
new Tree<string>("Jeremy")),
new Tree<string>("Johanna")
);
tree.TraverseDFS();
}
}}
Right now it prints: (John, (Jasmine, (Jay), (Unavailable)), (Unavailable, (Jack), (Jeremy)), (Johanna))
I need it to print: (John, (Jasmine, (Jay)), (Johanna))
So basically, skip every leaf with the value "Unavailable", and every node with the value "Unavailable" together with all of its children.
Thanks!
This should work:
private void PrintDFS(TreeNode<T> root, string spaces)
{
if (root == null
|| "Unavailable" == root.Value.ToString())
{
return;
}
...
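For reference, a self-contained sketch of the same skip-on-match check. It uses a hypothetical, pared-down Node class rather than the question's TreeNode<T>/Tree<T>, and collects lines into a list so the result is easy to check:

```csharp
using System.Collections.Generic;

// Hypothetical minimal node type; the question's TreeNode<T> works the same way.
public class Node
{
    public string Value;
    public List<Node> Children = new List<Node>();

    public Node(string value, params Node[] children)
    {
        Value = value;
        Children.AddRange(children);
    }
}

public static class SkipPrinter
{
    // Depth-first walk that returns early at any node whose value is "Unavailable",
    // so that node and its whole subtree are never printed.
    public static void PrintDFS(Node node, string spaces, List<string> output)
    {
        if (node == null || node.Value == "Unavailable")
        {
            return;
        }
        output.Add(spaces + node.Value);
        foreach (Node child in node.Children)
        {
            PrintDFS(child, spaces + " ", output);
        }
    }
}
```

Built from the question's sample data, this yields the lines John, Jasmine, Jay and Johanna, each indented by one space per level.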
The accepted answer is a literally correct answer to the question, but it bakes logic about what to do with the tree into the tree itself. A tree is a kind of collection or data structure, and you don't often see a List or Dictionary that is able to print itself. Instead, the collection provides the right methods to get or change its contents so that you can do what you want.
In your case, you could do something like the following:
public enum TreeVisitorResult {
SkipNode,
Continue
}
// the following two methods inside Tree<T>:
public void VisitNodes(Func<TreeNode<T>, int, TreeVisitorResult> visitor) {
VisitNodes(0, this.root, visitor);
}
private void VisitNodes(int depth, TreeNode<T> node,
Func<TreeNode<T>, int, TreeVisitorResult> visitor) {
if (node == null) {
return;
}
var shouldSkip = visitor(node, depth);
if (shouldSkip == TreeVisitorResult.SkipNode) {
return;
}
TreeNode<T> child = null;
for (int i = 0; i < node.ChildrenCount; i++) {
child = node.GetChild(i);
VisitNodes(depth + 1, child, visitor);
}
}
If you had this method, you could write the Print method outside of the Tree classes, as:
tree.VisitNodes((treeNode, depth) => { // <- this lambda will be called for every node
if (treeNode.Value == "Unavailable") { // <- no need to ToString or cast here, since
// we know that T is string here
return TreeVisitorResult.SkipNode;
} else {
var spaces = new string(' ', depth * 3);
Console.WriteLine(spaces + treeNode.Value);
return TreeVisitorResult.Continue; // <- every code path must return a result
}
});
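A self-contained sketch of this visitor approach, again with a hypothetical minimal node class (VNode<T>) standing in for TreeNode<T>, and with VisitNodes placed on the node for brevity. Note that the lambda has to return TreeVisitorResult.Continue on the non-skipped path as well, or it will not compile:

```csharp
using System;
using System.Collections.Generic;

public enum TreeVisitorResult { SkipNode, Continue }

// Hypothetical minimal node type standing in for TreeNode<T>.
public class VNode<T>
{
    public T Value;
    public List<VNode<T>> Children = new List<VNode<T>>();

    public VNode(T value, params VNode<T>[] children)
    {
        Value = value;
        Children.AddRange(children);
    }

    // Calls the visitor for this node, then recurses into the children
    // unless the visitor asked to skip this node.
    public void VisitNodes(Func<VNode<T>, int, TreeVisitorResult> visitor, int depth = 0)
    {
        if (visitor(this, depth) == TreeVisitorResult.SkipNode)
        {
            return;
        }
        foreach (VNode<T> child in Children)
        {
            child.VisitNodes(visitor, depth + 1);
        }
    }
}
```

The printing logic then lives entirely in the caller's lambda, which is the point of the design: the tree only knows how to walk itself.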

How to remove schema references from each element in XML using C# [duplicate]

I am looking for a clean, elegant and smart solution to remove namespaces from all XML elements. What would a function to do that look like?
Defined interface:
public interface IXMLUtils
{
string RemoveAllNamespaces(string xmlDocument);
}
Sample XML to remove NS from:
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfInserts xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<insert>
<offer xmlns="http://schema.peters.com/doc_353/1/Types">0174587</offer>
<type2 xmlns="http://schema.peters.com/doc_353/1/Types">014717</type2>
<supplier xmlns="http://schema.peters.com/doc_353/1/Types">019172</supplier>
<id_frame xmlns="http://schema.peters.com/doc_353/1/Types" />
<type3 xmlns="http://schema.peters.com/doc_353/1/Types">
<type2 />
<main>false</main>
</type3>
<status xmlns="http://schema.peters.com/doc_353/1/Types">Some state</status>
</insert>
</ArrayOfInserts>
After we call RemoveAllNamespaces(xmlWithLotOfNs), we should get:
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfInserts>
<insert>
<offer >0174587</offer>
<type2 >014717</type2>
<supplier >019172</supplier>
<id_frame />
<type3 >
<type2 />
<main>false</main>
</type3>
<status >Some state</status>
</insert>
</ArrayOfInserts>
Preferred language for the solution is C# on .NET 3.5 SP1.
Well, here is the final answer. I used Jimmy's great idea (which unfortunately is not complete on its own) and completed the recursive function so that it works properly.
Based on the interface:
string RemoveAllNamespaces(string xmlDocument);
here is the final, clean and universal C# solution for removing XML namespaces:
//Implemented based on interface, not part of algorithm
public static string RemoveAllNamespaces(string xmlDocument)
{
XElement xmlDocumentWithoutNs = RemoveAllNamespaces(XElement.Parse(xmlDocument));
return xmlDocumentWithoutNs.ToString();
}
//Core recursion function
private static XElement RemoveAllNamespaces(XElement xmlDocument)
{
if (!xmlDocument.HasElements)
{
XElement xElement = new XElement(xmlDocument.Name.LocalName);
xElement.Value = xmlDocument.Value;
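// Skip xmlns declarations: copying xmlns="..." onto the namespace-less element makes ToString() throw
foreach (XAttribute attribute in xmlDocument.Attributes().Where(a => !a.IsNamespaceDeclaration))
xElement.Add(attribute);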
return xElement;
}
return new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
}
It works, but I have not tested it much, so it may not cover some special cases. It is a good base to start from, though.
The answer tagged as most useful has two flaws:
It ignores attributes
It doesn't work with "mixed mode" elements
Here is my take on this:
public static XElement RemoveAllNamespaces(XElement e)
{
return new XElement(e.Name.LocalName,
(from n in e.Nodes()
select ((n is XElement) ? RemoveAllNamespaces(n as XElement) : n)),
(e.HasAttributes) ?
(from a in e.Attributes()
where (!a.IsNamespaceDeclaration)
select new XAttribute(a.Name.LocalName, a.Value)) : null);
}
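To see both points in action, here is the same function run on a small made-up document with mixed content, a prefixed attribute and a default namespace (the class name NsStripper is just for illustration):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NsStripper
{
    // Rebuilds each element under its local name, recurses into child elements,
    // keeps other nodes (text, comments) as they are, and re-creates every
    // non-declaration attribute under its local name.
    public static XElement RemoveAllNamespaces(XElement e)
    {
        return new XElement(e.Name.LocalName,
            (from n in e.Nodes()
             select (n is XElement) ? RemoveAllNamespaces((XElement)n) : n),
            (e.HasAttributes)
                ? (from a in e.Attributes()
                   where !a.IsNamespaceDeclaration
                   select new XAttribute(a.Name.LocalName, a.Value))
                : null);
    }
}
```

The text nodes t1 and t2 survive around the child element, and x:id comes back as a plain id attribute.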
That will do the trick :-)
foreach (XElement XE in Xml.DescendantsAndSelf())
{
// Strip the namespace by setting the name of the element to its local name only
XE.Name = XE.Name.LocalName;
// replacing all attributes with attributes that are not namespaces and their names are set to only the localname
XE.ReplaceAttributes((from xattrib in XE.Attributes().Where(xa => !xa.IsNamespaceDeclaration) select new XAttribute(xattrib.Name.LocalName, xattrib.Value)));
}
The obligatory answer using LINQ:
static XElement stripNS(XElement root) {
return new XElement(
root.Name.LocalName,
root.HasElements ?
root.Elements().Select(el => stripNS(el)) :
(object)root.Value
);
}
static void Main() {
var xml = XElement.Parse(@"<?xml version=""1.0"" encoding=""utf-16""?>
<ArrayOfInserts xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
<insert>
<offer xmlns=""http://schema.peters.com/doc_353/1/Types"">0174587</offer>
<type2 xmlns=""http://schema.peters.com/doc_353/1/Types"">014717</type2>
<supplier xmlns=""http://schema.peters.com/doc_353/1/Types"">019172</supplier>
<id_frame xmlns=""http://schema.peters.com/doc_353/1/Types"" />
<type3 xmlns=""http://schema.peters.com/doc_353/1/Types"">
<type2 />
<main>false</main>
</type3>
<status xmlns=""http://schema.peters.com/doc_353/1/Types"">Some state</status>
</insert>
</ArrayOfInserts>");
Console.WriteLine(stripNS(xml));
}
Picking it up again in C#, with an added line for copying the attributes:
static XElement stripNS(XElement root)
{
XElement res = new XElement(
root.Name.LocalName,
root.HasElements ?
root.Elements().Select(el => stripNS(el)) :
(object)root.Value
);
res.ReplaceAttributes(
root.Attributes().Where(attr => (!attr.IsNamespaceDeclaration)));
return res;
}
The obligatory answer using XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no" encoding="UTF-8"/>
<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
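To run the stylesheet from C#, XslCompiledTransform can be used. A sketch, assuming the stylesheet above (with its @* selectors) is passed in as a string; the class and method names are illustrative:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltNamespaceStripper
{
    // Compiles the stylesheet and applies it to the input document, returning the result as a string.
    public static string Transform(string xslt, string xml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

Because the result is written to a StringWriter, the XML declaration will say utf-16 rather than the UTF-8 requested in xsl:output.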
I know this question is supposedly solved, but I wasn't totally happy with the way it was implemented. I found another source over here on the MSDN blogs that has an overridden XmlTextWriter class that strips out the namespaces. I tweaked it a bit to get some other things I wanted in such as pretty formatting and preserving the root element. Here is what I have in my project at the moment.
http://blogs.msdn.com/b/kaevans/archive/2004/08/02/206432.aspx
Class
/// <summary>
/// Modified XML writer that writes (almost) no namespaces out with pretty formatting
/// </summary>
/// <seealso cref="http://blogs.msdn.com/b/kaevans/archive/2004/08/02/206432.aspx"/>
public class XmlNoNamespaceWriter : XmlTextWriter
{
private bool _SkipAttribute = false;
private int _EncounteredNamespaceCount = 0;
public XmlNoNamespaceWriter(TextWriter writer)
: base(writer)
{
this.Formatting = System.Xml.Formatting.Indented;
}
public override void WriteStartElement(string prefix, string localName, string ns)
{
base.WriteStartElement(null, localName, null);
}
public override void WriteStartAttribute(string prefix, string localName, string ns)
{
//If the prefix or localname are "xmlns", don't write it.
//HOWEVER... if the 1st element (root?) has a namespace we will write it.
if ((prefix.CompareTo("xmlns") == 0
|| localName.CompareTo("xmlns") == 0)
&& _EncounteredNamespaceCount++ > 0)
{
_SkipAttribute = true;
}
else
{
base.WriteStartAttribute(null, localName, null);
}
}
public override void WriteString(string text)
{
//If we are writing an attribute, the text for the xmlns
//or xmlns:prefix declaration would occur here. Skip
//it if this is the case.
if (!_SkipAttribute)
{
base.WriteString(text);
}
}
public override void WriteEndAttribute()
{
//If we skipped the WriteStartAttribute call, we have to
//skip the WriteEndAttribute call as well or else the XmlWriter
//will have an invalid state.
if (!_SkipAttribute)
{
base.WriteEndAttribute();
}
//reset the boolean for the next attribute.
_SkipAttribute = false;
}
public override void WriteQualifiedName(string localName, string ns)
{
//Always write the qualified name using only the
//localname.
base.WriteQualifiedName(localName, null);
}
}
Usage
//Save the updated document using our modified (almost) no-namespace XML writer
using(StreamWriter sw = new StreamWriter(this.XmlDocumentPath))
using(XmlNoNamespaceWriter xw = new XmlNoNamespaceWriter(sw))
{
//This variable is of type `XmlDocument`
this.XmlDocumentRoot.Save(xw);
}
And here is a solution that also removes the XSI attributes.
(If you remove the xmlns declarations but don't remove the xsi attributes, .NET complains...)
string xml = node.OuterXml;
//The regex below finds strings that start with xmlns, optionally followed by : and a name, then =,
//then a quoted stretch of text that contains no quotes. The same happens for an attribute
//that starts with xsi.
string strXMLPattern = @"xmlns(:\w+)?=""([^""]+)""|xsi(:\w+)?=""([^""]+)""";
xml = Regex.Replace(xml, strXMLPattern, "");
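A quick check of that pattern, wrapped in an illustrative helper class. Note that it leaves the whitespace before each removed attribute behind (which happens to match the question's expected <offer > output) and does not touch prefixed element names such as <p:item>:

```csharp
using System.Text.RegularExpressions;

public static class RegexNamespaceStripper
{
    // Same pattern as above: drops xmlns / xmlns:prefix declarations and xsi / xsi:name attributes.
    private const string Pattern = @"xmlns(:\w+)?=""([^""]+)""|xsi(:\w+)?=""([^""]+)""";

    public static string Strip(string xml)
    {
        return Regex.Replace(xml, Pattern, "");
    }
}
```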
This is a solution based on Peter Stegnar's accepted answer.
I used it, but (as andygjp and John Saunders remarked) his code ignores attributes.
I needed to take care of attributes too, so I adapted his code. Andy's version was Visual Basic; this is still C#.
I know it's been a while, but perhaps it'll save somebody some time one day.
private static XElement RemoveAllNamespaces(XElement xmlDocument)
{
XElement xmlDocumentWithoutNs = removeAllNamespaces(xmlDocument);
return xmlDocumentWithoutNs;
}
private static XElement removeAllNamespaces(XElement xmlDocument)
{
var stripped = new XElement(xmlDocument.Name.LocalName);
foreach (var attribute in
xmlDocument.Attributes().Where(
attribute =>
!attribute.IsNamespaceDeclaration &&
String.IsNullOrEmpty(attribute.Name.NamespaceName)))
{
stripped.Add(new XAttribute(attribute.Name.LocalName, attribute.Value));
}
if (!xmlDocument.HasElements)
{
stripped.Value = xmlDocument.Value;
return stripped;
}
stripped.Add(xmlDocument.Elements().Select(
el =>
RemoveAllNamespaces(el)));
return stripped;
}
I really liked where Dexter was going above, so I translated it into a “fluent” extension method:
/// <summary>
/// Returns the specified <see cref="XElement"/>
/// without namespace qualifiers on elements and attributes.
/// </summary>
/// <param name="element">The element</param>
public static XElement WithoutNamespaces(this XElement element)
{
if (element == null) return null;
#region delegates:
Func<XNode, XNode> getChildNode = e => (e.NodeType == XmlNodeType.Element) ? (e as XElement).WithoutNamespaces() : e;
Func<XElement, IEnumerable<XAttribute>> getAttributes = e => (e.HasAttributes) ?
e.Attributes()
.Where(a => !a.IsNamespaceDeclaration)
.Select(a => new XAttribute(a.Name.LocalName, a.Value))
:
Enumerable.Empty<XAttribute>();
#endregion
return new XElement(element.Name.LocalName,
element.Nodes().Select(getChildNode),
getAttributes(element));
}
The “fluent” approach allows me to do this:
var xml = File.ReadAllText(presentationFile);
var xDoc = XDocument.Parse(xml);
var xRoot = xDoc.Root.WithoutNamespaces();
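You can do that using LINQ:
public static string RemoveAllNamespaces(string xmlDocument)
{
var xml = XElement.Parse(xmlDocument);
// Drop the xmlns declarations too; an element that keeps xmlns="..." after being
// renamed out of that namespace makes ToString() throw
xml.DescendantsAndSelf().Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
// Rename every element, including the root, to its local name
foreach (XElement e in xml.DescendantsAndSelf())
e.Name = e.Name.LocalName;
return xml.ToString();
}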
This is a slightly modified version of Peter's answer that also works for attributes, removing their namespace and prefix. Sorry that the code looks a bit ugly.
private static XElement RemoveAllNamespaces(XElement xmlDocument)
{
if (!xmlDocument.HasElements)
{
XElement xElement = new XElement(xmlDocument.Name.LocalName);
xElement.Value = xmlDocument.Value;
foreach (XAttribute attribute in xmlDocument.Attributes().Where(a => !a.IsNamespaceDeclaration)) // skip xmlns declarations
{
xElement.Add(new XAttribute(attribute.Name.LocalName, attribute.Value));
}
return xElement;
}
else
{
XElement xElement = new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
foreach (XAttribute attribute in xmlDocument.Attributes().Where(a => !a.IsNamespaceDeclaration)) // skip xmlns declarations
{
xElement.Add(new XAttribute(attribute.Name.LocalName, attribute.Value));
}
return xElement;
}
}
Bit late to the party on this one but here's what I used recently:
var doc = XDocument.Parse(xmlString);
doc.Root.DescendantNodesAndSelf().OfType<XElement>().Attributes().Where(att => att.IsNamespaceDeclaration).Remove();
(taken from this MSDN Thread)
Edit: As per the comment below, this alone doesn't actually remove the xmlns attributes from the output: the elements keep their namespaced names, so the declarations are written out again when the document is saved. To fix that, you also need to reset the name of each node to its local name (i.e. the name minus the namespace):
foreach (var node in doc.Root.DescendantNodesAndSelf().OfType<XElement>())
{
node.Name = node.Name.LocalName;
}
Here's a Regex.Replace one-liner:
public static string RemoveNamespaces(this string xml)
{
return Regex.Replace(xml, "((?<=<|<\\/)|(?<= ))[A-Za-z0-9]+:| xmlns(:[A-Za-z0-9]+)?=\".*?\"", "");
}
Here's a sample:
https://regex101.com/r/fopydN/6
Warning: there might be edge cases!
The replies by Jimmy and Peter were a great help, but they actually removed all attributes, so I made a slight modification:
Imports System.Runtime.CompilerServices
Friend Module XElementExtensions
<Extension()> _
Public Function RemoveAllNamespaces(ByVal element As XElement) As XElement
If element.HasElements Then
Dim cleanElement = RemoveAllNamespaces(New XElement(element.Name.LocalName, element.Attributes))
cleanElement.Add(element.Elements.Select(Function(el) RemoveAllNamespaces(el)))
Return cleanElement
Else
Dim allAttributesExceptNamespaces = element.Attributes.Where(Function(attr) Not attr.IsNamespaceDeclaration)
element.ReplaceAttributes(allAttributesExceptNamespaces)
Return element
End If
End Function
End Module
For attributes to work, the loop that adds the attributes should go after the recursion, and you also need to check IsNamespaceDeclaration:
private static XElement RemoveAllNamespaces(XElement xmlDocument)
{
XElement xElement;
if (!xmlDocument.HasElements)
{
xElement = new XElement(xmlDocument.Name.LocalName) { Value = xmlDocument.Value };
}
else
{
xElement = new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(RemoveAllNamespaces));
}
foreach (var attribute in xmlDocument.Attributes())
{
if (!attribute.IsNamespaceDeclaration)
{
xElement.Add(attribute);
}
}
return xElement;
}
Here is my VB.NET version of Dexter Legaspi's C# version:
Shared Function RemoveAllNamespaces(ByVal e As XElement) As XElement
Return New XElement(e.Name.LocalName, New Object() {(From n In e.Nodes Select If(TypeOf n Is XElement, RemoveAllNamespaces(TryCast(n, XElement)), n)), If(e.HasAttributes, (From a In e.Attributes Where Not a.IsNamespaceDeclaration Select New XAttribute(a.Name.LocalName, a.Value)), Nothing)})
End Function
Another solution that takes into account possibly interleaving TEXT and ELEMENT nodes, e.g.:
<parent>
text1
<child1/>
text2
<child2/>
</parent>
Code:
using System.Linq;
namespace System.Xml.Linq
{
public static class XElementTransformExtensions
{
public static XElement WithoutNamespaces(this XElement source)
{
return new XElement(source.Name.LocalName,
source.Attributes().Select(WithoutNamespaces),
source.Nodes().Select(WithoutNamespaces)
);
}
public static XAttribute WithoutNamespaces(this XAttribute source)
{
return !source.IsNamespaceDeclaration
? new XAttribute(source.Name.LocalName, source.Value)
: default(XAttribute);
}
public static XNode WithoutNamespaces(this XNode source)
{
return
source is XElement
? WithoutNamespaces((XElement)source)
: source;
}
}
}
Without resorting to an XSLT-based solution, if you want clean, elegant and smart, you'd need some support from the framework; in particular, the visitor pattern would make this a breeze. Unfortunately, it's not available here.
I've implemented it inspired by LINQ's ExpressionVisitor to have a similar structure to it. With this, you can apply the visitor pattern to (LINQ-to-) XML objects. (I've done limited testing on this but it works well as far as I can tell)
public abstract class XObjectVisitor
{
public virtual XObject Visit(XObject node)
{
if (node != null)
return node.Accept(this);
return node;
}
public ReadOnlyCollection<XObject> Visit(IEnumerable<XObject> nodes)
{
return nodes.Select(node => Visit(node))
.Where(node => node != null)
.ToList()
.AsReadOnly();
}
public T VisitAndConvert<T>(T node) where T : XObject
{
if (node != null)
return Visit(node) as T;
return node;
}
public ReadOnlyCollection<T> VisitAndConvert<T>(IEnumerable<T> nodes) where T : XObject
{
return nodes.Select(node => VisitAndConvert(node))
.Where(node => node != null)
.ToList()
.AsReadOnly();
}
protected virtual XObject VisitAttribute(XAttribute node)
{
return node.Update(node.Name, node.Value);
}
protected virtual XObject VisitComment(XComment node)
{
return node.Update(node.Value);
}
protected virtual XObject VisitDocument(XDocument node)
{
return node.Update(
node.Declaration,
VisitAndConvert(node.Nodes())
);
}
protected virtual XObject VisitElement(XElement node)
{
return node.Update(
node.Name,
VisitAndConvert(node.Attributes()),
VisitAndConvert(node.Nodes())
);
}
protected virtual XObject VisitDocumentType(XDocumentType node)
{
return node.Update(
node.Name,
node.PublicId,
node.SystemId,
node.InternalSubset
);
}
protected virtual XObject VisitProcessingInstruction(XProcessingInstruction node)
{
return node.Update(
node.Target,
node.Data
);
}
protected virtual XObject VisitText(XText node)
{
return node.Update(node.Value);
}
protected virtual XObject VisitCData(XCData node)
{
return node.Update(node.Value);
}
#region Implementation details
internal InternalAccessor Accessor
{
get { return new InternalAccessor(this); }
}
internal class InternalAccessor
{
private XObjectVisitor visitor;
internal InternalAccessor(XObjectVisitor visitor) { this.visitor = visitor; }
internal XObject VisitAttribute(XAttribute node) { return visitor.VisitAttribute(node); }
internal XObject VisitComment(XComment node) { return visitor.VisitComment(node); }
internal XObject VisitDocument(XDocument node) { return visitor.VisitDocument(node); }
internal XObject VisitElement(XElement node) { return visitor.VisitElement(node); }
internal XObject VisitDocumentType(XDocumentType node) { return visitor.VisitDocumentType(node); }
internal XObject VisitProcessingInstruction(XProcessingInstruction node) { return visitor.VisitProcessingInstruction(node); }
internal XObject VisitText(XText node) { return visitor.VisitText(node); }
internal XObject VisitCData(XCData node) { return visitor.VisitCData(node); }
}
#endregion
}
public static class XObjectVisitorExtensions
{
#region XObject.Accept "instance" method
public static XObject Accept(this XObject node, XObjectVisitor visitor)
{
Validation.CheckNullReference(node);
Validation.CheckArgumentNull(visitor, "visitor");
// yay, easy dynamic dispatch
Acceptor acceptor = new Acceptor(node as dynamic);
return acceptor.Accept(visitor);
}
private class Acceptor
{
public Acceptor(XAttribute node) : this(v => v.Accessor.VisitAttribute(node)) { }
public Acceptor(XComment node) : this(v => v.Accessor.VisitComment(node)) { }
public Acceptor(XDocument node) : this(v => v.Accessor.VisitDocument(node)) { }
public Acceptor(XElement node) : this(v => v.Accessor.VisitElement(node)) { }
public Acceptor(XDocumentType node) : this(v => v.Accessor.VisitDocumentType(node)) { }
public Acceptor(XProcessingInstruction node) : this(v => v.Accessor.VisitProcessingInstruction(node)) { }
public Acceptor(XText node) : this(v => v.Accessor.VisitText(node)) { }
public Acceptor(XCData node) : this(v => v.Accessor.VisitCData(node)) { }
private Func<XObjectVisitor, XObject> accept;
private Acceptor(Func<XObjectVisitor, XObject> accept) { this.accept = accept; }
public XObject Accept(XObjectVisitor visitor) { return accept(visitor); }
}
#endregion
#region XObject.Update "instance" method
public static XObject Update(this XAttribute node, XName name, string value)
{
Validation.CheckNullReference(node);
Validation.CheckArgumentNull(name, "name");
Validation.CheckArgumentNull(value, "value");
return new XAttribute(name, value);
}
public static XObject Update(this XComment node, string value = null)
{
Validation.CheckNullReference(node);
return new XComment(value);
}
public static XObject Update(this XDocument node, XDeclaration declaration = null, params object[] content)
{
Validation.CheckNullReference(node);
return new XDocument(declaration, content);
}
public static XObject Update(this XElement node, XName name, params object[] content)
{
Validation.CheckNullReference(node);
Validation.CheckArgumentNull(name, "name");
return new XElement(name, content);
}
public static XObject Update(this XDocumentType node, string name, string publicId = null, string systemId = null, string internalSubset = null)
{
Validation.CheckNullReference(node);
Validation.CheckArgumentNull(name, "name");
return new XDocumentType(name, publicId, systemId, internalSubset);
}
public static XObject Update(this XProcessingInstruction node, string target, string data)
{
Validation.CheckNullReference(node);
Validation.CheckArgumentNull(target, "target");
Validation.CheckArgumentNull(data, "data");
return new XProcessingInstruction(target, data);
}
public static XObject Update(this XText node, string value = null)
{
Validation.CheckNullReference(node);
return new XText(value);
}
public static XObject Update(this XCData node, string value = null)
{
Validation.CheckNullReference(node);
return new XCData(value);
}
#endregion
}
public static class Validation
{
public static void CheckNullReference<T>(T obj) where T : class
{
if (obj == null)
throw new NullReferenceException();
}
public static void CheckArgumentNull<T>(T obj, string paramName) where T : class
{
if (obj == null)
throw new ArgumentNullException(paramName);
}
}
P.S. This particular implementation uses some .NET 4 features to make the implementation a bit easier/cleaner (usage of dynamic and default arguments). It shouldn't be too difficult to make it .NET 3.5 compatible, perhaps even .NET 2.0 compatible.
Then to implement the visitor, here's a generalized one that can change multiple namespaces (and the prefix used).
public class ChangeNamespaceVisitor : XObjectVisitor
{
private INamespaceMappingManager manager;
public ChangeNamespaceVisitor(INamespaceMappingManager manager)
{
Validation.CheckArgumentNull(manager, "manager");
this.manager = manager;
}
protected INamespaceMappingManager Manager { get { return manager; } }
private XName ChangeNamespace(XName name)
{
var mapping = Manager.GetMapping(name.Namespace);
return mapping.ChangeNamespace(name);
}
private XObject ChangeNamespaceDeclaration(XAttribute node)
{
var mapping = Manager.GetMapping(node.Value);
return mapping.ChangeNamespaceDeclaration(node);
}
protected override XObject VisitAttribute(XAttribute node)
{
if (node.IsNamespaceDeclaration)
return ChangeNamespaceDeclaration(node);
return node.Update(ChangeNamespace(node.Name), node.Value);
}
protected override XObject VisitElement(XElement node)
{
return node.Update(
ChangeNamespace(node.Name),
VisitAndConvert(node.Attributes()),
VisitAndConvert(node.Nodes())
);
}
}
// and all the gory implementation details
public class NamespaceMappingManager : INamespaceMappingManager
{
private Dictionary<XNamespace, INamespaceMapping> namespaces = new Dictionary<XNamespace, INamespaceMapping>();
public NamespaceMappingManager Add(XNamespace fromNs, XNamespace toNs, string toPrefix = null)
{
var item = new NamespaceMapping(fromNs, toNs, toPrefix);
namespaces.Add(item.FromNs, item);
return this;
}
public INamespaceMapping GetMapping(XNamespace fromNs)
{
INamespaceMapping mapping;
if (!namespaces.TryGetValue(fromNs, out mapping))
mapping = new NullMapping();
return mapping;
}
private class NullMapping : INamespaceMapping
{
public XName ChangeNamespace(XName name)
{
return name;
}
public XObject ChangeNamespaceDeclaration(XAttribute node)
{
return node.Update(node.Name, node.Value);
}
}
private class NamespaceMapping : INamespaceMapping
{
private XNamespace fromNs;
private XNamespace toNs;
private string toPrefix;
public NamespaceMapping(XNamespace fromNs, XNamespace toNs, string toPrefix = null)
{
this.fromNs = fromNs ?? "";
this.toNs = toNs ?? "";
this.toPrefix = toPrefix;
}
public XNamespace FromNs { get { return fromNs; } }
public XNamespace ToNs { get { return toNs; } }
public string ToPrefix { get { return toPrefix; } }
public XName ChangeNamespace(XName name)
{
return name.Namespace == fromNs
? toNs + name.LocalName
: name;
}
public XObject ChangeNamespaceDeclaration(XAttribute node)
{
if (node.Value == fromNs.NamespaceName)
{
if (toNs == XNamespace.None)
return null;
var xmlns = !String.IsNullOrWhiteSpace(toPrefix)
? (XNamespace.Xmlns + toPrefix)
: node.Name;
return node.Update(xmlns, toNs.NamespaceName);
}
return node.Update(node.Name, node.Value);
}
}
}
public interface INamespaceMappingManager
{
INamespaceMapping GetMapping(XNamespace fromNs);
}
public interface INamespaceMapping
{
XName ChangeNamespace(XName name);
XObject ChangeNamespaceDeclaration(XAttribute node);
}
And a little helper method to get the ball rolling:
T ChangeNamespace<T>(T node, XNamespace fromNs, XNamespace toNs, string toPrefix = null) where T : XObject
{
return node.Accept(
new ChangeNamespaceVisitor(
new NamespaceMappingManager()
.Add(fromNs, toNs, toPrefix)
)
) as T;
}
Then to remove a namespace, you could call it like so:
var doc = ChangeNamespace(XDocument.Load(pathToXml),
fromNs: "http://schema.peters.com/doc_353/1/Types",
toNs: null);
Using this visitor, you can write an INamespaceMappingManager that removes all namespaces:
T RemoveAllNamespaces<T>(T node) where T : XObject
{
return node.Accept(
new ChangeNamespaceVisitor(new RemoveNamespaceMappingManager())
) as T;
}
public class RemoveNamespaceMappingManager : INamespaceMappingManager
{
public INamespaceMapping GetMapping(XNamespace fromNs)
{
return new RemoveNamespaceMapping();
}
private class RemoveNamespaceMapping : INamespaceMapping
{
public XName ChangeNamespace(XName name)
{
return name.LocalName;
}
public XObject ChangeNamespaceDeclaration(XAttribute node)
{
return null;
}
}
}
A simple solution that renames the elements in place instead of creating a copy, and does a pretty good job of replacing the attributes.
public void RemoveAllNamespaces(XElement value)
{
    List<XAttribute> attributesToRemove = new List<XAttribute>();
    foreach (XElement e in value.DescendantsAndSelf())
    {
        if (e.Name.Namespace != XNamespace.None)
        {
            e.Name = e.Name.LocalName;
        }
        // ToList() takes a snapshot, because SetAttributeValue below changes e's attributes
        foreach (XAttribute a in e.Attributes().ToList())
        {
            if (a.IsNamespaceDeclaration)
            {
                // do not keep it at all
                attributesToRemove.Add(a);
            }
            else if (a.Name.Namespace != XNamespace.None)
            {
                // re-add the attribute under its local name, then drop the prefixed one
                e.SetAttributeValue(a.Name.LocalName, a.Value);
                attributesToRemove.Add(a);
            }
        }
    }
    foreach (XAttribute a in attributesToRemove)
    {
        a.Remove();
    }
}
Note: this does not always preserve original attribute order, but I'm sure you could change it to do that pretty easily if it's important to you.
Also note that this cannot correctly handle an element whose attributes are unique only by their namespace, like:
<root xmlns:ns1="a" xmlns:ns2="b">
<elem ns1:dupAttrib="" ns2:dupAttrib="" />
</root>
which seems like an inherent problem. But since the question asked for a String as output, not an XElement, in this case you could have a solution that outputs a valid String even though it would be an invalid XElement.
I also liked jocull's answer using a custom XmlWriter, but when I tried it, it did not work for me. Although it all looks correct, I couldn't tell if the XmlNoNamespaceWriter class had any effect at all; it definitely was not removing the namespaces as I wanted it to.
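Given that limitation, it may be worth detecting such collisions before stripping anything. A minimal sketch follows; `NamespaceStripGuard.HasLocalNameCollisions` is a hypothetical helper, not part of this answer, and it only reports whether any element has two ordinary (non-xmlns) attributes that would end up with the same local name:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceStripGuard
{
    // Hypothetical helper: true if some element has two ordinary attributes
    // (xmlns declarations excluded) that share a local name, i.e. attributes
    // that would collide once their namespaces are stripped.
    public static bool HasLocalNameCollisions(XElement root)
    {
        return root.DescendantsAndSelf().Any(e =>
            e.Attributes()
             .Where(a => !a.IsNamespaceDeclaration)
             .GroupBy(a => a.Name.LocalName)
             .Any(g => g.Count() > 1));
    }
}
```

Called on the `<root>` example above it returns true, so that case can be handled explicitly before stripping.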
Adding my version, which also strips the namespace prefixes from node names:
public static string RemoveAllNamespaces(XElement element)
{
    string tex = element.ToString();
    // Prefixed element names, e.g. "a:nodename", taken from each element's opening tag
    var nsitems = element.DescendantsAndSelf()
        .Select(n => n.ToString().Split(' ', '>')[0].Split('<')[1])
        .Where(n => n.Contains(":"))
        .Distinct()
        .ToArray();
    //Namespace prefix on nodes: <a:nodename/> becomes <nodename/>
    tex = nsitems.Aggregate(tex, (current, nsnode) => current.Replace("<" + nsnode, "<" + nsnode.Split(':')[1]));
    tex = nsitems.Aggregate(tex, (current, nsnode) => current.Replace("</" + nsnode, "</" + nsnode.Split(':')[1]));
    //Namespace attribs: xmlns declarations and prefixed attributes are removed entirely
    var items = element.DescendantsAndSelf()
        .SelectMany(d => d.Attributes().Where(a => a.IsNamespaceDeclaration || a.Name.Namespace != XNamespace.None))
        .Select(a => a.ToString())
        .Distinct();
    tex = items.Aggregate(tex, (current, attr) => current.Replace(attr, ""));
    return tex;
}
I tried the first few solutions and they didn't work for me, mainly because of the problem, already mentioned by others, of attributes being removed. My approach is very similar to Jimmy's in that it uses the XElement constructors that take objects as parameters.
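public static XElement RemoveAllNamespaces(this XElement element)
{
    return new XElement(element.Name.LocalName,
        // copy ordinary attributes under their local names; skip xmlns declarations
        element.HasAttributes
            ? element.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)
                .Select(a => new XAttribute(a.Name.LocalName, a.Value))
            : null,
        // recurse into child elements, or copy the text of a leaf element
        element.HasElements
            ? element.Elements().Select(e => RemoveAllNamespaces(e))
            : (object)element.Value);
}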
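Note that `element.Value` is the concatenated text of all descendants, so this approach has to be careful with mixed content (text next to child elements). A sketch of a variant that copies `Nodes()` in document order instead, assuming you want text and comment nodes preserved (the `NamespaceStripper` name is illustrative). Like the version above, it still throws if two attributes on one element share a local name:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceStripper
{
    // Rebuilds the tree with local names only, copying child nodes
    // (elements, text, comments) in document order instead of using Value.
    public static XElement StripNamespaces(XElement element)
    {
        return new XElement(element.Name.LocalName,
            // ordinary attributes under their local names; xmlns declarations dropped
            element.Attributes()
                   .Where(a => !a.IsNamespaceDeclaration)
                   .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            // recurse into child elements; other nodes already have a parent,
            // so the XElement constructor adds clones of them
            element.Nodes().Select(n => n is XElement child ? StripNamespaces(child) : (object)n));
    }
}
```

For example, `<a:root xmlns:a="urn:x" a:id="1"><a:child>hi</a:child>tail</a:root>` becomes `<root id="1"><child>hi</child>tail</root>`.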
My answer, based on string manipulation, with the lightest code possible:
public static string hilangkanNamespace(string instrXML)
{
    // Strips everything between a tag name and its closing '>', i.e. all attributes
    // (including xmlns declarations); element-name prefixes are left untouched.
    char chrOpeningTag = '<';
    char chrClosingTag = '>';
    char chrSpasi = ' ';
    int intStartIndex = 0;
    do
    {
        int intIndexKu = instrXML.IndexOf(chrOpeningTag, intStartIndex);
        if (intIndexKu < 0)
            break; // nothing more found, so stop
        int intStart = instrXML.IndexOfAny(new char[] { chrSpasi, chrClosingTag }, intIndexKu + 1); // whichever is found first
        if (intStart < 0)
            break; // nothing more found, so stop
        int intStop = instrXML.IndexOf(chrClosingTag, intStart);
        if (intStop < 0)
            break; // nothing more found, so stop
        else
            intStop--; // exclude the closing tag
        if (intStop >= intStart && instrXML[intStop] == '/')
            intStop--; // keep the '/' of a self-closing tag
        int intLengthToStrip = intStop - intStart + 1;
        if (intLengthToStrip > 0)
            instrXML = instrXML.Remove(intStart, intLengthToStrip);
        intStartIndex = intStart;
    } while (true);
    return instrXML;
}
This worked for me.
string outXML;
using (var fs = new FileStream(filePath, FileMode.Open))
using (var sr = new StreamReader(fs))
using (var ds = new DataSet())
{
    ds.ReadXml(sr);
    ds.Namespace = "";
    outXML = ds.GetXml();
}
user892217's answer is almost correct, but it won't compile as is, so it needs a slight correction to the recursive call:
private static XElement RemoveAllNamespaces(XElement xmlDocument)
{
XElement xElement;
if (!xmlDocument.HasElements)
{
xElement = new XElement(xmlDocument.Name.LocalName) { Value = xmlDocument.Value };
}
else
{
xElement = new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(x => RemoveAllNamespaces(x)));
}
foreach (var attribute in xmlDocument.Attributes())
{
if (!attribute.IsNamespaceDeclaration)
{
xElement.Add(attribute);
}
}
return xElement;
}
After much searching for a solution to this very issue, this page seemed to have the most substance; however, nothing fit exactly, so I went the old-fashioned way and simply parsed out what I wanted. Hope this helps someone. (Note: this also removes the SOAP or similar envelope elements.)
public static string RemoveNamespaces(string psXml)
{
//
// parse through the passed XML, and remove any and all namespace references...also
// removes soap envelope/header(s)/body, or any other references via ":" entities,
// leaving all data intact
//
string xsXml = "", xsCurrQtChr = "";
int xiPos = 0, xiLastPos = psXml.Length - 1;
bool xbInNode = false;
while (xiPos <= xiLastPos)
{
string xsCurrChr = psXml.Substring(xiPos, 1);
xiPos++;
if (xbInNode)
{
if (xsCurrChr == ":")
{
// soap envelope or body (or some such)
// we'll strip these node wrappers completely
// need to first strip the beginning of it off (i.e. "<soap" or "<s")
int xi = xsXml.Length;
string xsChr = "";
do
{
xi--;
xsChr = xsXml.Substring(xi, 1);
xsXml = xsXml.Substring(0, xi);
} while (xsChr != "<");
// next, find end of node
string xsQt = "";
do
{
xiPos++;
if (xiPos <= xiLastPos)
{
xsChr = psXml.Substring(xiPos, 1);
if (xsQt.Length == 0)
{
if (xsChr == "'" || xsChr == "\"")
{
xsQt = xsChr;
}
}
else
{
if (xsChr == xsQt)
{
xsQt = ""; // end of quote
}
else
{
if (xsChr == ">") xsChr = "x"; // stay in loop...this is not end of node
}
}
}
} while (xsChr != ">" && xiPos <= xiLastPos);
xiPos++; // skip over closing ">"
xbInNode = false;
}
else
{
if (xsCurrChr == ">")
{
xbInNode = false;
xsXml += xsCurrChr;
}
else
{
if (xsCurrChr == " " || xsCurrChr == "\t")
{
// potential namespace...let's check...next character must be "/"
// or more white space, and if not, skip until we find such
string xsChr = "";
int xiOrgLen = xsXml.Length;
xsXml += xsCurrChr;
do
{
if (xiPos <= xiLastPos)
{
xsChr = psXml.Substring(xiPos, 1);
xiPos++;
if (xsChr == " " || xsChr == "\r" || xsChr == "\n" || xsChr == "\t")
{
// carry on..white space
xsXml += xsChr;
}
else
{
if (xsChr == "/" || xsChr == ">")
{
xsXml += xsChr;
}
else
{
// namespace! - get rid of it
xsXml = xsXml.Substring(0, xiOrgLen); // first, truncate any added whitespace
// next, peek forward until we find "/" or ">"
string xsQt = "";
do
{
if (xiPos <= xiLastPos)
{
xsChr = psXml.Substring(xiPos, 1);
xiPos++;
if (xsQt.Length > 0)
{
if (xsChr == xsQt) xsQt = ""; else xsChr = "x";
}
else
{
if (xsChr == "'" || xsChr == "\"") xsQt = xsChr;
}
}
} while (xsChr != ">" && xsChr != "/" && xiPos <= xiLastPos);
if (xsChr == ">" || xsChr == "/") xsXml += xsChr;
xbInNode = false;
}
}
}
} while (xsChr != ">" && xsChr != "/" && xiPos <= xiLastPos);
}
else
{
xsXml += xsCurrChr;
}
}
}
}
else
{
//
// if not currently inside a node, then we are in a value (or about to enter a new node)
//
xsXml += xsCurrChr;
if (xsCurrQtChr.Length == 0)
{
if (xsCurrChr == "<")
{
xbInNode = true;
}
}
else
{
//
// currently inside a quoted string
//
if (xsCurrQtChr == xsCurrChr)
{
// finishing quoted string
xsCurrQtChr = "";
}
}
}
}
return (xsXml);
}
Without recreating the whole node hierarchy:
private static void RemoveDefNamespace(XElement element)
{
var defNamespace = element.Attribute("xmlns");
if (defNamespace != null)
defNamespace.Remove();
element.Name = element.Name.LocalName;
foreach (var child in element.Elements())
{
RemoveDefNamespace(child);
}
}
I tried some of the solutions but, as many have stated, there are edge cases.
I used some of the regexes above, but came to the conclusion that a one-step regex is not feasible.
So here is my solution, a two-step regex: find the tags, remove the namespaces within the tags, and do not alter CDATA:
Func<Match, String> NamespaceRemover = delegate (Match match)
{
var result = match.Value;
if (String.IsNullOrEmpty(match.Groups["cdata"].Value))
{
// find all prefixes within start-, end tag and attributes and also namespace declarations
return Regex.Replace(result, "((?<=<|<\\/| ))\\w+:| xmlns(:\\w+)?=\".*?\"", "");
}
else
{
// cdata as is
return result;
}
};
// XmlDocument doc;
// string file;
doc.LoadXml(
Regex.Replace(File.ReadAllText(file),
// find all begin, cdata and end tags (do not change order)
@"<(?:\w+:?\w+.*?|(?<cdata>!\[CDATA\[.*?\]\])|\/\w+:?\w+)>",
new MatchEvaluator(NamespaceRemover)
)
);
For now it is 100% working for me.
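To reuse this, the two regexes can be wrapped in a small helper. This is a sketch rather than the author's exact code: `RegexNamespaceStripper` is an illustrative name, it checks `Groups["cdata"].Success` instead of testing for an empty value, and it adds `RegexOptions.Singleline` (an assumption about your input) so that tags and CDATA sections spanning several lines are matched too. It keeps the original's limitations; for example, a `>` inside an attribute value, or an unprefixed one-character element name such as `<b xmlns="...">`, is not handled:

```csharp
using System.Text.RegularExpressions;

public static class RegexNamespaceStripper
{
    // Step 1: find start tags, end tags and CDATA sections (same pattern as above).
    private static readonly Regex TagOrCdata = new Regex(
        @"<(?:\w+:?\w+.*?|(?<cdata>!\[CDATA\[.*?\]\])|\/\w+:?\w+)>",
        RegexOptions.Singleline);

    public static string Strip(string xml)
    {
        return TagOrCdata.Replace(xml, match =>
            match.Groups["cdata"].Success
                // CDATA is returned as is
                ? match.Value
                // Step 2: inside a tag, drop "prefix:" and xmlns declarations
                : Regex.Replace(match.Value, "((?<=<|<\\/| ))\\w+:| xmlns(:\\w+)?=\".*?\"", ""));
    }
}
```

For example, `<a:root xmlns:a="urn:x"><a:item a:id="1">text</a:item><![CDATA[<x:y/>]]></a:root>` becomes `<root><item id="1">text</item><![CDATA[<x:y/>]]></root>`, with the CDATA content untouched.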
A VB.NET version of Peter Stegnar's code:
Public Class XElementRemoveNameSpace
'
'based on
' Peter Stegnar's code here
' https://stackoverflow.com/questions/987135/how-to-remove-all-namespaces-from-xml-with-c
'
Public Shared Function RemoveAllNamespaces(inXML As XElement) As XElement
Dim Xel As XElement = _RemoveAllNamespaces(inXML)
Return Xel
End Function
Private Shared Function _RemoveAllNamespaces(inXML As XElement) As XElement
Dim Xel As XElement
If Not inXML.HasElements Then
Xel = New XElement(inXML.Name.LocalName)
Xel.Value = inXML.Value
SetAttrs(inXML, Xel)
Return Xel
Else
Xel = New XElement(inXML.Name.LocalName, inXML.Elements().Select(Function(el) _RemoveAllNamespaces(el)))
SetAttrs(inXML, Xel) 'in case there are attributes on this element
Return Xel
End If
End Function
Private Shared Sub SetAttrs(inXML As XElement, outXML As XElement)
If inXML.HasAttributes Then
Dim nm As String
Dim NamespaceName As String
For Each attribute As XAttribute In inXML.Attributes()
Try
nm = attribute.Name.LocalName
' NamespaceName = attribute.Name.NamespaceName 'for debug
outXML.SetAttributeValue(nm, attribute.Value) 'stop invOP exception
Catch invOP As InvalidOperationException
'todo dup attributes - should not happen
Catch ex As Exception
End Try
Next
End If
End Sub
End Class
Here is a regex-based solution to this problem:
private XmlDocument RemoveNS(XmlDocument doc)
{
var xml = doc.OuterXml;
var newxml = Regex.Replace(xml, @"xmlns(:xsi|:xsd)?="".*?""", ""); // removes xmlns, xmlns:xsi and xmlns:xsd declarations
var newdoc = new XmlDocument();
newdoc.LoadXml(newxml);
return newdoc;
}

Traverse up a list using Linq

I have a class as follows:
public class SiloNode
{
public string Key { get; private set; }
public string ParentKey { get; private set; }
public string Text { get; private set; }
public string Url { get; private set; }
}
All nodes are contained in a list:
List<SiloNode> nodes = new List<SiloNode>();
As you can see, the class contains a ParentKey property, so it's possible to find the item's parent, grandparent etc, until the top of the list is hit.
At present, I need to traverse up 2 levels, and as you can see from the code below, it's already looking quite clunky. Now I need to modify the code to traverse up 3 levels, and I'm concerned it's getting messy.
Is there a cleaner way to achieve what I want?
string GetStartGroup(string currentUrl)
{
string startGroup = null;
var currentNode = Silos.Silo.SingleOrDefault(x => x.Url == currentUrl);
if (currentNode != null)
{
var parentNode = Silos.Silo.SingleOrDefault(x => x.Key == currentNode.ParentKey);
if (parentNode != null) startGroup = parentNode.ParentKey;
}
return startGroup;
}
Repeatedly using SingleOrDefault on a list makes the algorithm rather slow: finding parents for n nodes requires O(n²) time.
You should make a Dictionary<string,SiloNode> first, and then traverse up the hierarchy through the dictionary:
var lookup= nodes.ToDictionary(n => n.Key);
...
SiloNode FindParent(SiloNode node, int levelsUp, IDictionary<string,SiloNode> lookup) {
while (node != null && levelsUp != 0) {
if (node.ParentKey == null || !lookup.TryGetValue(node.ParentKey, out var parent)) {
return node;
}
node = parent;
levelsUp--;
}
return node;
}
This will look up a parent up to levelsUp levels up. If you are looking for the last possible parent, modify the code as follows:
SiloNode FindParent(SiloNode node, IDictionary<string,SiloNode> lookup) {
while (true) {
if (node?.ParentKey == null || !lookup.TryGetValue(node.ParentKey, out var parent)) {
return node;
}
node = parent;
}
}
or recursively
SiloNode FindParent(SiloNode node, IDictionary<string,SiloNode> lookup) {
return node?.ParentKey != null && lookup.TryGetValue(node.ParentKey, out var parent)
? FindParent(parent, lookup)
: node;
}
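Putting the dictionary approach together with the question's `GetStartGroup` could look like the sketch below. The `SiloNode` constructor and the extra `levelsUp`, `nodes` and `lookup` parameters are assumptions for the demo. Unlike the question's code, which returns null when a parent is missing, this returns the key of the highest node it could reach:

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's SiloNode; the constructor is an assumption,
// since the original class only shows private setters.
public class SiloNode
{
    public SiloNode(string key, string parentKey, string url)
    {
        Key = key;
        ParentKey = parentKey;
        Url = url;
    }

    public string Key { get; private set; }
    public string ParentKey { get; private set; }
    public string Url { get; private set; }
}

public static class SiloLookup
{
    // Same FindParent as above: at most levelsUp steps, one dictionary lookup each.
    public static SiloNode FindParent(SiloNode node, int levelsUp, IDictionary<string, SiloNode> lookup)
    {
        while (node != null && levelsUp != 0)
        {
            if (node.ParentKey == null || !lookup.TryGetValue(node.ParentKey, out var parent))
            {
                return node; // no further parent: stop at the highest node found
            }
            node = parent;
            levelsUp--;
        }
        return node;
    }

    // Hypothetical version of the question's GetStartGroup; build lookup once
    // with nodes.ToDictionary(n => n.Key) and reuse it for every call.
    public static string GetStartGroup(string currentUrl, int levelsUp,
        IEnumerable<SiloNode> nodes, IDictionary<string, SiloNode> lookup)
    {
        var current = nodes.SingleOrDefault(n => n.Url == currentUrl);
        return FindParent(current, levelsUp, lookup)?.Key;
    }
}
```

Going from 2 to 3 levels is then just a different `levelsUp` argument rather than another nested lookup.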
You can do this with recursion.
string GetStartGroup(string currentUrl)
{
var node = nodes.Single(x => x.Url == currentUrl);
if (node.ParentKey == null)
return node.Key;
return GetStartGroup(nodes.Single(x => x.Key == node.ParentKey).Url);
}
Alternatively:
string GetStartGroup(string currentUrl)
{
return GetStartNode(nodes.Single(x => x.Url == currentUrl)).Key;
}
SiloNode GetStartNode(SiloNode node)
{
if (node.ParentKey == null)
return node;
return GetStartNode(nodes.Single(x => x.Key == node.ParentKey));
}
You can change
if (parentNode != null) startGroup = parentNode.ParentKey;
to
if (parentNode != null) startGroup = GetStartGroup(currentNode.parentUrl /*or something similar*/);
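However, it would be better to use an iterative loop. I don't know enough about your problem to give a precise hint, but starting from the currentNode and parentNode variables in your code, the loop would look something like this:
while (parentNode != null)
{
currentNode = parentNode;
parentNode = nodes.SingleOrDefault(x => x.Key == currentNode.ParentKey);
}
// currentNode is now the topmost node reachable through ParentKey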
You might need to call SingleOrDefault, but if you have a direct reference, you should use that instead.
Just put it into a method:
public static Silo Up(Silo current, IEnumerable<Silo> collection)
{
    return collection.FirstOrDefault(it => it.Key == current.ParentKey);
}
or as an extension method:
public static class SiloExtensions
{
    public static Silo Up(this Silo current, IEnumerable<Silo> collection)
    {
        return collection.FirstOrDefault(it => it.Key == current.ParentKey);
    }
}
so you can just do silo.Up(collection)?.Up(collection)
Please note that this is rather slow.
Depending on what you actually do, you may want to introduce an actual parent object as a field, or a wrapper object providing access to it.
Such a wrapper object might look like this:
public class SiloWrapper
{
public Silo Wrapped { get; }
public Silo Parent { get; }
private SiloWrapper(Silo silo, Silo parent)
{
this.Wrapped = silo;
this.Parent = parent;
}
public static IEnumerable<SiloWrapper> Map(IEnumerable<Silo> silos)
{
var dict = silos.ToDictionary((s) => s.Key);
foreach(var s in silos)
{
yield return new SiloWrapper(s, s.ParentKey == null ? null : dict[s.ParentKey]);
}
}
}
To then traverse up and down, you would just need to call SiloWrapper.Map(<methodToGetSiloCollection>) to have all the wrapped silos ready for use.
If garbage collection is a concern, you can also use a WeakReference<Silo> ParentWeak instead.
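Since the question asks about LINQ, another option is to expose the chain of parents as a lazy sequence, so that going n levels up becomes ordinary LINQ. This sketch assumes a Key-to-node dictionary like the one suggested in the dictionary-based answer; the `Ancestors` extension and the `SiloNode` constructor are illustrative, and the `HashSet` guard is only there to stop on cyclic data:

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's class; the constructor is assumed for the demo.
public class SiloNode
{
    public SiloNode(string key, string parentKey) { Key = key; ParentKey = parentKey; }
    public string Key { get; private set; }
    public string ParentKey { get; private set; }
}

public static class SiloNodeExtensions
{
    // Lazily yields parent, grandparent, ... until a node has no (known) parent.
    // Each step is a single dictionary lookup.
    public static IEnumerable<SiloNode> Ancestors(this SiloNode node, IDictionary<string, SiloNode> lookup)
    {
        var visited = new HashSet<string> { node?.Key }; // stops on cycles in bad data
        while (node?.ParentKey != null
               && lookup.TryGetValue(node.ParentKey, out var parent)
               && visited.Add(parent.Key))
        {
            yield return parent;
            node = parent;
        }
    }
}
```

With that in place, `node.Ancestors(lookup).Skip(2).FirstOrDefault()` is the node three levels up (or null if there is none), and `node.Ancestors(lookup).LastOrDefault()` is the topmost ancestor.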

How to iterate through XML file

How do I iterate over all the elements of an XML tree and access them? The input is an unknown XML file, so the code has to iterate without knowing the number of elements and branches in advance.
I know there have been a lot of answers on such topics, but the ready-made solutions assumed a known XML structure.
This is the code I am using:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
class Program
{
static void Process(XElement element)
{
if (!element.HasElements)
{
Console.WriteLine(element.GetAbsoluteXPath());
}
else
{
foreach (XElement child in element.Elements())
{
Process(child);
}
}
}
static void Main()
{
var doc = XDocument.Load("C:\\Users\\Błażej\\Desktop\\authors1.xml");
List<string> elements = new List<string>();
Program.Process(doc.Root);
Console.ReadKey();
}
}
public static class XExtensions
{
public static string GetAbsoluteXPath(this XElement element)
{
if (element == null)
{
throw new ArgumentNullException("element");
}
Func<XElement, string> relativeXPath = e =>
{
int index = e.IndexPosition();
string name = e.Name.LocalName;
return (index == -1) ? "/" + name : string.Format
(
"/{0}[{1}]",
name,
index.ToString()
);
};
var ancestors = from e in element.Ancestors()
select relativeXPath(e);
return string.Concat(ancestors.Reverse().ToArray()) +
relativeXPath(element);
}
public static int IndexPosition(this XElement element)
{
if (element == null)
{
throw new ArgumentNullException("element");
}
if (element.Parent == null)
{
return -1;
}
int i = 1;
foreach (var sibling in element.Parent.Elements(element.Name))
{ // i.e. look at the siblings to see whether there are elements with the same name, e.g. au_id[1], au_id[2]
if (sibling == element)
{
return i;
}
else
{
i++;
}
}
throw new InvalidOperationException
("element has been removed from its parent.");
}
Now, how can I add some elements to existing nodes while iterating? Is it possible?
As @strongbutgood mentioned, this is very generic.
Maybe the generic code below helps:
var xdoc = XDocument.Parse(myXML);
// Adding a new element (replace the placeholder names with your own)
xdoc.Element("Your_Parent_Node_Name").Add(new XElement("New_Element_Name", "New_Element_Value"));
// Deleting existing elements whose value matches
xdoc.Descendants().Where(s => s.Value == "Element_To_Be_Removed").Remove();
Hope this helps.
Please note that this is a generic answer, and you will need to adapt it to your needs.
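For the follow-up question (adding elements while iterating): `Descendants()` already visits every element of an unknown document without any recursion, but adding nodes while that sequence is being enumerated can give unexpected results (for example an endless loop if the newly added elements are visited too), so a common pattern is to take a snapshot with `ToList()` first. A sketch, where `XmlWalker.MarkContainers` and the empty `<checked/>` element are just illustrative choices:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlWalker
{
    // Adds an empty <checked/> child to every element that already has children.
    // ToList() snapshots the elements first, so the tree can be modified safely
    // and the newly added elements are not themselves visited.
    public static int MarkContainers(XDocument doc)
    {
        var containers = doc.Descendants().Where(e => e.HasElements).ToList();
        foreach (var element in containers)
        {
            element.Add(new XElement("checked"));
        }
        return containers.Count;
    }
}
```

The same snapshot pattern works for any other change, such as adding attributes or removing elements.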

HtmlAgilityPack: xpath and regex

I'm currently using HtmlAgilityPack to search for certain content via an xpath query. Something like this:
var col = doc.DocumentNode.SelectNodes("//*[text()[contains(., 'foo'] or @*....
Now I want to search for specific content in all of the HTML source code (text, tags and attributes) using a regular expression. How can this be achieved with HtmlAgilityPack? Can HtmlAgilityPack handle XPath plus regex, or what would be the best way to combine a regex with HtmlAgilityPack?
The Html Agility Pack uses the underlying .NET XPATH implementation for its XPATH support. Fortunately XPATH in .NET is fully extensible (BTW: it's a shame Microsoft doesn't invest any more in this superb technology...).
So, let's suppose I have this html:
<div>hello</div>
<div>hallo</div>
Here is sample code that selects both nodes, because it matches them against the 'h.llo' regular expression:
HtmlNodeNavigator nav = new HtmlNodeNavigator("mypage.htm");
foreach (var node in SelectNodes(nav, "//div[regex-is-match(text(), 'h.llo')]"))
{
Console.WriteLine(node.OuterHtml); // should dump both div elements
}
It works because I use a special Xslt/XPath context where I have defined a new XPATH function called "regex-is-match". Here is the SelectNodes utility code:
public static IEnumerable<HtmlNode> SelectNodes(HtmlNodeNavigator navigator, string xpath)
{
if (navigator == null)
throw new ArgumentNullException("navigator");
XPathExpression expr = navigator.Compile(xpath);
expr.SetContext(new HtmlXsltContext());
object eval = navigator.Evaluate(expr);
XPathNodeIterator it = eval as XPathNodeIterator;
if (it != null)
{
while (it.MoveNext())
{
HtmlNodeNavigator n = it.Current as HtmlNodeNavigator;
if (n != null && n.CurrentNode != null)
{
yield return n.CurrentNode;
}
}
}
}
And here is the support code:
public class HtmlXsltContext : XsltContext
{
public HtmlXsltContext()
: base(new NameTable())
{
}
public override int CompareDocument(string baseUri, string nextbaseUri)
{
throw new NotImplementedException();
}
public override bool PreserveWhitespace(XPathNavigator node)
{
throw new NotImplementedException();
}
protected virtual IXsltContextFunction CreateHtmlXsltFunction(string prefix, string name, XPathResultType[] ArgTypes)
{
return HtmlXsltFunction.GetBuiltIn(this, prefix, name, ArgTypes);
}
public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] ArgTypes)
{
return CreateHtmlXsltFunction(prefix, name, ArgTypes);
}
public override IXsltContextVariable ResolveVariable(string prefix, string name)
{
throw new NotImplementedException();
}
public override bool Whitespace
{
get { return true; }
}
}
public abstract class HtmlXsltFunction : IXsltContextFunction
{
protected HtmlXsltFunction(HtmlXsltContext context, string prefix, string name, XPathResultType[] argTypes)
{
Context = context;
Prefix = prefix;
Name = name;
ArgTypes = argTypes;
}
public HtmlXsltContext Context { get; private set; }
public string Prefix { get; private set; }
public string Name { get; private set; }
public XPathResultType[] ArgTypes { get; private set; }
public virtual int Maxargs
{
get { return Minargs; }
}
public virtual int Minargs
{
get { return 1; }
}
public virtual XPathResultType ReturnType
{
get { return XPathResultType.String; }
}
public abstract object Invoke(XsltContext xsltContext, object[] args, XPathNavigator docContext);
public static IXsltContextFunction GetBuiltIn(HtmlXsltContext context, string prefix, string name, XPathResultType[] argTypes)
{
if (name == "regex-is-match")
return new RegexIsMatch(context, name);
// TODO: create other functions here
return null;
}
public static string ConvertToString(object argument, bool outer, string separator)
{
if (argument == null)
return null;
string s = argument as string;
if (s != null)
return s;
XPathNodeIterator it = argument as XPathNodeIterator;
if (it != null)
{
if (!it.MoveNext())
return null;
StringBuilder sb = new StringBuilder();
do
{
HtmlNodeNavigator n = it.Current as HtmlNodeNavigator;
if (n != null && n.CurrentNode != null)
{
if (sb.Length > 0 && separator != null)
{
sb.Append(separator);
}
sb.Append(outer ? n.CurrentNode.OuterHtml : n.CurrentNode.InnerHtml);
}
}
while (it.MoveNext());
return sb.ToString();
}
IEnumerable enumerable = argument as IEnumerable;
if (enumerable != null)
{
StringBuilder sb = null;
foreach (object arg in enumerable)
{
if (sb == null)
{
sb = new StringBuilder();
}
if (sb.Length > 0 && separator != null)
{
sb.Append(separator);
}
string s2 = ConvertToString(arg, outer, separator);
if (s2 != null)
{
sb.Append(s2);
}
}
return sb != null ? sb.ToString() : null;
}
return string.Format("{0}", argument);
}
public class RegexIsMatch : HtmlXsltFunction
{
public RegexIsMatch(HtmlXsltContext context, string name)
: base(context, null, name, null)
{
}
public override XPathResultType ReturnType { get { return XPathResultType.Boolean; } }
public override int Minargs { get { return 2; } }
public override object Invoke(XsltContext xsltContext, object[] args, XPathNavigator docContext)
{
if (args.Length < 2)
return false;
return Regex.IsMatch(ConvertToString(args[0], false, null), ConvertToString(args[1], false, null));
}
}
}
The regex function is implemented in a class called RegexIsMatch at the end. It's not super complicated. Note the utility function ConvertToString, which tries to coerce any XPath "thing" into a string; it's very useful.
Of course, with this technology, you can define whatever XPATH function you need with very little code (I use this all the time to do upper/lower case conversions...).
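If a custom XsltContext feels heavy, a simpler but less expressive alternative is to let XPath do the structural selection and then filter its results with `Regex` in LINQ. The sketch below is a different technique from the answer above and, so that it needs no extra package, uses `System.Xml.Linq`'s `XPathSelectElements` on well-formed markup rather than HtmlAgilityPack. The same `Where` filter can be applied to the `HtmlNodeCollection` returned by HtmlAgilityPack's `SelectNodes` (check it for null first, as it returns null when nothing matches). `XPathThenRegex.Select` is an illustrative name:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathThenRegex
{
    // XPath narrows the candidates structurally; Regex.IsMatch filters their text.
    public static List<XElement> Select(XDocument doc, string xpath, string pattern)
    {
        var regex = new Regex(pattern);
        return doc.XPathSelectElements(xpath)
                  .Where(e => regex.IsMatch(e.Value))
                  .ToList();
    }
}
```

With `<div>hello</div><div>hallo</div><div>bye</div>`, selecting `//div` with `h.llo` returns the first two divs.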
Directly quoting,
I think the flaw here is that HTML is a Chomsky Type 2 grammar
(context free grammar) and RegEx is a Chomsky Type 3 grammar
(regular grammar). Since a Type 2 grammar is fundamentally more
complex than a Type 3 grammar (see the Chomsky hierarchy), you
can't possibly make this work. But many will try, some will claim
success and others will find the fault and totally mess you up.
It might make sense to use a regular expression with some parts of an HTML document. Trying to use HtmlAgilityPack to run a regular expression on the tags and structure of an HTML document is perverse and, ultimately, cannot provide a universal solution to your problem.
