How to hoist XML namespaces to root element - c#

If I have an XML file with namespaces like:
<root>
  <h:table xmlns:h="http://www.namespaces.com/namespaceOne">
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
  <h:table xmlns:h="https://www.namespaces.com/namespaceTwo">
    <h:name>African Coffee Table</h:name>
    <h:width>80</h:width>
    <h:length>120</h:length>
  </h:table>
</root>
I want to hoist all of the namespaces to the root element, like this:
<root xmlns:h="http://www.namespaces.com/namespaceOne" xmlns:h1="https://www.namespaces.com/namespaceTwo">
  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
  <h1:table>
    <h1:name>African Coffee Table</h1:name>
    <h1:width>80</h1:width>
    <h1:length>120</h1:length>
  </h1:table>
</root>
Is there a way to do this? Ideally automatically resolving conflicting namespace prefixes, as in the example above. I haven't committed to using Linq to XML or System.Xml yet, so either would be a possibility.
There is one major constraint: because of the environment I am working in, I can't write classes. I can write functions, but no new class definitions.

Turns out this is pretty straightforward:
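var doc = XDocument.Parse(xml);
// ToList() snapshots the declarations so that removing them below
// does not disturb the enumeration.
var namespaceAttributes = doc.Descendants()
    .SelectMany(x => x.Attributes())
    .Where(x => x.IsNamespaceDeclaration)
    .ToList();
int count = 1;
foreach (var namespaceAttribute in namespaceAttributes)
{
    // Re-declare the namespace on the root under a generated prefix.
    doc.Root.Add(new XAttribute(XNamespace.Xmlns + $"h{count}", namespaceAttribute.Value));
    namespaceAttribute.Remove();
    count++;
}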
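We loop through all namespace declarations (xmlns:foo="foo"). For each one we find, we put a namespace attribute with the same URL on the root element, and remove the original declaration. When the document is written out, each element picks up its prefix from the declaration on the root.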
Note that this does slightly odd things if you have multiple namespaces with the same URL (e.g. if you have two lots of xmlns:h="https://www.namespaces.com/namespaceOne" on different children): it puts multiple xmlns declarations on the root element with the same URL, but all elements use the last such namespace. If you want to avoid that, just keep a list of namespaces you've added to the root element.
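The "keep a list of namespaces you've added" idea can be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: HoistNamespaces is a hypothetical helper name, the urn: URIs are placeholders, the generated h1, h2, ... prefixes are assumed not to clash with anything already declared on the root, and xmlns="" declarations are deliberately left untouched because a prefix cannot be bound to the empty namespace. It is written as a function (no new classes) to fit the constraint in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: hoists descendant namespace declarations to the root,
// emitting one root declaration per distinct namespace URI.
static XDocument HoistNamespaces(string xml)
{
    var doc = XDocument.Parse(xml);

    // Snapshot the declarations below the root before mutating the tree.
    var declarations = doc.Root.Descendants()
        .SelectMany(e => e.Attributes())
        .Where(a => a.IsNamespaceDeclaration && a.Value.Length > 0)
        .ToList();

    var seen = new HashSet<string>(); // URIs already declared on the root
    int count = 1;
    foreach (var declaration in declarations)
    {
        // HashSet.Add returns false when the URI was hoisted earlier.
        if (seen.Add(declaration.Value))
        {
            doc.Root.Add(new XAttribute(XNamespace.Xmlns + $"h{count}", declaration.Value));
            count++;
        }
        declaration.Remove();
    }
    return doc;
}

var hoisted = HoistNamespaces(
    "<root><h:a xmlns:h=\"urn:one\"/><h:b xmlns:h=\"urn:two\"/><h:c xmlns:h=\"urn:one\"/></root>");
Console.WriteLine(hoisted);
```

With the sample input above, the two elements that share urn:one end up under a single prefix, and the root carries two declarations rather than three.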

Related

Adding a root element to XML

I have an XML string which has the following structure:
<Element>
  <Property1>Something</Property1>
  <Property2>SomethingElse</Property2>
</Element>
<Element>
  <Property1>Something2</Property1>
  <Property2>SomethingElse2</Property2>
</Element>
I would like to deserialize this to a List<Element>.
I use this code:
XmlSerializer xd = new XmlSerializer(typeof(T));
XDocument xdoc = XDocument.Parse(xmlStringToDesirialize);
T deserializedObject = xd.Deserialize(xdoc.CreateReader()) as T;
Where T is List<Element>. I get an exception saying "There are multiple root elements". I understand why this is, but I'm not sure what to do about it.
I was thinking that adding a pseudo-root element, like <Elements>, might be a good solution, but I don't know how I would go about adding it to the XML document I already have.
Or maybe there is an alternative solution altogether.
EDIT: For completeness I am adding code for the full solution I needed for deserialization, in case anyone needs it.
I created a class:
[XmlRoot("myRoot", Namespace = "")]
public class MyRoot
{
    [XmlElement("Element", Namespace = "{The xmlns of the actual class}")]
    public List<Element> Elements { get; set; }

    public MyRoot()
    {
        Elements = new List<Element>();
    }
}
Then I deserialize to this class after adding the tags as suggested by @Richard.
Hope this can help someone.
I have an XML string
No, you don't. What you have can be referred to as an "XML fragment". Because there is no single root element, it is not an XML document. (There is no such thing as "invalid XML" in this sense: it is either well-formed or it is not XML.)
XML parsers require an XML document. But XmlSerializer is not just a parser: it includes an XML parser (of course), but it also wants to generate an object graph from the content of the XML document, making lots of assumptions about type availability and placing restrictions on the XML to match those types.
The easiest approach with normal XML parsers would be to add a root element yourself. eg.:
var xdoc = XDocument.Parse("<myRoot>" + theString + "</myRoot>");
For XML deserialisation, however, you will need to modify your available types to include a container that serialises with a myRoot element and then contains the relevant information.
That said, given the sample XML, I see no sign of it looking like an object graph. Why not work with the parsed XML and extract the content using the parser's API?
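As a rough illustration of working with the parsed XML directly, the sketch below wraps the fragment in a myRoot element and pulls the values out with LINQ to XML. The fragment variable is a stand-in for the string from the question, and projecting to tuples is just one assumed way of holding the result without defining a class.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Stand-in for the multi-root fragment from the question.
string fragment =
    "<Element><Property1>Something</Property1><Property2>SomethingElse</Property2></Element>" +
    "<Element><Property1>Something2</Property1><Property2>SomethingElse2</Property2></Element>";

// Wrapping the fragment gives the parser the single root element it requires.
var xdoc = XDocument.Parse("<myRoot>" + fragment + "</myRoot>");

// Project each <Element> into a tuple; the (string) cast yields null if a child is missing.
var items = xdoc.Root.Elements("Element")
    .Select(e => (Property1: (string)e.Element("Property1"),
                  Property2: (string)e.Element("Property2")))
    .ToList();

foreach (var item in items)
{
    Console.WriteLine($"{item.Property1} / {item.Property2}");
}
```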
XML must have exactly one root element. So, as you said, making <Elements> the root node is a solution.
And XML would look like:
<Elements>
  <Element>
    <Property1>Something</Property1>
    <Property2>SomethingElse</Property2>
  </Element>
  <Element>
    <Property1>Something2</Property1>
    <Property2>SomethingElse2</Property2>
  </Element>
</Elements>
Why not create a root element first:
XElement root = new XElement("root");
Then append your elements to it.

How to find an XPath query to Element/Element without namespaces (XmlSerializer, fragment)?

Assume this simple XML fragment in which there may or may not be the xml declaration and has exactly one NodeElement as a root node, followed by exactly one other NodeElement, which may contain an assortment of various number of different kinds of elements.
<?xml version="1.0"?>
<NodeElement xmlns="xyz">
  <NodeElement xmlns="">
    <SomeElement></SomeElement>
  </NodeElement>
</NodeElement>
How could I go about selecting the inner NodeElement and its contents without the namespace? For instance, "//*[local-name()='NodeElement/NodeElement[1]']" (and other variations I've tried) doesn't seem to yield results.
As for in general the thing that I'm really trying to accomplish is to Deserialize a fragment of a larger XML document contained in a XmlDocument. Something like the following
var doc = new XmlDocument();
doc.LoadXml(File.ReadAllText(@"trickynodefile.xml")); // ReadAllText to avoid Unicode trouble.
var n = doc.SelectSingleNode("//*[local-name()='NodeElement/NodeElement[1]']");
using (var reader = XmlReader.Create(new StringReader(n.OuterXml)))
{
    var obj = new XmlSerializer(typeof(NodeElementNodeElement)).Deserialize(reader);
}
I believe I'm missing just the right XPath expression, which seem to be rather elusive. Any help much appreciated!
Try this:
/*/*
It selects children of the root node.
Or
/*/*[local-name() = 'NodeElement']
It selects children with local-name() = 'NodeElement' of the root node.
Anyway in your case both expressions select <NodeElement xmlns="">.
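For what it's worth, here is a small sketch of the second expression applied to the document from the question via XmlDocument. The XML is inlined as a string purely so the example is self-contained; nothing else is assumed beyond the standard System.Xml API.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<NodeElement xmlns=\"xyz\">" +
    "<NodeElement xmlns=\"\"><SomeElement></SomeElement></NodeElement>" +
    "</NodeElement>");

// local-name() compares only the local part of the name, so no namespace manager is needed.
XmlNode inner = doc.SelectSingleNode("/*/*[local-name()='NodeElement']");

// inner.OuterXml can then be handed to XmlReader/XmlSerializer as in the question.
Console.WriteLine(inner.OuterXml);
```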
Walk the tree:
foreach (XmlNode node in doc.DocumentElement.ChildNodes[0].ChildNodes)
{
    // do something with node
}
This is hideously fragile, of course; you might want to check for nulls here and there.

XmlElement.SelectNodes(..) - finds nothing.. Help?

Sorry to bother you with such a simple question, but I've been stuck here for an hour:
I have an xml file that looks something like this:
<?xml version="1.0" encoding="utf-8"?>
<aaa xmlns="http://blabla.com/xmlschema/v1">
  <bbb>
    <ccc>Foo</ccc>
  </bbb>
  <ddd x="y" />
  <ddd x="xx" />
  <ddd x="z" />
</aaa>
I'm trying to access the elements 'ddd' like this:
var doc = new XmlDocument();
doc.Load("example.xml");
foreach (XmlNode dddNode in doc.DocumentElement.SelectNodes("//ddd"))
{
    // do something
    Console.WriteLine(dddNode.Attributes["x"].Value);
}
At runtime the foreach loop is skipped because I don't get any nodes back from the .SelectNodes method. I set a breakpoint before the loop and had a look at the InnerXml, which looks fine, and I also tried all sorts of XPaths (like "//bbb" or "/aaa/ddd"), but only "/" seems to return anything.
This exact code worked for me before, now it does not. I suspect something with that namespace declaration in the aaa tag, but I couldn't figure out why this should cause problems. Or.. can you see anything I may be missing?
This is xml-namespaces. There is no ddd. There is, however, x:ddd where x is your alias to http://blabla.com/xmlschema/v1. You'll need to test with namespaces - for example:
var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("x", "http://blabla.com/xmlschema/v1");
var nodes = doc.DocumentElement.SelectNodes("//x:ddd", nsmgr);
// nodes has 3 nodes
Note x in the above is arbitrary; there is no significance in x other than convenience.
This (rather inconveniently) means adding the namespace (or an alias, as above) into all of your xpath expressions.

How to deal with namespaces in XML in XmlDocument c#

I have several XML documents, all of which have the same structure (element names, attribute names and hierarchy).
However, some of the elements and attributes have custom namespaces in each XML document which are not known at design time. They change, don't ask...
How can I deal with this when traversing the documents using a single set of XPath?
Should I remove all the namespaces before processing?
Can I automatically register all namespaces with an XmlNamespaceManager?
Any thoughts?
Update: some examples (with namespace declarations omitted for clarity):
<root>
<child attr="val" />
</root>
<root>
<x:child attr="val" />
</root>
<root>
<y:child z:attr="val" />
</root>
Thanks
Suppose you have following xml:
<root xmlns="first">
<el1 xmlns="second">
<el2 xmlns="third">...
You can write your queries to ignore namespaces in the following way:
/*[local-name()='root']/*[local-name()='el1']/*[local-name()='el2']
etc.
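To make that concrete, here is a minimal sketch that runs one namespace-agnostic query against the three document shapes from the question. ReadAttr is a hypothetical helper name, and the urn:x, urn:y and urn:z URIs are placeholders standing in for the namespace declarations the question omitted.

```csharp
using System;
using System.Xml;

// Hypothetical helper: reads root/child/@attr while ignoring namespaces entirely.
static string ReadAttr(string xml)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);

    // @* selects ordinary attributes only; xmlns declarations are not on the attribute axis.
    XmlNode attr = doc.SelectSingleNode(
        "/*[local-name()='root']/*[local-name()='child']/@*[local-name()='attr']");
    return attr?.Value;
}

Console.WriteLine(ReadAttr("<root><child attr=\"val\" /></root>"));
Console.WriteLine(ReadAttr("<root xmlns:x=\"urn:x\"><x:child attr=\"val\" /></root>"));
Console.WriteLine(ReadAttr("<root xmlns:y=\"urn:y\" xmlns:z=\"urn:z\"><y:child z:attr=\"val\" /></root>"));
```

The trade-off is that such queries also match same-named elements from unrelated namespaces, which may or may not matter for the documents at hand.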
Of course, you can iterate over the whole document to collect the namespaces and load them into a namespace manager. But in the general case this forces you to visit every node in the document. In that case it will be faster to just treat the document as a tree of objects and not use XPath.
I believe you'll find some good insight in this Stack Overflow thread:
XPath + Namespace Driving me crazy
In my opinion you have either of two solutions:
1- If the set of all possible namespaces is known beforehand, you can register them all in an XmlNamespaceManager before you begin parsing
2- Use XPath namespace-agnostic selectors
Of course, you can always scrub the XML document of any inline namespaces and start your parsing on a clean, uniform XML without namespaces, but honestly I don't see the gain in adding this overhead step.
Scott Hanselman has a nice article about extracting all of the XML Namespaces in an XML document. Presumably, when you get all of the XML Namespaces, you can just iterate over all of them and register them in your namespace manager.
You could try something like this to strip the namespaces:
// Implemented based on interface, not part of algorithm
public string RemoveAllNamespaces(string xmlDocument)
{
    return RemoveAllNamespaces(XElement.Parse(xmlDocument)).ToString();
}

// Core recursion function.
// Note: only element names, child elements and leaf text are copied; attributes are dropped.
private XElement RemoveAllNamespaces(XElement xmlDocument)
{
    if (!xmlDocument.HasElements)
    {
        XElement xElement = new XElement(xmlDocument.Name.LocalName);
        xElement.Value = xmlDocument.Value;
        return xElement;
    }
    return new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
}
See Peter Stegnar's answer here for more details:
How to remove all namespaces from XML with C#?
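You can also use direct node tests with wildcards, which will match any namespace (or lack thereof). Note that the *:name form is XPath 2.0 syntax, so it needs an XPath 2.0 processor; the XPath engine built into System.Xml implements XPath 1.0 only:
$your-document/*:root/*:child/@*:attr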

Controlling the order of XML namespaces

I'm having a problem getting the "xmlns" to appear first in the root attribute list.
I'm getting this:
<myroot
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"
xmlns="http://www.someurl.com/ns/myroot">
<sometag>somecontent</sometag>
</myroot>
And I want this:
<myroot
xmlns="http://www.someurl.com/ns/myroot"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd">
<sometag>somecontent</sometag>
</myroot>
My code looks like this:
XNamespace rt = "http://www.someurl.com/ns/myroot";
XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
var submissionNode = new XElement(rt + "myroot");
submissionNode.Add(new XAttribute(XNamespace.Xmlns + "xsi", "http://www.w3.org/2001/XMLSchema-instance"));
submissionNode.Add(new XAttribute(xsi + "schemaLocation", @"http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"));
What do I need to do differently to change the order?
EDIT: I understand the order is not normally relevant, but it's a requirement in this case.
IIRC, the order of attributes (in xml) is unimportant... so why change it? Is it causing an actual problem?
Would XmlWriter be an option for you?
Afaik, it gives you full control of the order of attributes and namespace declarations.
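If staying with LINQ to XML is preferable, one approach that appears to work is to add the default xmlns declaration explicitly as the first attribute, since XElement writes attributes in the order they were added. The sketch below reuses the names and URLs from the question; treat it as an illustration of that ordering behaviour rather than a guarantee for every serializer.

```csharp
using System;
using System.Xml.Linq;

XNamespace rt = "http://www.someurl.com/ns/myroot";
XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";

var submissionNode = new XElement(rt + "myroot",
    // Declaring the default namespace explicitly, and first, fixes its position in the output.
    new XAttribute("xmlns", rt.NamespaceName),
    new XAttribute(XNamespace.Xmlns + "xsi", xsi.NamespaceName),
    new XAttribute(xsi + "schemaLocation",
        "http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"),
    new XElement(rt + "sometag", "somecontent"));

string text = submissionNode.ToString();
Console.WriteLine(text);
```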
Attribute ordering is NOT significant in an XML document, and shouldn't be relied upon. It may be worth looking at the spec.
You'll find that if you read an XML document into a DOM and write it out, regardless of the platform/library, you can't (and shouldn't) rely on the attribute ordering. It's a common misconception, by the way!
I have a customer with this very problem. It was a real pain, so I wrote a workaround to solve it.
Please note this is not a beautiful solution and should not be encouraged, but it works.
public static class MyKludgeXmlClass
{
    public static XmlDocument CreateXmlDocumentWithOrderedNamespaces()
    {
        var xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><MyRoot xmlns=\"http://www.example.com/schemas/1.0/VRSync\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.example.com/schemas/1.0/VRSync http://xml.example.com/vrsync.xsd\"></MyRoot>";
        System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }
}
With XmlDocument you can retrieve the root:
var xmlDoc = MyKludgeXmlClass.CreateXmlDocumentWithOrderedNamespaces();
XmlElement root = xmlDoc.DocumentElement;
And append children nodes using your favorite method.
Software that requires attributes to be in a specified order doesn't conform to the XML recommendation.
The first question you should be asking is not, "How can I produce XML with namespace attributes in a defined order?" Instead, it should be, "What are the other respects in which this software doesn't conform to the XML recommendation?" Because I will bet you one crisp new American dollar that if the recipient's process violates the XML recommendation in one respect, it violates it in at least one other.
Because sometimes the right answer is to say, no, don't do that...
Per W3C Namespaces in XML Recommendation, section 3 Declaring Namespaces:
[Definition: A namespace (or more precisely, a namespace binding) is declared using a family of reserved attributes. Such an attribute's name must either be xmlns or begin xmlns:. These attributes, like any other XML attributes, may be provided directly or by default. ]
Therefore, the order of namespace declarations, like the order of any attributes, is insignificant.
So, no conformant XML tool or library will care about the order of namespace declarations, and neither should you.
