How do I read/parse an XML document where the XML namespace alias is unknown?
The structure and namespaces of the XML document are known, but the alias is not. E.g.
<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:aa="urn:namespace1"
xmlns:bb="urn:namespace2">
<aa:Quantity>1</aa:Quantity>
<bb:Price>9.98</bb:Price>
</Order>
Or
<?xml version="1.0" encoding="utf-8"?>
<Order xmlns:cc="urn:namespace1"
xmlns:dd="urn:namespace2">
<cc:Quantity>1</cc:Quantity>
<dd:Price>9.98</dd:Price>
</Order>
Update: I don't know the XML namespace aliases up front. They can be whatever.
I need to supply the XmlNamespaceManager with a list of namespaces and aliases using the AddNamespace method, like so:
XPathDocument xDoc = new XPathDocument("Path to my file");
XPathNavigator xNav = xDoc.CreateNavigator();
XmlNamespaceManager xmlns = new XmlNamespaceManager(xNav.NameTable);
xmlns.AddNamespace("aa", "urn:namespace1");
xmlns.AddNamespace("bb", "urn:namespace2");
But this is not namespace-agnostic. My second document uses cc and dd as aliases for the same namespaces.
The code you have provided is namespace-agnostic in the sense that the namespace prefixes used in the source XML do not matter. Given the namespace definitions in your question, the XPath has to use the prefixes you defined in code, i.e. aa and bb.
var quantity = xNav.SelectSingleNode("/Order/aa:Quantity", xmlns);
However, this code will still successfully select from the XML where prefixes cc and dd are used as long as the namespaces urn:namespace1 and urn:namespace2 are correctly used.
To be able to include namespace prefixes in the XPath you have to use the overloads that accept an IXmlNamespaceResolver.
To reiterate: When you define a namespace using the following code
xmlns.AddNamespace("aa", "urn:namespace1");
You state that in your code (e.g. in the XPath you intend to use) you will be using the namespace prefix aa for the namespace urn:namespace1.
In the XML you want to parse you assign namespaces using an attribute:
xmlns:cc="urn:namespace1"
It is important that the string urn:namespace1 matches both places to use that particular namespace. The prefixes are local to your code and the XML file respectively and they do not have to match.
The namespace aliases used in the document don't matter - they are just the aliases that are used in the document and can be whatever the author of that document wanted to use when authoring that document (they can even change mid-document).
To access this document in an alias-agnostic way, just provide whatever alias you want to use to the XmlNamespaceManager and then use that alias to access the document, for example:
XmlNamespaceManager xmlns = new XmlNamespaceManager(xNav.NameTable);
xmlns.AddNamespace("foo", "urn:namespace1");
xmlns.AddNamespace("bar", "urn:namespace2");
These aliases don't need to match the ones used in the document - this then allows you to use XPath expressions using the foo and bar aliases for those namespaces to navigate the document regardless of the aliases used in the document itself (as long as you supply that instance of XmlNamespaceManager).
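To make the point above concrete, here is a minimal sketch that runs the same XPath against both sample documents from the question. The class and method names (AliasAgnosticReader, ReadQuantity) are hypothetical, and the documents are passed in as strings rather than read from a file; the foo and bar prefixes exist only in the code.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class AliasAgnosticReader
{
    // Returns the text of the Quantity element, whatever prefix the document uses.
    public static string ReadQuantity(string xml)
    {
        var xDoc = new XPathDocument(new StringReader(xml));
        XPathNavigator xNav = xDoc.CreateNavigator();

        // These prefixes belong to the code only; they are matched by namespace URI.
        var xmlns = new XmlNamespaceManager(xNav.NameTable);
        xmlns.AddNamespace("foo", "urn:namespace1");
        xmlns.AddNamespace("bar", "urn:namespace2");

        XPathNavigator quantity = xNav.SelectSingleNode("/Order/foo:Quantity", xmlns);
        return quantity?.Value; // null when no such element exists
    }
}
```

Passing the document that uses aa/bb and the one that uses cc/dd should give the same result, because only the namespace URIs are compared.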
Related
Using C# XmlDocument object to edit an xml document which includes namespace declarations:
<tdl xmlns="http://www.nema.org/1997/C1219TDLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:tbd="http://www.example.com/tables/tbd/"
xmlns:xxx="http://www.example.com/tables/xxx/"
xsi:schemaLocation="http://www.nema.org/1997/C1219TDLSchema C1219TDL.xsd"
version="1.0" ></tdl>
I then add nodes to the document (setting InnerXml) that have tags and attributes that reference these namespaces.
I have 2 problems:
If the created tag is in the default namespace, it is written with an empty namespace attribute
<packedRecord name="ITEM_RCD" xmlns="">
If it uses a defined namespace, then it includes the namespace attribute:
<tbd:text xmlns:tbd="http://www.example.com/tables/tbd/">Ph</tbd:text>
Since these namespaces are defined in the document, my understanding (based on MSDN - XmlDocument.InnerXml) was that InnerXml would strip them out.
How do I get them to be stripped out?
I'm creating XML document using XDocument in C#.
I have a question.
Is
<Simple xmlns = "Example"></Simple>
equivalent to
<Example:Simple></Example:Simple>
?
I tried to get second solution with XNamespace and XElement in C#, but I get only first.
No.
The first example creates a Simple element in the Example namespace (note that namespaces are usually expressed as URIs)
The second example creates a Simple element in whatever namespace is associated with the Example prefix (as defined by an xmlns attribute).
These would be equivalent:
<xml xmlns="http://example.com/myNameSpace">
<Simple></Simple>
</xml>
<xml xmlns="http://example.com/myNameSpace" xmlns:Example="http://example.com/myNameSpace">
<Example:Simple></Example:Simple>
</xml>
In the first example, you have defined a default namespace, which applies to any element that is not prefixed (unprefixed attributes are never in the default namespace).
In the second example of the question, the Example prefix is never bound to a namespace, so no namespace is defined.
No. Because XML namespace names allow characters that are not valid in element names, you can't prefix an element tag name with the namespace itself like that.
Add a namespace prefix, like so:
<alias:Simple xmlns:alias = "Example"></alias:Simple>
No, but it's equivalent to:
<Example:Simple xmlns:Example="Example"></Example:Simple>
It's a bad idea to use relative URIs as the namespace name, since this XML now has a different namespace depending on where it came from. So always give the full URI. E.g., if the XML was being received from http://example.net/somePlace/someXML then the relative URI Example expands to http://example.net/somePlace/Example, so write it out in full:
<Example:Simple xmlns:Example="http://example.net/somePlace/Example"></Example:Simple>
OR
<Simple xmlns="http://example.net/somePlace/Example"></Simple>
Otherwise, if someone saved it in C:\Documents, then on opening it again it becomes equivalent to:
<Simple xmlns="file:///C|/Documents/Example"></Simple>
Which means that the meaning of Simple here is completely different to that when it was first downloaded.
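For the LINQ to XML side of the question, the following is a minimal sketch of how the prefixed form can be requested. The class and method names are hypothetical, and the full namespace URI from the answer above is reused. The idea is that an explicit xmlns:Example declaration attribute asks the serializer to use that prefix instead of a default namespace declaration.

```csharp
using System.Xml.Linq;

public static class PrefixedElementDemo
{
    private static readonly XNamespace Ns = "http://example.net/somePlace/Example";

    // Serializes with a default namespace declaration: <Simple xmlns="..." />
    public static XElement WithDefaultNamespace()
    {
        return new XElement(Ns + "Simple");
    }

    // Serializes with a prefix, because the prefix is declared explicitly.
    public static XElement WithPrefix()
    {
        return new XElement(Ns + "Simple",
            new XAttribute(XNamespace.Xmlns + "Example", Ns.NamespaceName));
    }
}
```

Both elements have the same XName (same namespace and local name); only the serialized spelling differs.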
Is there any way to check, in C# code, whether an XSLT stylesheet expects parameters?
XSLT is XML, so you can load the file into an XDocument and query it to see if there are any parameter elements defined directly under the top element.
Evaluate this XPath expression against the XML document that contains the XSLT file:
/*/xsl:param
If the result contains at least one node, then the XSLT stylesheet contains at least one global xsl:param, and the main purpose of a global xsl:param is to be set externally by the invoker of the transformation.
Note that you have to add the XSLT namespace and associate it with the xsl prefix using an XmlNamespaceManager object.
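The XPath check described above could be sketched as follows. The class and method names (XsltInspector, CountGlobalParams) are hypothetical, and the stylesheet is taken as a string for simplicity; the namespace URI is the standard XSLT 1.0 namespace.

```csharp
using System.Xml;

public static class XsltInspector
{
    // Counts the global xsl:param elements, i.e. direct children of the top element.
    public static int CountGlobalParams(string stylesheetText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(stylesheetText);

        // Bind the xsl prefix used in the XPath to the XSLT namespace.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("xsl", "http://www.w3.org/1999/XSL/Transform");

        return doc.SelectNodes("/*/xsl:param", ns).Count;
    }
}
```

A count greater than zero suggests the stylesheet expects parameters from the caller, although a param may also carry a default value and be optional.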
I have several XML documents, all of which have the same structure (element names, attribute names and hierarchy).
However, some of the elements and attributes have custom namespaces in each XML document which are not known at design time. They change, don't ask...
How can I deal with this when traversing the documents using a single set of XPath?
Should I remove all the namespaces before processing?
Can I automatically register all namespaces with an XmlNamespaceManager?
Any thoughts?
Update: some examples (with namespace declarations omitted for clarity):
<root>
<child attr="val" />
</root>
<root>
<x:child attr="val" />
</root>
<root>
<y:child z:attr="val" />
</root>
Thanks
Suppose you have the following XML:
<root xmlns="first">
<el1 xmlns="second">
<el2 xmlns="third">...
You can write your queries to ignore namespaces in the following way:
/*[local-name()='root']/*[local-name()='el1']/*[local-name()='el2']
etc.
Of course, you can iterate over the whole document to collect the namespaces and load them into the namespace manager. But in the general case this forces you to visit every node in the document. In that case it will be faster to just treat the document as a tree of objects and not use XPath at all.
I believe you'll find some good insight in this Stack Overflow thread:
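As an illustration of the local-name() technique from this answer, here is a small sketch applied to the root/child/attr structure from the question. The class and method names are hypothetical. No XmlNamespaceManager is needed, because the XPath never mentions a prefix.

```csharp
using System.Xml;

public static class LocalNameQuery
{
    // Reads root/child/@attr while ignoring whatever namespaces the nodes are in.
    public static string ReadChildAttr(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode attr = doc.SelectSingleNode(
            "/*[local-name()='root']/*[local-name()='child']/@*[local-name()='attr']");
        return attr?.Value; // null when nothing matches
    }
}
```

The trade-off is that two nodes with the same local name in different namespaces can no longer be told apart.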
XPath + Namespace Driving me crazy
In my opinion you have either of two solutions:
1- If the set of all possible namespaces is known beforehand, then you can register them all in an XmlNamespaceManager before you begin parsing
2- Use XPath namespace-agnostic selectors
Of course you can always scrub the XML document of any inline namespaces and start your parsing on a clean, uniform XML without namespaces, but honestly I don't see the gain in adding this overhead step.
Scott Hanselman has a nice article about extracting all of the XML Namespaces in an XML document. Presumably, when you get all of the XML Namespaces, you can just iterate over all of them and register them in your namespace manager.
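One way to do the "collect every namespace and register it" step is sketched below with LINQ to XML. This is not the code from the linked article; the class and method names are hypothetical. It assumes that a prefix is not reused for different URIs within one document (if it is, the last declaration wins here).

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class NamespaceCollector
{
    // Registers every prefixed namespace declaration found anywhere in the document.
    public static XmlNamespaceManager Collect(XDocument doc)
    {
        var manager = new XmlNamespaceManager(new NameTable());

        var declarations = doc.Descendants()
            .Attributes()
            .Where(a => a.IsNamespaceDeclaration);

        foreach (XAttribute declaration in declarations)
        {
            // xmlns="..." has no prefix, and XPath 1.0 cannot address it without one.
            if (declaration.Name.Namespace == XNamespace.None) continue;

            string prefix = declaration.Name.LocalName;
            // The reserved "xml" prefix cannot be registered again.
            if (prefix == "xml") continue;

            manager.AddNamespace(prefix, declaration.Value);
        }
        return manager;
    }
}
```

The resulting manager can be passed to the XPath methods that accept an IXmlNamespaceResolver. Note that this only helps when the XPath expressions are built from prefixes discovered at run time.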
You could try something like this to strip the namespaces:
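using System.Linq;
using System.Xml.Linq;

// Entry point: parses the string, strips the namespaces, and serializes the result.
public string RemoveAllNamespaces(string xmlDocument)
{
    return RemoveAllNamespaces(XElement.Parse(xmlDocument)).ToString();
}

// Core recursive function: rebuilds each element using only its local name.
// Note: attributes and text in mixed content are not copied by this version.
private XElement RemoveAllNamespaces(XElement xmlDocument)
{
    if (!xmlDocument.HasElements)
    {
        // Leaf element: keep the text value only.
        XElement xElement = new XElement(xmlDocument.Name.LocalName);
        xElement.Value = xmlDocument.Value;
        return xElement;
    }
    // Inner element: recurse into the child elements.
    return new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
}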
See Peter Stegnar's answer here for more details:
How to remove all namespaces from XML with C#?
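You can also use direct node tests with namespace wildcards, which match any namespace (or lack thereof). Note that *:name is XPath 2.0 syntax, which the XPath 1.0 engine in System.Xml does not support:
$your-document/*:root/*:child/@*:attr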
I'm having a problem getting the "xmlns" to appear first in the root attribute list.
I'm getting this:
<myroot
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"
xmlns="http://www.someurl.com/ns/myroot">
<sometag>somecontent</sometag>
</myroot>
And I want this:
<myroot
xmlns="http://www.someurl.com/ns/myroot"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd">
<sometag>somecontent</sometag>
</myroot>
My code looks like this:
XNamespace rt = "http://www.someurl.com/ns/myroot";
XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
var submissionNode = new XElement(rt + "myroot");
submissionNode.Add(new XAttribute(XNamespace.Xmlns + "xsi", "http://www.w3.org/2001/XMLSchema-instance"));
submissionNode.Add(new XAttribute(xsi + "schemaLocation", @"http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"));
What do I need to do differently to change the order?
EDIT: I understand the order is not normally relevant, but it's a requirement in this case.
IIRC, the order of attributes (in xml) is unimportant... so why change it? Is it causing an actual problem?
Would XmlWriter be an option for you?
AFAIK, it gives you full control of the order of attributes and namespace declarations.
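A rough sketch of the XmlWriter route, under the assumption that the writer emits an explicitly written namespace declaration at the position where it is written (and does not repeat it at the end of the start tag). The class and method names are hypothetical; the URIs are taken from the question.

```csharp
using System.IO;
using System.Xml;

public static class OrderedNamespaceWriter
{
    public static string Write()
    {
        const string rootNs = "http://www.someurl.com/ns/myroot";
        const string xsiNs = "http://www.w3.org/2001/XMLSchema-instance";

        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var text = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                writer.WriteStartElement("myroot", rootNs);
                // Write the declarations explicitly, in the order they should appear.
                writer.WriteAttributeString("xmlns", rootNs);
                writer.WriteAttributeString("xmlns", "xsi", null, xsiNs);
                writer.WriteAttributeString("xsi", "schemaLocation", xsiNs,
                    rootNs + " http://www.someurl.com/xml/schemas/myschema.xsd");
                writer.WriteElementString("sometag", rootNs, "somecontent");
                writer.WriteEndElement();
            }
            // The writer is disposed (and flushed) before the text is read.
            return text.ToString();
        }
    }
}
```

It is worth checking the output on the target runtime before relying on it, since the ordering is a property of the writer implementation and not something XML itself guarantees.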
Attribute ordering is NOT significant in an XML document, and shouldn't be relied upon. It may be worth looking at the spec.
You'll find that if you read an XML document into a DOM and write it out, regardless of the platform/library, you can't (and shouldn't) rely on the attribute ordering. It's a common misconception, btw!
I have a customer with this very problem. It was a real pain, so I wrote a workaround to solve it.
Please note this is not a beautiful solution, and it should not be encouraged, but it works.
public static class MyKludgeXmlClass
{
public static XmlDocument CreateXmlDocumentWithOrderedNamespaces()
{
var xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><MyRoot xmlns=\"http://www.example.com/schemas/1.0/VRSync\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.example.com/schemas/1.0/VRSync http://xml.example.com/vrsync.xsd\"></MyRoot>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml);
return doc;
}
}
With XmlDocument you can retrieve the root:
var xmlDoc = MyKludgeXmlClass.CreateXmlDocumentWithOrderedNamespaces();
XmlElement root = xmlDoc.DocumentElement;
And append child nodes using your favorite method.
Software that requires attributes to be in a specified order doesn't conform to the XML recommendation.
The first question you should be asking is not, "How can I produce XML with namespace attributes in a defined order?" Instead, it should be, "What are the other respects in which this software doesn't conform to the XML recommendation?" Because I will bet you one crisp new American dollar that if the recipient's process violates the XML recommendation in one respect, it violates it in at least one other.
Because sometimes the right answer is to say, no, don't do that...
Per W3C Namespaces in XML Recommendation, section 3 Declaring Namespaces:
[Definition: A namespace (or more precisely, a namespace binding) is
declared using a family of reserved attributes. Such an attribute's name must either be xmlns or begin xmlns:. These
attributes, like any other XML attributes, may be provided directly or
by default. ]
Therefore, the order of namespace declarations, like the order of any attributes, is insignificant.
So, no conformant XML tool or library will care about the order of namespace declarations, and neither should you.