I have several XML documents, all of which have the same structure (element names, attribute names and hierarchy).
However, some of the elements and attributes have custom namespaces in each XML document, and these are not known at design time. They change, don't ask...
How can I deal with this when traversing the documents using a single set of XPath expressions?
Should I remove all the namespaces before processing?
Can I automatically register all namespaces with an XmlNamespaceManager?
Any thoughts?
Update: some examples (with namespace declarations omitted for clarity):
<root>
<child attr="val" />
</root>
<root>
<x:child attr="val" />
</root>
<root>
<y:child z:attr="val" />
</root>
Thanks
Suppose you have the following XML:
<root xmlns="first">
<el1 xmlns="second">
<el2 xmlns="third">...
You can write your queries to ignore namespaces in the following way:
/*[local-name()='root']/*[local-name()='el1']/*[local-name()='el2']
etc.
Of course you can iterate over the whole document to collect the namespaces and load them into an XmlNamespaceManager. But in the general case this means visiting every node in the document, and then it will be faster to just treat the document as a tree of objects and not use XPath at all.
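A minimal C# sketch of the local-name() approach with XmlDocument.SelectNodes, assuming the root/child/attr names from the question's examples. LocalNameQuery and ChildAttrValues are hypothetical names, and the namespace URIs used to try it out (urn:x, urn:y, urn:z) are made up, since the question omits the declarations.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class LocalNameQuery
{
    // Namespace-agnostic XPath: local-name() compares only the local part of each name,
    // so the same expression matches <child>, <x:child>, <y:child>, and so on.
    private const string ChildAttrPath =
        "/*[local-name()='root']/*[local-name()='child']/@*[local-name()='attr']";

    // Returns the value of every matching attr attribute in the given document.
    public static List<string> ChildAttrValues(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var values = new List<string>();
        foreach (XmlNode attr in doc.SelectNodes(ChildAttrPath))
        {
            values.Add(attr.Value);
        }
        return values;
    }
}
```

The same call then works for `<root><child attr="val" /></root>` and for `<root xmlns:y="urn:y" xmlns:z="urn:z"><y:child z:attr="val" /></root>`.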
I believe you'll find some good insight in this Stack Overflow thread:
XPath + Namespace Driving me crazy
In my opinion you have two options:
1- If the set of all possible namespaces is known beforehand, you can register them all in an XmlNamespaceManager before you begin parsing.
2- Use namespace-agnostic XPath selectors.
Of course you can always scrub the XML document of any inline namespaces and start parsing a clean, uniform XML without namespaces... but honestly I don't see the gain in adding this extra step.
Scott Hanselman has a nice article about extracting all of the XML Namespaces in an XML document. Presumably, when you get all of the XML Namespaces, you can just iterate over all of them and register them in your namespace manager.
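A rough sketch of that idea using XmlDocument (not the article's own code; NamespaceRegistration and BuildManager are hypothetical names). It registers only prefixed declarations; a default namespace (xmlns="...") would need a prefix you invent yourself before XPath 1.0 could address it. It also only helps if your XPath uses the same prefixes as the documents. In the question the prefixes themselves vary (x:, y:, z:), so the local-name() approach may be the more robust choice there.

```csharp
using System.Xml;

public static class NamespaceRegistration
{
    // Visits every element and registers each xmlns:prefix="uri" declaration it finds.
    // The manager has a single flat scope here, so if the same prefix is bound to
    // different URIs in different places, the last binding seen wins.
    public static XmlNamespaceManager BuildManager(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        foreach (XmlElement element in doc.SelectNodes("//*"))
        {
            foreach (XmlAttribute attr in element.Attributes)
            {
                // In the DOM, a prefixed declaration has Prefix "xmlns" and the prefix as LocalName.
                if (attr.Prefix == "xmlns")
                {
                    nsmgr.AddNamespace(attr.LocalName, attr.Value);
                }
            }
        }
        return nsmgr;
    }
}
```

As noted in the earlier answer, this walks every element in the document before the first query runs.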
You could try something like this to strip the namespaces:
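using System.Linq;
using System.Xml.Linq;
//Implemented based on interface, not part of algorithm
public string RemoveAllNamespaces(string xmlDocument)
{
    return RemoveAllNamespaces(XElement.Parse(xmlDocument)).ToString();
}
//Core recursion function: rebuilds every element using local names only
private XElement RemoveAllNamespaces(XElement xmlDocument)
{
    // Keep ordinary attributes under their local names and drop xmlns declarations.
    // (Two attributes with the same local name in different namespaces would clash.)
    var attributes = xmlDocument.Attributes()
        .Where(a => !a.IsNamespaceDeclaration)
        .Select(a => new XAttribute(a.Name.LocalName, a.Value));
    if (!xmlDocument.HasElements)
    {
        XElement xElement = new XElement(xmlDocument.Name.LocalName, attributes);
        xElement.Value = xmlDocument.Value;
        return xElement;
    }
    return new XElement(xmlDocument.Name.LocalName, attributes,
        xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
}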
See Peter Stegnar's answer here for more details:
How to remove all namespaces from XML with C#?
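You can also use direct node tests with wildcards, which will match any namespace (or lack thereof). Note that the *:name wildcard requires XPath 2.0 or later (or XQuery); the XPath 1.0 engine in System.Xml does not accept it:
$your-document/*:root/*:child/@*:attr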
Related
If I have an XML file with namespaces like:
<root>
<h:table xmlns:h="http://www.namespaces.com/namespaceOne">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<h:table xmlns:h="https://www.namespaces.com/namespaceTwo">
<h:name>African Coffee Table</h:name>
<h:width>80</h:width>
<h:length>120</h:length>
</h:table>
</root>
I want to hoist all of the namespaces to the root element, like this:
<root xmlns:h="http://www.namespaces.com/namespaceOne" xmlns:h1="https://www.namespaces.com/namespaceTwo">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<h1:table>
<h1:name>African Coffee Table</h1:name>
<h1:width>80</h1:width>
<h1:length>120</h1:length>
</h1:table>
</root>
Is there a way to do this? Ideally automatically resolving conflicting namespace prefixes, as in the example above. I haven't committed to using Linq to XML or System.Xml yet, so either would be a possibility.
There is one major constraint: because of the environment I am working in, I can't write classes. I can write functions, but no new class definitions.
Turns out this is pretty straightforward:
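using System.Linq;
using System.Xml.Linq;
var doc = XDocument.Parse(xml);
// ToList() materializes the query first: the loop below modifies the tree,
// which could otherwise disturb the lazy enumeration.
var namespaceAttributes = doc.Descendants()
    .SelectMany(x => x.Attributes())
    .Where(x => x.IsNamespaceDeclaration)
    .ToList();
int count = 1;
foreach (var namespaceAttribute in namespaceAttributes)
{
    // Declare the same URI on the root under a new prefix (h1, h2, ...), then drop the original.
    doc.Root.Add(new XAttribute(XNamespace.Xmlns + $"h{count}", namespaceAttribute.Value));
    namespaceAttribute.Remove();
    count++;
}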
We loop through all namespace declarations (xmlns:foo="foo"). For each one we find, we put a namespace attribute with the same URL on the root element, and remove that one.
Note that this does slightly odd things if you have multiple namespaces with the same URL (e.g. if you have two lots of xmlns:h="https://www.namespaces.com/namespaceOne" on different children): it puts multiple xmlns declarations on the root element with the same URL, but all elements use the last such namespace. If you want to avoid that, just keep a list of namespaces you've added to the root element.
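A sketch of that suggestion, deduplicating by namespace URI. NamespaceHoister and the generated h1, h2, ... prefixes are illustrative choices (the question's desired output uses h and h1). It collects and removes all declarations first and only then adds one declaration per distinct URI to the root, which also avoids adding a prefix to the root while a same-named declaration is still present there.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceHoister
{
    // Moves every namespace declaration to the root element, declaring each
    // distinct namespace URI only once, under generated prefixes h1, h2, ...
    public static XDocument HoistNamespaces(string xml)
    {
        var doc = XDocument.Parse(xml);

        // Materialize the query before the tree is modified.
        var declarations = doc.Descendants()
            .SelectMany(e => e.Attributes())
            .Where(a => a.IsNamespaceDeclaration)
            .ToList();

        // Remember each URI once (in document order) and strip every original declaration.
        var uris = new List<string>();
        foreach (var declaration in declarations)
        {
            if (!uris.Contains(declaration.Value))
            {
                uris.Add(declaration.Value);
            }
            declaration.Remove();
        }

        // Re-declare each URI once on the root. Element names keep their namespaces,
        // so on output they pick up whichever prefix is now in scope.
        for (int i = 0; i < uris.Count; i++)
        {
            doc.Root.Add(new XAttribute(XNamespace.Xmlns + $"h{i + 1}", uris[i]));
        }
        return doc;
    }
}
```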
I have an XML string which has the following structure:
<Element>
<Property1>Something</Property1>
<Property2>SomethingElse</Property2>
</Element>
<Element>
<Property1>Something2</Property1>
<Property2>SomethingElse2</Property2>
</Element>
I would like to deserialize this to a List<Element>.
I use this code:
XmlSerializer xd = new XmlSerializer(typeof(T));
XDocument xdoc = XDocument.Parse(xmlStringToDesirialize);
T deserializedObject = xd.Deserialize(xdoc.CreateReader()) as T;
Where T is List<Element>. I get an exception saying "There are multiple root elements". I understand why this is, but I'm not sure what to do about it.
I was thinking that adding a pseudo-root element, like <Elements>, might be a good solution, but I don't know how I would go about adding it to the XML document I already have.
Or maybe there is an alternative solution altogether.
EDIT: For completeness I am adding code for the full solution I needed for deserialization, in case anyone needs it.
I created a class:
[XmlRoot("myRoot", Namespace = "")]
public class MyRoot
{
[XmlElement("Element", Namespace = "{The xmlns of the actual class}")]
public List<Element> Elements {get; set;}
public MyRoot()
{
Elements = new List<Element>();
}
}
Then I deserialize to this class after adding the tags as suggested by @Richard.
Hope this can help someone.
I have an XML string
No, you don't. You have something that can be referred to as an "XML fragment". Specifically, because there is not a single root element, it is not an XML document. (There is no such thing as "invalid XML": it is either well-formed XML or it is not XML.)
XML parsers require an XML document. But XmlSerializer is not just a parser: it includes an XML parser (of course), but it also wants to generate an object graph from the content of the XML document, making lots of assumptions about type availability and restrictions on the XML to match those types.
The easiest approach with normal XML parsers would be to add a root element yourself, e.g.:
var xdoc = XDocument.Parse("<myRoot>" + theString + "</myRoot>");
However, for XML deserialization you will need to modify your available types to include a container that serializes with a myRoot element and then contains the relevant information.
However given the sample XML I see no sign of that looking like an object graph. Why not work with the parsed XML and extract the content using the parser's API?
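A minimal sketch of the wrapper approach. It assumes the elements carry no namespaces (unlike the question's final version) and uses hypothetical Element, MyRoot and FragmentDeserializer types; [XmlElement] on the list makes each <Element> child a list item without an extra wrapping element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type mirroring the question's <Element> content.
public class Element
{
    public string Property1 { get; set; }
    public string Property2 { get; set; }
}

// Container that the added <myRoot> element maps onto.
[XmlRoot("myRoot")]
public class MyRoot
{
    [XmlElement("Element")] // each <Element> child becomes one list item
    public List<Element> Elements { get; set; } = new List<Element>();
}

public static class FragmentDeserializer
{
    // Adds the missing root element, then deserializes into the container type.
    public static List<Element> Deserialize(string fragment)
    {
        var serializer = new XmlSerializer(typeof(MyRoot));
        using (var reader = new StringReader("<myRoot>" + fragment + "</myRoot>"))
        {
            var container = (MyRoot)serializer.Deserialize(reader);
            return container.Elements;
        }
    }
}
```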
XML must have exactly one root element. So, as you said yourself, making <Elements> the root node is the solution.
And the XML would look like:
<Elements>
<Element>
<Property1>Something</Property1>
<Property2>SomethingElse</Property2>
</Element>
<Element>
<Property1>Something2</Property1>
<Property2>SomethingElse2</Property2>
</Element>
</Elements>
Why not create an element and then append yours to it?
XElement root = new XElement("root");
Then append your elements to it.
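One way to get the existing elements into that root without string concatenation is to read the string as a fragment. This sketch (FragmentWrapper and WrapFragment are hypothetical names) uses XmlReaderSettings with ConformanceLevel.Fragment so the reader accepts several top-level elements, and XNode.ReadFrom to copy each one under the new root.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentWrapper
{
    // Reads a string with several top-level elements and appends each one
    // to a new root element (the root name is an arbitrary choice).
    public static XElement WrapFragment(string fragment, string rootName = "root")
    {
        var root = new XElement(rootName);
        // Fragment conformance lets the reader accept more than one top-level element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        using (XmlReader reader = XmlReader.Create(new StringReader(fragment), settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadFrom consumes the whole element and leaves the reader just after it.
                    root.Add(XNode.ReadFrom(reader));
                }
                else
                {
                    reader.Read(); // skip the initial state and whitespace between elements
                }
            }
        }
        return root;
    }
}
```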
Hello, I want to know how I can parse this simple XML file content in C#. I can have multiple "in" elements, and from those I want to use the date, min, max and state child values.
<out>
<in>
<id>16769</id>
<date>29-10-2010</date>
<now>12</now>
<min>12</min>
<max>23</max>
<state>2</state>
<description>enter text</description>
</in>
<in>
<id>7655</id>
<date>12-10-2010</date>
<now>1</now>
<min>1</min>
<max>2</max>
<state>0</state>
<description>enter text</description>
</in>
</out>
The System.Xml namespace has all sorts of tools for parsing, reading, and writing XML data. By the way, your XML isn't well-formed; you've got two <out> elements, but only one </out> element.
LINQ to XML is also helpful for parsing XML:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
Also:
http://msdn.microsoft.com/library/bb308960.aspx
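A minimal LINQ to XML sketch for the question's XML, reading date, min, max and state from every <in> element. InReader and ReadIns are hypothetical names. It assumes those child elements are always present (the explicit (int) conversion throws if one is missing) and keeps date as a string, since parsing 29-10-2010 would need an explicit dd-MM-yyyy format.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InReader
{
    // Returns one tuple per <in> element; a real program might map into its own class instead.
    public static List<(string Date, int Min, int Max, int State)> ReadIns(string xml)
    {
        XDocument doc = XDocument.Parse(xml); // XDocument.Load(path) works the same for a file
        return doc.Root
            .Elements("in")
            .Select(e => (
                Date: (string)e.Element("date"),
                Min: (int)e.Element("min"),
                Max: (int)e.Element("max"),
                State: (int)e.Element("state")))
            .ToList();
    }
}
```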
You need System.Xml, starting with XmlDocument.Load(filename).
Once you have the XmlDocument loaded, you can drill down into it as needed using the built-in .NET XML object model, starting at the XmlDocument level. You can walk the tree recursively in a pretty intuitive way, capturing what you want from each XmlNode as you go.
Alternatively (and preferably), you can quickly locate all XmlNodes in your XmlDocument that match certain conditions using XPath (examples here). An example of usage in C# is XmlNode.SelectNodes:
using System;
using System.IO;
using System.Xml;
public class Sample {
public static void Main() {
XmlDocument doc = new XmlDocument();
doc.Load("booksort.xml");
XmlNodeList nodeList;
XmlNode root = doc.DocumentElement;
nodeList=root.SelectNodes("descendant::book[author/last-name='Austen']");
//Change the price on the books.
foreach (XmlNode book in nodeList)
{
book.LastChild.InnerText="15.95";
}
Console.WriteLine("Display the modified XML document....");
doc.Save(Console.Out);
}
}
Examples can be found here http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
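The same extraction with XPath and XmlNode.SelectNodes, as a sketch against the question's XML rather than the booksort.xml sample. InXPathReader is a hypothetical name, and returning each row as a string array is just a shape chosen to keep the example short.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class InXPathReader
{
    // Returns { date, min, max, state } for every /out/in element.
    public static List<string[]> ReadIns(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // or doc.Load(filename)

        var rows = new List<string[]>();
        foreach (XmlNode inNode in doc.SelectNodes("/out/in"))
        {
            // Relative XPath: each child is looked up under the current <in> element.
            rows.Add(new[]
            {
                inNode.SelectSingleNode("date").InnerText,
                inNode.SelectSingleNode("min").InnerText,
                inNode.SelectSingleNode("max").InnerText,
                inNode.SelectSingleNode("state").InnerText,
            });
        }
        return rows;
    }
}
```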
This might be beyond what you want to do, but worth mentioning...
I hate parsing XML. Seriously, I almost refuse to do it, especially since .NET can do it for me. What I would do is create an "In" object that has the properties above. You probably have one already, or it would take 60 seconds to create. You'll also need a List of In objects called "Out".
Then just deserialize the XML into the objects. This takes just a few lines of code. Here is an example. BTW, this makes changing and re-saving the data just as easy.
How to serialize/deserialize
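A sketch of what that could look like with XmlSerializer. The In, Out and OutSerializer classes are hypothetical; rather than Out being a List directly, it is a class whose list property is marked [XmlElement("in")] so that the <in> children map straight into it. Calling serializer.Serialize on an Out instance would write the data back out.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class shaped like one <in> element.
public class In
{
    [XmlElement("id")] public int Id { get; set; }
    [XmlElement("date")] public string Date { get; set; }
    [XmlElement("now")] public int Now { get; set; }
    [XmlElement("min")] public int Min { get; set; }
    [XmlElement("max")] public int Max { get; set; }
    [XmlElement("state")] public int State { get; set; }
    [XmlElement("description")] public string Description { get; set; }
}

// Hypothetical root type for <out>.
[XmlRoot("out")]
public class Out
{
    [XmlElement("in")] // one list item per <in> child, no extra wrapper element
    public List<In> Items { get; set; } = new List<In>();
}

public static class OutSerializer
{
    public static Out Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Out));
        using (var reader = new StringReader(xml))
        {
            return (Out)serializer.Deserialize(reader);
        }
    }
}
```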
Is it possible to get the opening tag from an XmlNode, with all attributes, namespaces, etc.?
e.g.
<root xmlns="urn:..." rattr="a">
<child attr="1">test</child>
</root>
I would like to retrieve the entire opening tag, exactly as retrieved from the original XML document if possible, from the XmlNode and later the closing tag. Both as strings.
Basically XmlNode.OuterXml without the child nodes.
EDIT
To elaborate, XmlNode.OuterXml on a node that was created with the XML above would return the entire XML fragment, including child nodes as a single string.
XmlNode.InnerXml on that same fragment would return the child nodes but not the parent node, again as a single string.
But I need the opening tag for the XML fragment without the children nodes. And without building it using the XmlAttribute array, LocalName, Namespace, etc.
This is .NET 3.5.
Thanks
Is there some reason you can't simply say:
string s = n.OuterXml.Substring(0, n.OuterXml.IndexOf(">") + 1);
I think the simplest way would be to call XmlNode.CloneNode(false) which (according to the docs) will clone all the attributes but not child nodes. You can then use OuterXml - although that will give you the closing tag as well.
For example:
using System;
using System.Xml;
public class Test
{
static void Main()
{
XmlDocument doc = new XmlDocument();
doc.LoadXml(@"<root xmlns='urn:xyz' rattr='a'>
<child attr='1'>test</child></root>");
XmlElement root = doc.DocumentElement;
XmlNode clone = root.CloneNode(false);
Console.WriteLine(clone.OuterXml);
}
}
Output:
<root xmlns="urn:xyz" rattr="a"></root>
Note that this may not be exactly as per the original XML document, in terms of the ordering of attributes etc. However, it will at least be equivalent.
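If you need the opening and closing tags as separate strings, one possible extension of the CloneNode(false) idea is to force the clone to be written with an explicit end tag (XmlElement.IsEmpty = false) and then split that end tag off. TagExtractor is a hypothetical helper, and the same caveat about attribute order and quoting applies.

```csharp
using System.Xml;

public static class TagExtractor
{
    // Clones the element without children, makes sure it serializes as <name ...></name>
    // rather than <name ... />, then splits the end tag off OuterXml.
    public static void GetTags(XmlElement element, out string openingTag, out string closingTag)
    {
        var clone = (XmlElement)element.CloneNode(false);
        clone.IsEmpty = false; // ask for a separate closing tag
        string outer = clone.OuterXml;
        closingTag = "</" + clone.Name + ">"; // Name includes any prefix, e.g. h:table
        openingTag = outer.Substring(0, outer.Length - closingTag.Length);
    }
}
```

For the root element in the example above this should give `<root xmlns="urn:xyz" rattr="a">` and `</root>`.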
How about:
xmlNode.OuterXml.Replace(xmlNode.InnerXml, String.Empty);
Poor man's solution :)
I'm having a problem getting the "xmlns" to appear first in the root attribute list.
I'm getting this:
<myroot
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"
xmlns="http://www.someurl.com/ns/myroot">
<sometag>somecontent</sometag>
</myroot>
And I want this:
<myroot
xmlns="http://www.someurl.com/ns/myroot"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd">
<sometag>somecontent</sometag>
</myroot>
My code looks like this:
XNamespace rt = "http://www.someurl.com/ns/myroot";
XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
var submissionNode = new XElement(rt + "myroot");
submissionNode.Add(new XAttribute(XNamespace.Xmlns + "xsi", "http://www.w3.org/2001/XMLSchema-instance"));
submissionNode.Add(new XAttribute(xsi + "schemaLocation", @"http://www.someurl.com/ns/myroot http://www.someurl.com/xml/schemas/myschema.xsd"));
What do I need to do differently to change the order?
EDIT: I understand the order is not normally relevant, but it's a requirement in this case.
IIRC, the order of attributes (in xml) is unimportant... so why change it? Is it causing an actual problem?
Would XmlWriter be an option for you?
AFAIK, it gives you full control over the order of attributes and namespace declarations.
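A sketch of the XmlWriter route, assuming the question's desired output is the target (OrderedRootWriter is a hypothetical name). Declaring the default namespace explicitly with WriteAttributeString("xmlns", ...) before the xsi attributes appears to make the writer emit it first, instead of appending an auto-generated declaration at the end of the start tag. Treat that ordering as behavior of this System.Xml writer, not something the XML recommendation guarantees.

```csharp
using System.Text;
using System.Xml;

public static class OrderedRootWriter
{
    public static string Write()
    {
        const string rt = "http://www.someurl.com/ns/myroot";
        const string xsi = "http://www.w3.org/2001/XMLSchema-instance";

        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteStartElement("myroot", rt);
            // Explicit declarations are written in the order these calls are made.
            w.WriteAttributeString("xmlns", rt);                 // 1. default namespace
            w.WriteAttributeString("xmlns", "xsi", null, xsi);   // 2. xsi prefix
            w.WriteAttributeString("xsi", "schemaLocation", xsi, // 3. schema location
                rt + " http://www.someurl.com/xml/schemas/myschema.xsd");
            w.WriteElementString("sometag", rt, "somecontent");
            w.WriteEndElement();
        }
        return sb.ToString();
    }
}
```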
Attribute ordering is NOT significant in an XML document, and shouldn't be relied upon. It may be worth looking at the spec.
You'll find that if you read an XML document into a DOM and write it out, regardless of the platform/library, you can't (and shouldn't) rely on the attribute ordering. It's a common misconception, btw!
I have a customer with this very problem. It was a real pain, so I wrote a workaround.
Please note this is not a beautiful solution and should not be encouraged, but it works.
public static class MyKludgeXmlClass
{
public static XmlDocument CreateXmlDocumentWithOrderedNamespaces()
{
var xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><MyRoot xmlns=\"http://www.example.com/schemas/1.0/VRSync\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.example.com/schemas/1.0/VRSync http://xml.example.com/vrsync.xsd\"></MyRoot>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml);
return doc;
}
}
With XmlDocument you can retrieve the root:
var xmlDoc = MyKludgeXmlClass.CreateXmlDocumentWithOrderedNamespaces();
XmlElement root = xmlDoc.DocumentElement;
And append children nodes using your favorite method.
Software that requires attributes to be in a specified order doesn't conform to the XML recommendation.
The first question you should be asking is not, "How can I produce XML with namespace attributes in a defined order?" Instead, it should be, "What are the other respects in which this software doesn't conform to the XML recommendation?" Because I will bet you one crisp new American dollar that if the recipient's process violates the XML recommendation in one respect, it violates it in at least one other.
Because sometimes the right answer is to say, no, don't do that...
Per W3C Namespaces in XML Recommendation, section 3 Declaring Namespaces:
[Definition: A namespace (or more precisely, a namespace binding) is
declared using a family of reserved attributes. Such an attribute's name must either be xmlns or begin xmlns:. These
attributes, like any other XML attributes, may be provided directly or
by default. ]
Therefore, the order of namespace declarations, like the order of any attributes, is insignificant.
So, no conformant XML tool or library will care about the order of namespace declarations, and neither should you.