Process namespaces using XmlReader - c#

I have a complex XML file with structure as follows:
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="xxx:xxx:xxx:xxx:xxxxx:xxx:xsd:xxxx.xxx.xxx.xx">
<Element1>
<Element2>
<Element2A>xxxxxx</Element2A>
<Element2B>2012-08-29T00:00:00</Element2B>
</Element2>
</Element1>
</Document>
I am using XmlReader to read this XML document and process its contents as follows:
XmlReader xr = XmlReader.Create(filename);
while (xr.Read())
{
xr.MoveToElement();
XElement node = (XElement)XElement.ReadFrom(xr);
Console.WriteLine(node.Name);
}
xr.Close();
The problem I am facing is that, in the output, the namespace is prefixed to the element name. Example output:
{xxx:xxx:xxx:xxx:xxxxx:xxx:xsd:xxxx.xxx.xxx.xx}Element1
Is there any way I can remove or handle this? I need to do further filtering using element names and child names.

XElement.Name is not (as you might expect) a String, but rather an XName which has a LocalName property, thus:
Console.WriteLine(node.Name.LocalName);
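For the follow-up filtering by element and child names, something along these lines might work. This is only a sketch: it loads the document with XDocument instead of streaming it through XmlReader, and the namespace URI `urn:example:doc`, the class name and the method name are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LocalNameDemo
{
    // Returns the text of every element whose local (namespace-free) name matches.
    public static string[] ValuesByLocalName(string xml, string localName)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == localName)
                  .Select(e => e.Value)
                  .ToArray();
    }

    public static void Main()
    {
        // Hypothetical namespace URI standing in for the masked one in the question.
        string xml =
            "<Document xmlns=\"urn:example:doc\">" +
            "<Element1><Element2>" +
            "<Element2A>abc</Element2A>" +
            "<Element2B>2012-08-29T00:00:00</Element2B>" +
            "</Element2></Element1></Document>";

        foreach (string value in ValuesByLocalName(xml, "Element2B"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Comparing on LocalName ignores the namespace entirely, so it would also match same-named elements from other namespaces if the document ever mixes them.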

You may want to remove the namespace altogether. One way is to do it in C# code; another is to use an XSLT transformation, as suggested in Remove Namespace.
-Milind

Related

How to parse a XML with nested XML text

I am trying to read an XML file that contains a nested XML document with its own XML declaration. As expected, I get an exception:
Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
How can I read that specific element as text and parse it as a separate XML document for later deserialization?
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Items>
<Item>
<Target type="System.String">Some target</Target>
<Content type="System.String"><?xml version="1.0" encoding="utf-8"?><Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>
Every approach I have tried fails with the declaration exception.
var xml = System.IO.File.ReadAllText("Info.xml");
var xDoc = XDocument.Parse(xml); // Exception
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml); // Exception
var xmlReader = XmlReader.Create(new StringReader(xml));
xmlReader.ReadToFollowing("Content"); // Exception
I have no control over XML creation.
The only way I would know is by getting rid of the illegal second <?xml> declaration. I wrote a sample that will simply look for and discard the second <?xml>. After that the string has become valid XML and can be parsed. You may need to tweak it a bit to make it work for your exact scenario.
Code:
using System;
using System.Xml;
public class Program
{
public static void Main()
{
var badXML = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<Data>
<Items>
<Item>
<Target type=""System.String"">Some target</Target>
<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?><Data><Items><Item><surname type=""System.String"">Some Surname</surname><name type=""System.String"">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>";
var goodXML = badXML.Replace(@"<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?>"
, @"<Content type=""System.String"">");
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(goodXML);
XmlNodeList itemRefList = xmlDoc.GetElementsByTagName("Content");
foreach (XmlNode xn in itemRefList)
{
Console.WriteLine(xn.InnerXml);
}
}
}
Output:
<Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data>
Working DotNetFiddle: https://dotnetfiddle.net/ShmZCy
Perhaps needless to say: none of this would have been needed if whatever created this invalid XML had followed the common practice of wrapping the nested XML in a <![CDATA[ .... ]]> block.
The <?xml ...?> declaration is only valid at the very start of an XML document, so the XML you've been given isn't well-formed. That makes it quite difficult to parse as is without either changing the source document (which you've indicated is not possible) or preprocessing it.
You could try:
Stripping out the <?xml ?> instruction with regex or string manipulation, but the cure there may be worse than the disease.
The HtmlAgilityPack, which implements a more forgiving parser, may be able to load the document.
Other than that, the producer of the document should look to produce well-formed XML:
CDATA sections can help this, but be aware that CDATA can't contain the ]]> end tag.
XML escaping the XML text can work fine; that is, use the standard routines to turn < into &lt; and so forth.
XML namespaces can also help here, but they can be daunting in the beginning.
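To illustrate the escaping option on the producer side, here is a rough sketch using LINQ to XML. The class and method names are invented for the example, and the inner document is a shortened version of the one in the question. When the inner XML is passed as a plain string, XElement escapes it on output, and reading .Value on the other end gives the original text back, which can then be parsed as its own document.

```csharp
using System;
using System.Xml.Linq;

public static class NestedXmlDemo
{
    // Producer side: the inner document is passed as a string, so it is written escaped.
    public static string Wrap(string innerXml)
    {
        var outer = new XElement("Data",
            new XElement("Items",
                new XElement("Item",
                    new XElement("Content", innerXml))));
        return outer.ToString(SaveOptions.DisableFormatting);
    }

    // Consumer side: .Value unescapes the text, which is then parsed separately.
    public static XDocument Unwrap(string outerXml)
    {
        XDocument outer = XDocument.Parse(outerXml);
        string innerText = outer.Root
                                .Element("Items")
                                .Element("Item")
                                .Element("Content")
                                .Value;
        return XDocument.Parse(innerText);
    }

    public static void Main()
    {
        string inner = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
                       "<Data><Items><Item><name>Some Name</name></Item></Items></Data>";

        string wrapped = Wrap(inner);
        Console.WriteLine(wrapped); // the nested markup appears as escaped text

        XDocument innerDoc = Unwrap(wrapped);
        Console.WriteLine(innerDoc.Root.Element("Items").Element("Item").Element("name").Value);
    }
}
```

This only helps when the producer can be changed; for the file as given, the second declaration still has to be stripped before parsing.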

Namespace of specific XML Node in c#

I have the following XML structure:
<?xml version="1.0" encoding="utf-16"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<StoreResponse xmlns="http://www.some-site.com">
<StoreResult>
<Message />
<Code>OK</Code>
</StoreResult>
</StoreResponse>
</soap:Body>
</soap:Envelope>
I need to get the InnerText from Code out of this document and I need help with the appropriate XPath statement.
I'm really confused by XML namespaces. While working on a previous namespace problem in another XML document, I learned that even if there's nothing in front of Code (e.g. ns:Code), it is still part of a namespace defined by the xmlns attribute in its parent node. Now, there are multiple xmlns attributes defined in the parents of Code. Which namespace do I need to specify in an XPath statement? Is there such a thing as a "primary namespace"? Do child nodes inherit the (primary) namespace of their parents?
The namespace of the <Code> element is http://www.some-site.com. xmlns:xxx means that names prefixed by xxx: (like soap:Body) have that namespace. xmlns by itself means that this is the default namespace for names without any prefix.
An example of using an XDocument (Linq) approach:
XNamespace ns = "http://www.some-site.com";
var document = XDocument.Parse("your-xml-string");
var elements = document.Descendants( ns + "StoreResult" );
Unprefixed descendant elements inherit the default namespace declared on their nearest ancestor. In your example you will need two namespaces: one for the SOAP envelope and a second for "some-site".
Here's an option I found in this question: Weirdness with XDocument, XPath and namespaces
// XPathSelectElement is an extension method; it needs "using System.Xml.XPath;"
var xml = "<your xml>";
var doc = XDocument.Parse(xml); // Could use .Load() here too
var code = doc.XPathSelectElement("//*[local-name()='Code']");
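If you would rather keep the XPath namespace-aware, a sketch like the following should do it. The prefixes "s" and "r" and the class and method names are arbitrary choices for this example; XPath only cares that each prefix is mapped to the right URI in the namespace manager, not that it matches the prefix used in the document.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class SoapCodeDemo
{
    // Returns the text of the Code element, or null if the path does not match.
    public static string ReadCode(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Prefixes here are local to the query; they need not match the document.
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("s", "http://www.w3.org/2003/05/soap-envelope");
        ns.AddNamespace("r", "http://www.some-site.com");

        XElement code = doc.XPathSelectElement(
            "/s:Envelope/s:Body/r:StoreResponse/r:StoreResult/r:Code", ns);
        return code?.Value;
    }

    public static void Main()
    {
        string xml =
            "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\">" +
            "<soap:Body>" +
            "<StoreResponse xmlns=\"http://www.some-site.com\">" +
            "<StoreResult><Message /><Code>OK</Code></StoreResult>" +
            "</StoreResponse>" +
            "</soap:Body>" +
            "</soap:Envelope>";

        Console.WriteLine(ReadCode(xml));
    }
}
```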

Why does having a xmlns cause my C# program not to read XML?

I have a C# program that attempts to read the following xml, but can't read any elements:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Comments Here -->
<FileFeed
xmlns="http://www.mycompany.com/schemas/xxx/FileFeed/V1"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somecompany.com/schemas/xxx/FileFeed/V1
FileFeed.xsd"
RecordCount = "1">
<Object>
<ID>PAMSMOKE110113xxx</ID>
<CorpID>12509</CorpID>
<AnotherID>201654702345</AnotherID>
<TimeStamp>2013-09-03</TimeStamp>
<Type>Some Type</Type>
<SIM_ID>89011704258012600767</SIM_ID>
<Code>ZZZ</Code>
<Year>2013</Year>
</Object>
</FileFeed>
With the above XML my C# program is unable to read any elements. For instance, the ID element is always null.
Now if I simply remove the first xmlns from the above XML, my program can read all the elements without any issues. The problem is I have to process the XML file in the format that's given to me, and can't change the file format. My program reads the below XML just fine: Note the line xmlns="http://www.mycompany.com/schemas/xxx/FileFeed/V1" is removed.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Comments Here -->
<FileFeed
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somecompany.com/schemas/xxx/FileFeed/V1
FileFeed.xsd"
RecordCount = "1">
<Object>
<ID>PAMSMOKE110113xxx</ID>
<CorpID>12509</CorpID>
<AnotherID>201654702345</AnotherID>
<TimeStamp>2013-09-03</TimeStamp>
<Type>Some Type</Type>
<SomeNumber>89011704258012600767</SomeNumber>
<Code>ZZZ</Code>
<Year>2013</Year>
</Object>
</FileFeed>
I realize I'm not posting any code, but just wondering what possible issue could I be having, where simply removing the xmlns line resolves everything??
Your problem is with XML namespaces.
Using LINQ to XML:
XNamespace ns = "http://www.mycompany.com/schemas/xxx/FileFeed/V1";
var xDoc = XDocument.Load(fname);
var id = xDoc.Root.Element(ns + "Object").Element(ns + "ID").Value;
Your root element FileFeed declares a default namespace with its xmlns attribute. This means that every unprefixed element inside it is also in that namespace.
The Element method takes an XName as its argument. Usually you use a string which gets implicitly converted into an XName.
If you want to include a namespace you create an XNamespace and add the string. Since XNamespace overloads the + operator this will also result in an XName.
XDocument doc = XDocument.Load("Test.xml");
// this will be null
XElement objectElementWithoutNS = doc.Root.Element("Object");
XNamespace ns = doc.Root.GetDefaultNamespace();
XElement objectElementWithNS = doc.Root.Element(ns + "Object");
XML namespaces are more or less like C# namespaces. Would you expect to reach a class under the same name whether or not it is declared inside a namespace?
namespace My.Company.Schemas {
public class FileFeed { }
}
vs
public class FileFeed { }
They are two DISTINCT classes! The same applies to XML: by setting a namespace you make it possible to have documents with similar or even the same internal structure that nevertheless represent two distinct document types that are not interchangeable. This is really convenient.
If you'd like help with why your actual reading method doesn't take the namespace into account, you have to post the C# code. The general rule, though, is that any reading API makes it possible to specify the namespace to read with.

Selecting XML Node with XPath

I have an XML document from which I want to select a node. Here is the XML:
<?xml version="1.0" encoding="utf-8" ?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<InResponse xmlns="https://ww.ggg.com">
<InResult>Error </InResult>
</InResponse>
</soap:Body>
</soap:Envelope>
I am loading it using XmlDocument's LoadXml and trying to get the InResult node, but I get null. See below:
xml.SelectSingleNode("//InResult").InnerText;
You have a namespace declaration, so you should either add the namespace to your XPath or use a namespace-agnostic XPath. Try the following code as a namespace-agnostic solution:
xml.SelectSingleNode("//*[local-name()='InResult']").InnerText;
With this I get Error as the result.
From the http://www.w3schools.com/ site:
local-name() - Returns the name of the current node or the first node in the specified node set - without the namespace prefix
A namespace-aware solution is given below:
var namespaceManager = new XmlNamespaceManager(x.NameTable);
namespaceManager.AddNamespace("defaultNS", "https://ww.ggg.com");
var result = x.SelectSingleNode("//defaultNS:InResponse", namespaceManager).InnerText;
Console.WriteLine (result); //prints Error
Brief XML notes:
This part of the root node, xmlns:soap="http://www.w3.org/2003/05/soap-envelope", is an XML namespace declaration. It is used to identify nodes in your XML structure. As a rule, you need to specify namespaces to access the nodes that belong to them, but there are namespace-agnostic solutions in XPath and in LINQ to XML. If you see a node name such as <soap:Body>, it means that the node belongs to this namespace.
This seems to be a namespace issue.
You can use an XmlNamespaceManager before you call SelectSingleNode():
XmlNamespaceManager ns = new XmlNamespaceManager(xml.NameTable);
ns.AddNamespace("ggg", "https://ww.ggg.com");
xml.SelectSingleNode("//ggg:InResult", ns).InnerText;
Attention: Not tested.

Query an XmlDocument without getting a 'Namespace prefix is not defined' problem

I've got an XML document that both defines and references some namespaces. I load it into an XmlDocument object and, to the best of my knowledge, I create an XmlNamespaceManager object to query XPath against. The problem is that I'm getting XPath exceptions saying that the namespace "my" is not defined. How do I get the namespace manager to see that the namespaces I am referencing are already defined? Or rather, how do I get the namespace definitions from the document into the namespace manager?
Furthermore, it strikes me as strange that you have to provide the document with a namespace manager which you create from the document's NameTable in the first place. Even if you need to hardcode namespaces manually, why can't you add them directly to the document? Why do you always have to pass this namespace manager with every single query? Why can't XmlDocument just know?
Code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(programFiles + @"Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES\HfscBookingWorkflow\template.xml");
XmlNamespaceManager ns = new XmlNamespaceManager(xmlDoc.NameTable);
XmlNode referenceNode = xmlDoc.SelectSingleNode("/my:myFields/my:ReferenceNumber", ns);
referenceNode.InnerXml = this.bookingData.ReferenceNumber;
XmlNode titleNode = xmlDoc.SelectSingleNode("/my:myFields/my:Title", ns);
titleNode.InnerXml = this.bookingData.FamilyName;
Xml:
<?xml version="1.0" encoding="UTF-8" ?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:Inspection:-myXSD-2010-01-15T18-21-55" solutionVersion="1.0.0.104" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:myFields xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
<my:DateRequested xsi:nil="true" />
<my:DateVisited xsi:nil="true" />
<my:ReferenceNumber />
<my:FireCall>false</my:FireCall>
Update:
ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
ns.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
This does the job, but it means I have to hardcode to this particular XML schema. This schema represents an InfoPath form template. In particular, the my namespace URL will be different for every form template, so I really don't want to hardcode it. It would be nice to find a clean way to get this namespace from the XML without resorting to regex.
I was hoping that the XmlNamespaceManager would just pick up the namespace definitions from the NameTable. I mean, they're in the XML, but I still have to define them.
Here is the answer to the "Why can't XmlDocument just know?" question.
NameTable is just an optimization for storing names. It actually has nothing to do with namespaces.
And even if XmlNamespaceManager could infer all namespaces and prefixes from the XML document, that wouldn't help in the general case because of the nature of XML namespaces. For example, what should XmlNamespaceManager map the "my" prefix to in this case:
<root>
<foo xmlns:my="blah"/>
<foo xmlns:my="balh-blah-blah"/>
</root>
Have you defined "my" in the namespace-manager?
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
Or better - choose something that is unlikely to conflict. It does seem odd that it didn't pick it up from the name-table, though.
For me, with InfoPath 2007, this solved the problem:
static public XmlNamespaceManager GetNameSpaceManager(this XmlDocument document)
{
XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(document.NameTable);
xmlNamespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
xmlNamespaceManager.AddNamespace("dfs", "http://schemas.microsoft.com/office/infopath/2003/dataFormSolution");
xmlNamespaceManager.AddNamespace("d", "http://schemas.microsoft.com/office/infopath/2003/ado/dataFields");
xmlNamespaceManager.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2012-03-29T06:28:28");
xmlNamespaceManager.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
return xmlNamespaceManager;
}
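To avoid hardcoding the per-template "my" URI, one possibility is to copy the prefix declarations from the root element into the manager. This is a sketch under the assumption that all the prefixes you query with are declared on the document element, which is the case for the InfoPath XML shown in the question; declarations further down the tree are ignored, which sidesteps the conflicting-prefix case mentioned above. The class and method names and the example URIs are made up.

```csharp
using System;
using System.Xml;

public static class NamespaceManagerDemo
{
    // Builds a namespace manager from the xmlns:prefix attributes on the root element.
    public static XmlNamespaceManager FromRoot(XmlDocument doc)
    {
        var manager = new XmlNamespaceManager(doc.NameTable);
        foreach (XmlAttribute attr in doc.DocumentElement.Attributes)
        {
            // For xmlns:my="...", Prefix is "xmlns" and LocalName is "my".
            if (attr.Prefix == "xmlns")
            {
                manager.AddNamespace(attr.LocalName, attr.Value);
            }
        }
        return manager;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical URIs; a real InfoPath template carries its own.
        doc.LoadXml(
            "<my:myFields xmlns:my=\"urn:example:form\" xmlns:xd=\"urn:example:xd\">" +
            "<my:ReferenceNumber>42</my:ReferenceNumber>" +
            "</my:myFields>");

        XmlNamespaceManager ns = FromRoot(doc);
        XmlNode node = doc.SelectSingleNode("/my:myFields/my:ReferenceNumber", ns);
        Console.WriteLine(node.InnerText);
    }
}
```

A default namespace (a plain xmlns attribute) is deliberately skipped here, since XPath 1.0 queries need an explicit prefix for it anyway.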
