XDocument.Descendants(itemName) - Problems finding qualified name - c#

I'm trying to read an XML RSS feed from a website. To do so, I use an async download and create an XDocument with the XDocument.Parse() method.
The document is meant to be very simple, like this:
<root>
<someAttribute></someAttribute>
<item>...</item>
<item>...</item>
</root>
Now I want to read out all the items. Therefore I tried:
foreach (XElement NewsEntry in xDocument.Descendants("item"))
but this doesn't work. Then I found a post on this board suggesting the qualified name, because some namespaces are defined in the root element:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns="http://purl.org/rss/1.0/">
Well, I tried all three prefixed namespaces, but nothing worked for me:
XName itemName = XName.Get("item", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
XName itemName2 = XName.Get("item", "http://purl.org/dc/elements/1.1/");
XName itemName3 = XName.Get("item", "http://purl.org/rss/1.0/modules/syndication/");
Any help would be appreciated.
(Usually I do my XML analysis with regex, but this time I'm developing for a mobile device and therefore need to care about performance.)

You have not tried the default namespace at the end of the rdf declaration:
xmlns="http://purl.org/rss/1.0/"
This makes sense: elements in the default namespace carry no prefix in the document, but they still belong to that namespace, so "item" has to be looked up with it.
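A minimal sketch of that lookup, assuming a cut-down RSS 1.0 feed inlined as a string. The item titles are made up for illustration; only the namespace declarations mirror the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical cut-down feed; only the namespace declarations come from the question.
string feed = @"<rdf:RDF xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#""
         xmlns=""http://purl.org/rss/1.0/"">
  <item><title>First</title></item>
  <item><title>Second</title></item>
</rdf:RDF>";

XDocument xDocument = XDocument.Parse(feed);

// The default namespace (xmlns="...") applies to every unprefixed element.
XNamespace rss = "http://purl.org/rss/1.0/";

// The unqualified name matches nothing; the qualified name matches both items.
int unqualifiedCount = xDocument.Descendants("item").Count();
int qualifiedCount = xDocument.Descendants(rss + "item").Count();

foreach (XElement newsEntry in xDocument.Descendants(rss + "item"))
{
    // Child elements are in the same namespace, so they need qualifying too.
    Console.WriteLine((string)newsEntry.Element(rss + "title"));
}
```

XName.Get("item", "http://purl.org/rss/1.0/") would produce the same XName as rss + "item".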

Not directly a solution to the XDocument RSS read problem, but why aren't you using the provided SyndicationFeed class to load the feed? http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.syndicationfeed.aspx

Try this:
var elements = from p in xDocument.Root.Elements()
               where p.Name.LocalName == "item"
               select p;
foreach (var element in elements)
{
    // Do stuff
}

Related

Namespace of specific XML Node in c#

I have the following XML structure:
<?xml version="1.0" encoding="utf-16"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<StoreResponse xmlns="http://www.some-site.com">
<StoreResult>
<Message />
<Code>OK</Code>
</StoreResult>
</StoreResponse>
</soap:Body>
</soap:Envelope>
I need to get the InnerText of Code out of this document, and I need help with the appropriate XPath statement.
I'm really confused by XML namespaces. While working on a previous namespace problem in another XML document, I learned that even if there's nothing in front of Code (e.g. ns:Code), it is still part of a namespace defined by the xmlns attribute in its parent node. Now, there are multiple xmlns attributes defined on the parents of Code. Which namespace do I need to specify in an XPath statement? Is there such a thing as a "primary namespace"? Do child nodes inherit the (primary) namespace of their parents?
The namespace of the <Code> element is http://www.some-site.com. xmlns:xxx means that names prefixed by xxx: (like soap:Body) have that namespace. xmlns by itself means that this is the default namespace for names without any prefix.
An example of using an XDocument (Linq) approach:
XNamespace ns = "http://www.some-site.com";
var document = XDocument.Parse("your-xml-string");
var elements = document.Descendants(ns + "StoreResult");
Descendant elements without a prefix inherit the default namespace declared on their nearest ancestor. In your example you will need two namespaces: one for the SOAP envelope and a second for "some-site".
Here's an option I found in this question: Weirdness with XDocument, XPath and namespaces
var xml = "<your xml>";
var doc = XDocument.Parse(xml); // Could use .Load() here too
var code = doc.XPathSelectElement("//*[local-name()='Code']");
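If you would rather keep the namespaces in the XPath than match on local-name(), one possible sketch binds prefixes through an XmlNamespaceManager. The prefix "s" is an arbitrary choice made here, and the XML is a shortened copy of the document in the question.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Shortened copy of the SOAP response from the question.
string xml = @"<soap:Envelope xmlns:soap=""http://www.w3.org/2003/05/soap-envelope"">
  <soap:Body>
    <StoreResponse xmlns=""http://www.some-site.com"">
      <StoreResult><Message /><Code>OK</Code></StoreResult>
    </StoreResponse>
  </soap:Body>
</soap:Envelope>";

XDocument doc = XDocument.Parse(xml);

// XPath 1.0 has no notion of a default namespace, so every namespace needs a prefix.
// "s" is a prefix picked for this sketch; only the URI has to match the document.
var nsManager = new XmlNamespaceManager(new NameTable());
nsManager.AddNamespace("soap", "http://www.w3.org/2003/05/soap-envelope");
nsManager.AddNamespace("s", "http://www.some-site.com");

XElement code = doc.XPathSelectElement(
    "/soap:Envelope/soap:Body/s:StoreResponse/s:StoreResult/s:Code", nsManager);

// Null if the path did not match anything.
string codeText = code?.Value;
Console.WriteLine(codeText);
```

Compared with local-name(), this version will not accidentally pick up a Code element from some other namespace.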

Why does having a xmlns cause my C# program not to read XML?

I have a C# program that attempts to read the following xml, but can't read any elements:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Comments Here -->
<FileFeed
xmlns="http://www.mycompany.com/schemas/xxx/FileFeed/V1"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somecompany.com/schemas/xxx/FileFeed/V1
FileFeed.xsd"
RecordCount = "1">
<Object>
<ID>PAMSMOKE110113xxx</ID>
<CorpID>12509</CorpID>
<AnotherID>201654702345</AnotherID>
<TimeStamp>2013-09-03</TimeStamp>
<Type>Some Type</Type>
<SIM_ID>89011704258012600767</SIM_ID>
<Code>ZZZ</Code>
<Year>2013</Year>
</Object>
</FileFeed>
With the above XML, my C# program is unable to read any elements. For instance, the ID element is always null.
Now if I simply remove the first xmlns from the above XML, my program can read all the elements without any issues. The problem is that I have to process the XML file in the format that's given to me and can't change the file format. My program reads the XML below just fine. Note that the line xmlns="http://www.mycompany.com/schemas/xxx/FileFeed/V1" is removed.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Comments Here -->
<FileFeed
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somecompany.com/schemas/xxx/FileFeed/V1
FileFeed.xsd"
RecordCount = "1">
<Object>
<ID>PAMSMOKE110113xxx</ID>
<CorpID>12509</CorpID>
<AnotherID>201654702345</AnotherID>
<TimeStamp>2013-09-03</TimeStamp>
<Type>Some Type</Type>
<SomeNumber>89011704258012600767</SomeNumber>
<Code>ZZZ</Code>
<Year>2013</Year>
</Object>
</FileFeed>
I realize I'm not posting any code, but I'm just wondering what issue I could be having where simply removing the xmlns line resolves everything?
Your problem is with XML namespaces.
Using LINQ to XML:
XNamespace ns = "http://www.mycompany.com/schemas/xxx/FileFeed/V1";
var xDoc = XDocument.Load(fname);
var id = xDoc.Root.Element(ns + "Object").Element(ns + "ID").Value;
Your root element FileFeed declares a default namespace with its xmlns attribute. This means that each unprefixed element inside it is in that namespace as well.
The Element method takes an XName as its argument. Usually you use a string which gets implicitly converted into an XName.
If you want to include a namespace you create an XNamespace and add the string. Since XNamespace overloads the + operator this will also result in an XName.
XDocument doc = XDocument.Load("Test.xml");
// this will be null
XElement objectElementWithoutNS = doc.Root.Element("Object");
XNamespace ns = doc.Root.GetDefaultNamespace();
XElement objectElementWithNS = doc.Root.Element(ns + "Object");
XML namespaces are more or less like C# namespaces. Would you expect to reach the same class whether or not its namespace is specified?
namespace My.Company.Schemas {
    public class FileFeed { }
}
vs
public class FileFeed { }
They are two DISTINCT classes! The same applies to XML: by setting a namespace you make it possible to have documents with similar or even identical internal structure that nevertheless represent two distinct, non-interchangeable documents. This is really convenient.
If you'd like help with why your actual reading method doesn't consider the namespace, you have to present the C# code. The general rule, though, is that any reading API makes it possible to specify the namespace when reading.
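Since the question shows no code, the following is only a guess at one common setup: XmlDocument with XPath. It is a sketch under that assumption, using a shortened copy of the XML and an arbitrary prefix "ff" that does not appear in the original document.

```csharp
using System;
using System.Xml;

// Shortened copy of the FileFeed document from the question.
string xml = @"<FileFeed xmlns=""http://www.mycompany.com/schemas/xxx/FileFeed/V1"" RecordCount=""1"">
  <Object><ID>PAMSMOKE110113xxx</ID></Object>
</FileFeed>";

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);

// "ff" is a prefix chosen for this sketch; only the URI has to match the document.
var nsManager = new XmlNamespaceManager(xmlDoc.NameTable);
nsManager.AddNamespace("ff", "http://www.mycompany.com/schemas/xxx/FileFeed/V1");

// Without a prefix the path only matches elements in no namespace, so it finds nothing.
XmlNode missing = xmlDoc.SelectSingleNode("/FileFeed/Object/ID");

// With the prefix bound to the default namespace URI the element is found.
XmlNode found = xmlDoc.SelectSingleNode("/ff:FileFeed/ff:Object/ff:ID", nsManager);

Console.WriteLine(missing == null);
Console.WriteLine(found?.InnerText);
```

This would also explain why deleting the xmlns line makes everything work: the unprefixed path then matches again.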

Linq to XML - how do I get this element value

Fairly simple one, but my knowledge is limited in this area. I'm using the following C# code to access the value of elements within my SGML and XML documents.
It works fine when there is only one element with the given name in the document, but as soon as there is more than one element with the same name it throws an exception, obviously!
I need to use XPath or some other way of specifying the location of the element I'm trying to get the value of.
XDocument doc = XDocument.Load(sgmlReader);
string system = doc.Descendants("chapnum").Single().Value;
return system;
This works fine if there is only one "chapnum" in the doc, but I specifically need the value of the "chapnum" nested under "dmaddres".
How, please?
Here is a sample of the XML doc. I'm trying to get the value of the "chapnum" element nested in the "dmaddres" element.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dmodule []>
<dmodule xmlns:dc="http://www.purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.s1000d.org/S1000D_2-3-1/xml_schema_flat/descript.xsd">
<idstatus>
<dmaddres>
<dmc><avee><modelic>xx</modelic><sdc>A</sdc><chapnum>29</chapnum>
<section>1</section><subsect>3</subsect><subject>54</subject><discode
>00</discode><discodev>AAA</discodev><incode>042</incode><incodev
>A</incodev><itemloc>D</itemloc></avee></dmc>
<dmtitle><techname>Switch</techname><infoname>Description of function</infoname>
</dmtitle>
<issno inwork="00" issno="001" type="new"/>
<issdate day="20" month="07" year="2012"/>
<language language="sx"/></dmaddres>
<status>
<security class="01"/><datarest><instruct><distrib>-</distrib><expcont
>Obey the national regulations for export control.</expcont></instruct>
<inform><copyright><para><refdm><avee><modelic>xx</modelic><sdc>A</sdc>
<chapnum>29</chapnum><section>1</section><subsect>3</subsect><subject
>54</subject><discode>00</discode><discodev>ZZZ</discodev><incode
>021</incode><incodev>Z</incodev><itemloc>D</itemloc></avee></refdm
></para></copyright><datacond>BREXREF=AJ-A-00-00-00-05ZZZ-022Z-D VERSUB=CDIM-V6</datacond>
</inform></datarest>
<rpc>xxxxx</rpc>
<orig>xxxxx</orig>
<applic>
<type>-</type>
<model model="xxxxx"><mfc>xxxxx</mfc><pnr>xxxxxxx</pnr></model>
</applic>
<brexref><refdm><avee><modelic>xx</modelic><sdc>A</sdc><chapnum>00</chapnum>
<section>0</section><subsect>0</subsect><subject>00</subject><discode
>05</discode><discodev>ZZZ</discodev><incode>022</incode><incodev
>Z</incodev><itemloc>D</itemloc></avee></refdm></brexref>
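Like this?
string system = doc.Descendants("dmaddres")
                   .Descendants("chapnum")
                   .Single().Value;
In your sample, chapnum is not a direct child of dmaddres (it sits under dmaddres/dmc/avee), so calling Element("chapnum") on dmaddres would return null. Spelling out the full path from the root:
string system = doc.Root.Element("idstatus").Element("dmaddres").Element("dmc").Element("avee").Element("chapnum").Value;
would probably do just as well.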
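Since the question mentions XPath, here is a sketch of that route as well. It assumes a trimmed-down version of the sample document, keeping only the elements needed to show the two competing chapnum values.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Trimmed-down version of the sample: one chapnum under dmaddres, another under status.
string xml = @"<dmodule>
  <idstatus>
    <dmaddres><dmc><avee><chapnum>29</chapnum></avee></dmc></dmaddres>
    <status><brexref><refdm><avee><chapnum>00</chapnum></avee></refdm></brexref></status>
  </idstatus>
</dmodule>";

XDocument doc = XDocument.Parse(xml);

// "//dmaddres//chapnum" selects a chapnum at any depth below dmaddres
// and ignores the ones under status.
XElement chapnum = doc.XPathSelectElement("//dmaddres//chapnum");

// Null if nothing matched.
string system = chapnum?.Value;
Console.WriteLine(system);
```

XPathSelectElement returns the first match in document order, so unlike Single() it does not throw when several elements qualify.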

Process namespaces using XmlReader

I have a complex XML file with structure as follows:
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="xxx:xxx:xxx:xxx:xxxxx:xxx:xsd:xxxx.xxx.xxx.xx">
<Element1>
<Element2>
<Element2A>xxxxxx</Element2A>
<Element2B>2012-08-29T00:00:00</Element2B>
</Element2>
</Element1>
</Document>
Now I am using XmlReader to read this XML document and process information as follows
XmlReader xr = XmlReader.Create(filename);
while (xr.Read())
{
xr.MoveToElement();
XElement node = (XElement)XElement.ReadFrom(xr);
Console.WriteLine(node.Name);
}
xr.Close();
The problem I am facing is that in the output the namespace is prefixed to the element name. E.g. output:
{xxx:xxx:xxx:xxx:xxxxx:xxx:xsd:xxxx.xxx.xxx.xx}Element1
Is there any way I can remove/ handle this as I need to do further filtering using Element names and Child names.
XElement.Name is not (as you might expect) a String, but rather an XName which has a LocalName property, thus:
Console.WriteLine(node.Name.LocalName);
You may want to remove the namespace. One way to do that is in C# code; another is an XSLT transformation, as suggested in Remove Namespace
-Milind
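For the follow-up filtering by element and child names, two options are sketched below. The namespace URI "urn:example:placeholder" stands in for the masked one in the question, and the document is loaded in one go with XDocument.Parse rather than through the XmlReader loop.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// "urn:example:placeholder" is a stand-in for the masked namespace in the question.
string xml = @"<Document xmlns=""urn:example:placeholder"">
  <Element1>
    <Element2>
      <Element2A>xxxxxx</Element2A>
      <Element2B>2012-08-29T00:00:00</Element2B>
    </Element2>
  </Element1>
</Document>";

XDocument doc = XDocument.Parse(xml);

// Option 1: compare on LocalName and ignore the namespace entirely.
XElement byLocalName = doc.Descendants().First(e => e.Name.LocalName == "Element2A");

// Option 2: read the default namespace from the root and build qualified names.
XNamespace ns = doc.Root.GetDefaultNamespace();
XElement byQualifiedName = doc.Descendants(ns + "Element2B").First();

Console.WriteLine(byLocalName.Value);
Console.WriteLine(byQualifiedName.Value);
```

Option 2 keeps the namespace check, which matters if the document could ever mix elements from several namespaces.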

XML DocumentElement is trashing the innerXml

I have a simple XML file, shown below, which I read in via a basic XmlDocument.Load(filename.xml). If I load the file and inspect its InnerXml, it all looks normal. However, when I inspect the InnerXml of DocumentElement, it's a mess! I kept the example small, so you can easily see that it is well-formed:
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults>
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
Now, try this in C# with this simple code:
...
XmlDocument xDoc = new XmlDocument();
xDoc.Load("*XMLSAMPLE.XML*");
textBox1.Text = xDoc.InnerXml;
textBox2.Text = xDoc.DocumentElement.InnerXml;
...
It's completely mangled, with the 2nd namespace repeated with every dd tag, and not even included in the top-most tag.
What am I doing wrong? This is driving me nuts!
The content returned by xDoc.DocumentElement.InnerXml is semantically identical to your original ServiceDefaults element: if the original fragment conforms to your XML schema, the InnerXml fragment will conform as well. The framework has only moved the namespace declarations, which does not change the semantics of the document.
Compare the output of the two properties:
xDoc.InnerXml:
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults>
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
xDoc.DocumentElement.InnerXml:
<fax:ServiceDefaults xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/">
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">false</dd:AutoCompleteToNANP>
<dd:RetryInterval xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">0</dd:RetryInterval>
<dd:MaxRetryAttempts xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
A look at the following link in MSDN will help shed light on your situation:
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.innerxml.aspx
Basically, xDoc.DocumentElement.InnerXml starts at the <fax:ServiceDefaults> node, whereas xDoc.InnerXml starts one level higher, at the FaxService node. This is crucial to understanding your problem, because all of your xmlns declarations are on the FaxService node, which DocumentElement.InnerXml does not include.
Make the following change to your XML document and notice what happens (basically, copy the xmlns declarations over to the ServiceDefaults node):
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
Suddenly your code will behave according to your expectations. So hopefully this helps you towards understanding the issue. What the permanent fix should be, that's up to you.
HTH!
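To convince yourself that the re-declared namespaces change nothing, one rough check is to parse the InnerXml fragment again and look at the namespace of an element inside it. The sketch below uses a shortened copy of the document; the wrapper element and the prefix "d" are arbitrary names introduced here.

```csharp
using System;
using System.Xml;

// Shortened copy of the fax document from the question.
string xml = @"<fax:FaxService xmlns:fax=""http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/"" xmlns:dd=""http://www.hp.com/schemas/imaging/con/dictionaries/1.0/"">
  <fax:ServiceDefaults>
    <dd:RetryInterval>0</dd:RetryInterval>
  </fax:ServiceDefaults>
</fax:FaxService>";

var xDoc = new XmlDocument();
xDoc.LoadXml(xml);

// InnerXml of the root is a standalone fragment, so it has to carry
// the namespace declarations that used to live on the root element.
string inner = xDoc.DocumentElement.InnerXml;
Console.WriteLine(inner);

// Wrap the fragment in a namespace-free element (name is arbitrary) and parse it again.
var reparsed = new XmlDocument();
reparsed.LoadXml("<wrapper>" + inner + "</wrapper>");

// "d" is a prefix chosen for this sketch; only the URI has to match.
var nsManager = new XmlNamespaceManager(reparsed.NameTable);
nsManager.AddNamespace("d", "http://www.hp.com/schemas/imaging/con/dictionaries/1.0/");

// The element resolves to the same namespace URI as in the original document.
XmlNode retry = reparsed.SelectSingleNode("//d:RetryInterval", nsManager);
string retryNamespace = retry?.NamespaceURI;
Console.WriteLine(retryNamespace);
```

If the declarations had been dropped instead of moved, the second LoadXml call would fail on an undeclared prefix.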
