Working With Specific XML structure - c#

I am trying to get some data from an XML document. I have no control over the schema; if it were up to me, I would have chosen another one. I am using C#'s XPath support to get the data.
XML DOC
<Journals>
<name>Title of Journal</name>
<totalvolume>2</totalvolume>
<JournalList>
<Volume no="1">
<Journal>
<issue>01</issue>
<Title>Title 1</Title>
<date>1997-03-10</date>
<link>www.somelink.com</link>
</Journal>
<Journal>
<issue>02</issue>
<Title>Title 3</Title>
<date>1997-03-17</date>
<link>www.somelink.com</link>
</Journal>
</Volume>
<Volume no="2">
<Journal>
<issue>01</issue>
<Title>Title 1</Title>
<date>1999-01-01</date>
<link>www.somelink.com</link>
</Journal>
<Journal>
<issue>01</issue>
<Title>Title 2</Title>
<date>1999-01-08</date>
<link>www.somelink.com</link>
</Journal>
</Volume>
</JournalList>
</Journals>
I am trying to get all the data in the Volume 2 node. Here is what I tried so far:
C# Code:
protected void loadXML(string url)
{
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(url);
string strQuery = "Volume[@no='2']";
XmlElement nodeList = xmlDoc.DocumentElement;
XmlNodeList JournalList = nodeList.SelectNodes(strQuery);
foreach (XmlElement Journal in JournalList)
{
XmlElement temp = Journal;
}
}
It seems there are no nodes in JournalList. Any ideas? Thanks in advance.

Your code is looking for "Volume" nodes directly under the "Journals" node.
Change this:
string strQuery = "Volume[@no='2']";
To this, in order to look for "Volume" nodes under the "JournalList" node:
string strQuery = "JournalList/Volume[@no='2']";
Also, there's a couple typos in your XML:
</Volume no="2"> -> <Volume no="2"> // no open tag found
</Journal> -> </Journals> // expecting end tag </Journals>
From your comment below:
How would I go about accessing each journal? For example, I want to iterate through each "journal" and get the title of the journal.
In order to do that, you could modify your code slightly:
var nodeList = xmlDoc.DocumentElement;
var volume = nodeList.SelectSingleNode(strQuery);
foreach (XmlElement journal in volume.SelectNodes("Journal"))
{
var title = journal.GetElementsByTagName("Title")[0].InnerText;
}
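Putting the pieces together, a minimal self-contained sketch might look like the following. It loads a trimmed copy of the question's XML from a string rather than a URL, and it selects the Journal nodes of the volume in a single query. The `JournalReader` class and `GetTitles` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class JournalReader
{
    // Hypothetical helper: returns the <Title> text of every <Journal> in the given volume.
    public static List<string> GetTitles(XmlDocument doc, string volumeNo)
    {
        var titles = new List<string>();
        // The path is relative to the document element (<Journals>),
        // so it has to pass through <JournalList> to reach <Volume>.
        string query = "JournalList/Volume[@no='" + volumeNo + "']/Journal";
        foreach (XmlElement journal in doc.DocumentElement.SelectNodes(query))
        {
            XmlNode title = journal.SelectSingleNode("Title");
            if (title != null)
                titles.Add(title.InnerText);
        }
        return titles;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(@"<Journals><JournalList>
  <Volume no=""1""><Journal><Title>Title 1</Title></Journal></Volume>
  <Volume no=""2""><Journal><Title>Title 1</Title></Journal><Journal><Title>Title 2</Title></Journal></Volume>
</JournalList></Journals>");
        foreach (string title in GetTitles(doc, "2"))
            Console.WriteLine(title);
    }
}
```

With the sample above, only the two titles under volume 2 are printed. An absolute query such as `//Volume[@no='2']/Journal` would also work if the position of `JournalList` in the document is not known in advance.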

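You can also use LINQ to XML:
using System.Linq;
using System.Xml.Linq;
//...
string path = "Path of your xml file";
XDocument doc = XDocument.Load(path);
// The (string) cast yields null instead of throwing when a Volume has no "no" attribute
var volume2 = doc.Descendants("Volume").FirstOrDefault(e => (string)e.Attribute("no") == "2");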

Related

Algorithm for xml document splitting

I want to split an xml document into several xml documents by specified node name, (similar with string.Split(...).)
Example: I have the following xml document.
<root>
<nodeA>
Hello
</nodeA>
<nodeA>
<nodeB>
node b Text
</nodeB>
<nodeImage>
image.jpg
</nodeImage>
</nodeA>
<nodeA>
node a text
</nodeA>
</root>
I want to split this xml document into 3 parts by 'nodeImage', and keep the original xml structure. (Note: the node with name 'nodeImage' could be anywhere)
1. xml before nodeImage
2. xml for nodeImage
3. xml after nodeImage
For the sample xml, the results should be:
XML Document 1:
<root>
<nodeA>
Hello
</nodeA>
<nodeA>
<nodeB>
node b Text
</nodeB>
</nodeA>
</root>
XML Document 2:
<root>
<nodeA>
<nodeImage>
image.jpg
</nodeImage>
</nodeA>
</root>
XML Document 3:
<root>
<nodeA>
node a text
</nodeA>
</root>
Does anyone know if there is a good algorithm, or existing code sample for this requirement?
Update Notes:
If there is only one node with the name 'nodeImage' in the xml document, then this xml document should always be split into 3 xml documents.
XElement xe = XElement.Load(XMLFile);
foreach(XElement newXE in xe.Elements("nodeA"))
{
XElement root = new XElement("root",newXE);
root.Save(newFile);
}
The term "split" is slightly confusing. Splitting on one occurrence does not usually produce three parts.
I'll start by restating your question in LINQ to XML terms.
For every occurrence of XDocument.Descendants("nodeImage") you want to create:
1. A copy of the document where the nodeImage parent has the nodeImage and all succeeding nodes removed. In addition, all ancestors must have all their following nodes (NodesAfterSelf()) removed.
2. A copy of the document where all ancestors of the nodeImage element have all their following and preceding nodes (NodesAfterSelf() and NodesBeforeSelf()) removed.
3. The result of running this check again on a copy of the XDocument where the preceding nodes of all ancestors have been removed.
If no occurrence is found, the document being checked is returned in its entirety.
A deep copy of XDocument is easy. It has a copy constructor.
Of course, this will be a hog on memory if your xml is of a significant size.
However, the challenge is to locate your node in every copy.
This question shows how you can get the XPath of an element. You can use that.
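The copy-and-trim idea described above can be sketched as follows. This is only one possible reading of it, under several assumptions: it handles the first matching element only (no recursion for further occurrences), it locates the element in each copy by child-element indexes instead of an XPath string, and it drops ancestors that end up empty so the output matches the three documents in the question. `XmlSplitter` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class XmlSplitter
{
    // Child-element indexes that lead from the root down to the target element.
    static List<int> PathTo(XElement target) =>
        target.AncestorsAndSelf()
              .Where(e => e.Parent != null)              // skip the root itself
              .Select(e => e.ElementsBeforeSelf().Count())
              .Reverse()
              .ToList();

    // Walks the same indexes in a copy to find the matching element.
    static XElement Follow(XDocument copy, List<int> path) =>
        path.Aggregate(copy.Root, (e, i) => e.Elements().ElementAt(i));

    // The element and its ancestors, excluding the root.
    static List<XElement> Chain(XElement e) =>
        e.AncestorsAndSelf().Where(a => a.Parent != null).ToList();

    // Removes e, then any ancestor left empty by that removal (never the root).
    static void Prune(XElement e)
    {
        while (e.Parent != null)
        {
            XElement parent = e.Parent;
            e.Remove();
            if (parent.Nodes().Any()) break;
            e = parent;
        }
    }

    public static List<XDocument> SplitAround(XDocument doc, XName name)
    {
        XElement target = doc.Descendants(name).FirstOrDefault();
        if (target == null) return new List<XDocument> { new XDocument(doc) };
        List<int> path = PathTo(target);

        // Part 1: everything before the target.
        var before = new XDocument(doc);               // copy constructor = deep copy
        XElement t = Follow(before, path);
        foreach (XElement a in Chain(t)) a.NodesAfterSelf().Remove();
        Prune(t);

        // Part 2: only the target and its ancestors.
        var middle = new XDocument(doc);
        foreach (XElement a in Chain(Follow(middle, path)))
        {
            a.NodesBeforeSelf().Remove();
            a.NodesAfterSelf().Remove();
        }

        // Part 3: everything after the target.
        var after = new XDocument(doc);
        t = Follow(after, path);
        foreach (XElement a in Chain(t)) a.NodesBeforeSelf().Remove();
        Prune(t);

        return new List<XDocument> { before, middle, after };
    }

    static void Main()
    {
        var doc = XDocument.Parse("<root><nodeA>Hello</nodeA><nodeA><nodeB>node b Text</nodeB>" +
            "<nodeImage>image.jpg</nodeImage></nodeA><nodeA>node a text</nodeA></root>");
        foreach (XDocument part in SplitAround(doc, "nodeImage"))
            Console.WriteLine(part.ToString(SaveOptions.DisableFormatting));
    }
}
```

Each part is built from a full copy of the document, so the memory caveat above still applies. Handling several occurrences would mean calling `SplitAround` again on the third part.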
The following works. Test it extensively.
var doc = new XmlDocument();
doc.LoadXml(@"<root>
<nodeA>
Hello
</nodeA>
<nodeA>
<nodeB>
node b Text
</nodeB>
<nodeImage>
image.jpg
</nodeImage>
</nodeA>
<nodeA>
node a text
</nodeA></root>");
var xmlFrags = new List<string>();
string xml = "<root>";
bool bNewFragment = true;
foreach (XmlNode nodeA in doc.SelectNodes("//root/nodeA")) {
XmlNode nodeImage = nodeA.SelectSingleNode("nodeImage");
if (nodeImage != null) {
xml += "<nodeA>";
var en = nodeA.GetEnumerator();
while (en.MoveNext()) {
XmlNode xn = (XmlNode)en.Current;
if (xn != nodeImage)
xml += xn.OuterXml;
}
xml += "</nodeA></root>";
xmlFrags.Add(xml);
xml = "<root><nodeA>" + nodeImage.OuterXml + "</nodeA></root>";
xmlFrags.Add(xml);
bNewFragment = true;
}
else
{
if (bNewFragment) {
xml = "<root>";
bNewFragment = false;
}
xml += nodeA.OuterXml;
}
}
if (!bNewFragment) {
xml += "</root>";
xmlFrags.Add(xml);
}
//Use the XML fragments as you like
foreach (var xmlFrag in xmlFrags)
Console.WriteLine(xmlFrag + Environment.NewLine);
something like this, using System.Xml.Linq?
var doc = XDocument.Parse(stringxml);
var res = new List<XElement>();
var cur = new XElement("root");
foreach (var node in doc.Element("root").Elements("nodeA"))
{
if (node.Element("nodeImage") == null)
{
cur.Add(node);
}
else
{
res.Add(cur);
res.Add(new XElement("root", node));
cur = new XElement("root");
}
}
res.Add(cur);
Try this:
using System;
using System.Xml;
class Program
{
static void Main(string[] args)
{
// create the XML documents
XmlDocument
doc1 = new XmlDocument(),
doc2 = new XmlDocument(),
doc3 = new XmlDocument();
// load the initial XMl into doc1
doc1.Load("input.xml");
// create the structure of doc2 and doc3
doc2.AppendChild(doc2.ImportNode(doc1.FirstChild, false));
doc3.AppendChild(doc3.ImportNode(doc1.FirstChild, false));
doc2.AppendChild(doc2.ImportNode(doc1.DocumentElement, false));
doc3.AppendChild(doc3.ImportNode(doc1.DocumentElement, false));
// select the nodeImage
var nodeImage = doc1.SelectSingleNode("//nodeImage");
if (nodeImage != null)
{
// append to doc3
var node3 = nodeImage.ParentNode.NextSibling;
var n3 = doc3.ImportNode(node3, true);
doc3.DocumentElement.AppendChild(n3);
// append to doc2
var n2 = doc2.ImportNode(nodeImage.ParentNode, true);
n2.RemoveChild(n2.SelectSingleNode("//nodeImage").PreviousSibling);
doc2.DocumentElement.AppendChild(n2);
// remove from doc1
nodeImage.ParentNode.ParentNode
.RemoveChild(nodeImage.ParentNode.NextSibling);
nodeImage.ParentNode
.RemoveChild(nodeImage);
}
Console.WriteLine(doc1.InnerXml);
Console.WriteLine(doc2.InnerXml);
Console.WriteLine(doc3.InnerXml);
}
}

Adding an element to this xml structure

<root>
<element1>innertext</element1>
<element2>innertext</element2>
<element3>
<child1>innertext</child1>
</element3>
</root>
I have an xml structure shown above.
I would like to "append" to the xml file (it is already created) to add another "child" inside <element3>, so that it will look like this:
<root>
<element1>innertext</element1>
<element2>innertext</element2>
<element3>
<child1>innertext</child1>
<child2>innertext</child2>
</element3>
</root>
Linq to xml and/or Xpath would be great
EDIT:
I have tried doing this:
XElement doc = XElement.Load(mainDirectory);
XElement newElem = doc.Elements("element3").First();
newElem.Add(new XElement("child2", "child2innertext"));
doc.Add(newElem);
doc.Save(mainDirectory);
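XmlDocument xDoc = new XmlDocument();
xDoc.Load("filename.xml");
foreach (XmlNode xNode in xDoc.SelectNodes("/root/element3"))
{
XmlElement newElement = xDoc.CreateElement("child2");
// Set the text on the new element; setting InnerText on xNode would replace all of element3's children
newElement.InnerText = "innertext";
xNode.AppendChild(newElement);
}
// Write the change back to the file
xDoc.Save("filename.xml");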
With XDocument you can achieve this as:
string xml = "<root><element1>innertext</element1><element2>innertext</element2><element3><child1>innertext</child1></element3></root>";
var doc = XDocument.Parse(xml); // use XDocument.Load("filepath"); if your xml is in a file
var el3 = doc.Descendants("element3").FirstOrDefault();
el3.Add(new XElement("child2", "innertext"));
Please, try this LINQPAD example
void Main()
{
var xml =
@"<root>
<element1>innertext</element1>
<element2>innertext</element2>
<element3>
<child1>innertext</child1>
</element3>
</root>";
var doc = XDocument.Parse(xml);
doc.Root.Element("element3")
.Add(new XElement("child2", "innertext"));
doc.Dump();
}
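For completeness, here is a small sketch of the same idea as a reusable helper. The attempt in the question most likely ends up with a second element3 because `doc.Add(newElem)` adds an element that already has a parent, and LINQ to XML clones such an element instead of moving it. The `ChildAppender` class and `AddChild2` method are hypothetical names, and the XML is parsed from a string here so the example runs without a file.

```csharp
using System;
using System.Xml.Linq;

static class ChildAppender
{
    // Hypothetical helper: appends <child2> to the first <element3> under the root.
    public static void AddChild2(XElement root, string text)
    {
        XElement element3 = root.Element("element3");
        if (element3 == null)
            throw new InvalidOperationException("element3 not found");
        // Add() changes element3 in place; element3 must not be added to the root again.
        element3.Add(new XElement("child2", text));
    }

    static void Main()
    {
        XElement root = XElement.Parse(
            "<root><element1>innertext</element1><element2>innertext</element2>" +
            "<element3><child1>innertext</child1></element3></root>");
        AddChild2(root, "innertext");
        Console.WriteLine(root);
        // For a file on disk, XElement.Load(path) and root.Save(path)
        // would take the place of Parse and WriteLine.
    }
}
```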

reading node from xml file in XMLDocument

I am trying to grab the TopicName values. How should I go about it? I have tried different combinations, but somehow I am unable to get TopicName. Below is my source code:
XmlDocument xdoc = new XmlDocument();//xml doc used for xml parsing
xdoc.Load(
"http://latestpackagingnews.blogspot.com/feeds/posts/default"
);//loading XML in xml doc
XmlNodeList xNodelst = xdoc.DocumentElement.SelectNodes("content");//reading nodes so that we can traverse through the XML
foreach (XmlNode xNode in xNodelst)//traversing XML
{
//litFeed.Text += "read";
}
sample xml file
<content type="application/xml">
<CatalogItems xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="sitename.xsd">
<CatalogSource Acronym="ABC" OrganizationName="ABC Corporation" />
<CatalogItem Id="3212" CatalogUrl="urlname">
<ContentItem xmlns:content="sitename.xsd" TargetUrl="url">
<content:SelectionSpec ClassList="" ElementList="" />
<content:Language Value="eng" Scheme="ISO 639-2" />
<content:Source Acronym="ABC" OrganizationName="ABC Corporation" />
<content:Topics Scheme="ABC">
<content:Topic TopicName="Marketing" />
<content:Topic TopicName="Coverage" />
</content:Topics>
</ContentItem>
</CatalogItem>
</CatalogItems>
</content>
The Topic nodes in your XML are using the content namespace - you need to declare and use the XML namespace in your code, then you can use SelectNodes() to grab the nodes of interest - this worked for me:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xdoc.NameTable);
nsmgr.AddNamespace("content", "sitename.xsd");
var topicNodes = xdoc.SelectNodes("//content:Topic", nsmgr);
foreach (XmlNode node in topicNodes)
{
string topic = node.Attributes["TopicName"].Value;
}
Just as a comparison see how easy this would be with Linq to XML:
XDocument xdoc = XDocument.Load("test.xml");
XNamespace ns = "sitename.xsd";
string topic = xdoc.Descendants(ns + "Topic")
.Select(x => (string)x.Attribute("TopicName"))
.FirstOrDefault();
To get all topics you can replace the last statement with:
var topics = xdoc.Descendants(ns + "Topic")
.Select(x => (string)x.Attribute("TopicName"))
.ToList();
If you just need a specific element, then I'd use XPath:
This is a guide to use XPath in C#:
http://www.codeproject.com/KB/XML/usingXPathNavigator.aspx
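And this is the query that will get you a collection of your Topics. Because CatalogItems and everything below it are in the sitename.xsd namespace, each of those steps needs a prefix bound to that namespace (content here):
//content/content:CatalogItems/content:CatalogItem/content:ContentItem/content:Topics/content:Topic
You could tweak this query depending on what you're trying to accomplish, for example to grab just the TopicName attribute values:
//content/content:CatalogItems/content:CatalogItem/content:ContentItem/content:Topics/content:Topic/@TopicName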
XPath is pretty easy to learn. I've done stuff like this pretty quickly with no prior knowledge.
You can paste your XML and XPath query here to test your queries:
http://www.bit-101.com/xpath/
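When such a query is run from C#, keep in mind that unprefixed names in XPath 1.0 only match elements in no namespace, so every element under CatalogItems needs a prefix registered with an XmlNamespaceManager. The following is a rough sketch under that assumption; the `TopicReader` class, the `GetTopicNames` method and the prefix `c` are illustrative choices, and the XML is a shortened version of the sample from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class TopicReader
{
    // Hypothetical helper: collects every TopicName attribute value in the document.
    public static List<string> GetTopicNames(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        // "c" is an arbitrary prefix; only the namespace URI has to match the document.
        nsmgr.AddNamespace("c", "sitename.xsd");

        // <content> is in no namespace; everything below it is in "sitename.xsd".
        string query =
            "//content/c:CatalogItems/c:CatalogItem/c:ContentItem/c:Topics/c:Topic/@TopicName";

        var names = new List<string>();
        foreach (XmlNode attribute in doc.SelectNodes(query, nsmgr))
            names.Add(attribute.Value);
        return names;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<content type='application/xml'><CatalogItems xmlns='sitename.xsd'>" +
            "<CatalogItem Id='3212'><ContentItem xmlns:content='sitename.xsd'>" +
            "<content:Topics Scheme='ABC'><content:Topic TopicName='Marketing' />" +
            "<content:Topic TopicName='Coverage' /></content:Topics>" +
            "</ContentItem></CatalogItem></CatalogItems></content>");
        foreach (string name in GetTopicNames(doc))
            Console.WriteLine(name);
    }
}
```

The prefix used in the query does not have to be the one used in the document, which is why `c` works even though the document writes `content:`.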
The following quick and dirty LINQ to XML code obtains your TopicNames and prints them on the console.
XDocument lDoc = XDocument.Load(lXmlDocUri);
foreach (var lElement in lDoc.Element("content").Element(XName.Get("CatalogItems", "sitename.xsd")).Elements(XName.Get("CatalogItem", "sitename.xsd")))
{
foreach (var lContentTopic in lElement.Element(XName.Get("ContentItem", "sitename.xsd")).Element(XName.Get("Topics", "sitename.xsd")).Elements(XName.Get("Topic", "sitename.xsd")))
{
string lTitle = lContentTopic.Attribute("TopicName").Value;
Console.WriteLine(lTitle);
}
}
It'd have been a lot shorter if it wasn't for all the namespaces :) (Instead of "XName.Get" you would just use the name of the element).

Problem in reading XML node with unknown root/parent nodes

I have been trying to read an xml file. I have to extract value of nodes "Date" and "Name", but the problem is, they might appear at any level in XML hierarchy.
So when I try with this code,
XmlDocument doc = new XmlDocument();
doc.Load("test1.xml");
XmlElement root = doc.DocumentElement;
XmlNodeList nodes = root.SelectNodes("//*");
string date;
string name;
foreach (XmlNode node in nodes)
{
date = node["date"].InnerText;
name = node["name"].InnerText;
}
and the XML file is ::
<?xml version="1.0" encoding="utf-8"?>
<root>
<child>
<name>Aravind</name>
<date>12/03/2000</date>
</child>
</root>
the above code errors out, as <name> and <date> are not immediate child Elements of root.
Is it possible to assume that the parent/root nodes are unknown and copy the values using just the names of the nodes?
Depending on the exception you are getting, this may or may not be the exact solution. However, I would definitely check that date and name exist before doing a .InnerText on them.
foreach (XmlNode node in nodes)
{
// node["date"] returns null when this node has no <date> child
XmlNode dateNode = node["date"];
if (dateNode != null)
date = dateNode.InnerText;
// etc.
}
I would read up on XPATH and XPATH for C# to do this more efficiently
http://support.microsoft.com/kb/308333
http://www.w3schools.com/XPath/xpath_syntax.asp
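As a rough illustration of the XPath route with the XmlDocument from the question: the `//` axis searches at any depth, so the parent of `date` or `name` does not need to be known. The `AnyLevelReader` class and `FirstText` method below are hypothetical names, and the sketch returns only the first match.

```csharp
using System;
using System.Xml;

static class AnyLevelReader
{
    // Returns the text of the first element with the given name at any depth,
    // or null when the document contains no such element.
    public static string FirstText(XmlDocument doc, string elementName)
    {
        XmlNode node = doc.SelectSingleNode("//" + elementName);
        return node == null ? null : node.InnerText;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><child><name>Aravind</name><date>12/03/2000</date></child></root>");
        Console.WriteLine(FirstText(doc, "name"));
        Console.WriteLine(FirstText(doc, "date"));
    }
}
```

If several `date` or `name` elements can occur, `SelectNodes` with the same query returns all of them.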
Here's a little method that should allow you to get the innerText easily.
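// Requires: using System.Xml.XPath;
static string GetElementText(string xmlPath, string node)
{
// XPathDocument(string) expects a file path or URI, not the XML text itself
XPathDocument doc = new XPathDocument(xmlPath);
XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr = nav.Compile("//" + node);
XPathNodeIterator iterator = nav.Select(expr);
// return the 1st match, but there could be more
if (iterator.MoveNext())
return iterator.Current.Value;
return null; // no element with that name was found
}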
Try to use LINQ:
string xml = @"<?xml version='1.0' encoding='utf-8'?>
<root>
<date>12/03/2001</date>
<child>
<name>Aravind</name>
<date>12/03/2000</date>
</child>
<name>AS-CII</name>
</root>";
XDocument doc = XDocument.Parse(xml);
foreach (var date in doc.Descendants("date"))
{
Console.WriteLine(date.Value);
}
foreach (var name in doc.Descendants("name"))
{
Console.WriteLine(name.Value);
}
Console.ReadLine();
The Descendants method allows you to get all the elements that have a specified name.

XML CDATA Encoding

I am trying to build an XML document in C# with CDATA to hold the text inside an element. For example..
<email>
<![CDATA[test@test.com]]>
</email>
However, when I get the InnerXml property of the document, the CDATA has been reformatted so the InnerXml string looks like the below, which fails.
<email>
&lt;![CDATA[test@test.com]]&gt;
</email>
How can I keep the original format when accessing the string of the XML?
Cheers
Don't use InnerText: use XmlDocument.CreateCDataSection:
using System;
using System.Xml;
public class Test
{
static void Main()
{
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("root");
XmlElement email = doc.CreateElement("email");
XmlNode cdata = doc.CreateCDataSection("test@test.com");
doc.AppendChild(root);
root.AppendChild(email);
email.AppendChild(cdata);
Console.WriteLine(doc.InnerXml);
}
}
With XmlDocument:
XmlDocument doc = new XmlDocument();
XmlElement email = (XmlElement)doc.AppendChild(doc.CreateElement("email"));
email.AppendChild(doc.CreateCDataSection("test@test.com"));
string xml = doc.OuterXml;
or with XElement:
XElement email = new XElement("email", new XCData("test@test.com"));
string xml = email.ToString();
See XmlDocument::CreateCDataSection Method for information and examples how to create CDATA nodes in an XML Document
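To see the difference side by side, here is a small comparison sketch. It assumes the problem in the question came from assigning the CDATA markup as a string through InnerText, which the question does not show explicitly. `CDataDemo` and its two methods are hypothetical names.

```csharp
using System;
using System.Xml;

static class CDataDemo
{
    // Assigns the CDATA markup as plain text: the markup characters get escaped on output.
    public static string WithInnerText(string text)
    {
        var doc = new XmlDocument();
        XmlElement email = doc.CreateElement("email");
        doc.AppendChild(email);
        email.InnerText = "<![CDATA[" + text + "]]>";
        return doc.OuterXml;
    }

    // Creates a real CDATA node: the section is written out unchanged.
    public static string WithCDataSection(string text)
    {
        var doc = new XmlDocument();
        XmlElement email = doc.CreateElement("email");
        doc.AppendChild(email);
        email.AppendChild(doc.CreateCDataSection(text));
        return doc.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(WithInnerText("test@test.com"));
        Console.WriteLine(WithCDataSection("test@test.com"));
    }
}
```

The first line of output contains an escaped `&lt;![CDATA[` because InnerText treats its value as character data, while the second keeps the CDATA section intact.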
