Algorithm for xml document splitting - c#

I want to split an xml document into several xml documents by specified node name, (similar with string.Split(...).)
Example: I have the following xml document.
<root>
<nodeA>
Hello
</nodeA>
<nodeA>
<nodeB>
node b Text
</nodeB>
<nodeImage>
image.jpg
</nodeImage>
</nodeA>
<nodeA>
node a text
</nodeA>
</root>
I want to split this xml document into 3 parts by 'nodeImage', and keep the original xml structure. (Note: the node with name 'nodeImage' could be anywhere)
1. xml before nodeImage
2. xml for nodeImage
3. xml after nodeImage
For the sample xml, the results should be:
XML Document 1:
<root>
<nodeA>
Hello
</nodeA>
<nodeA>
<nodeB>
node b Text
</nodeB>
</nodeA>
</root>
XML Document 2:
<root>
<nodeA>
<nodeImage>
image.jpg
</nodeImage>
</nodeA>
</root>
XML Document 3:
<root>
<nodeA>
node a text
</nodeA>
</root>
Does anyone know if there is a good algorithm, or existing code sample for this requirement?
Update Notes:
If there is only one node with the name 'nodeImage' in the xml document, then this xml document should always be splitted into 3 xml documents.

XElement xe = XElement.Load(XMLFile);
foreach(XElement newXE in xe.Elements("nodeA"))
{
XElement root = new XElement("root",newXE);
root.Save(newFile);
}

The term "split" is slightly confusing. Splitting on one ocurrence does not usually produce three parts.
I start by trying to define your question in Linq to xml terms.
For every occurrence of XDocument.Descendants("nodeImage") you want to create:
A copy of the document where the nodeImage parent has the nodeImage and all succeeding nodes removed. In addition all ancestors must have all nextnodes removed.
A copy of the document where all ancestors of the nodeImage element have all XElement.NextNodes and XElement.PreviousNodes removed.
Running this check again on a copy of the XDocument where all Ancestor PreviousNodes have been removed.
If no occurrence is found. The document being checked is returned in its entirety.
A deep copy of XDocument is easy. It has a copy constructor.
Of course, this will be a hog on memory if your xml is of a significant size.
However, the challenge is to locate your node in every copy.
This question shows how you can get the XPath of an element. You can use that.

This works. Test it extensively.
var doc = new XmlDocument();
doc.LoadXml(#"<root>
<nodeA>
Hello
</nodeA>
<nodeA>
<nodeB>
node b Text
</nodeB>
<nodeImage>
image.jpg
</nodeImage>
</nodeA>
<nodeA>
node a text
</nodeA></root>");
var xmlFrags = new List<string>();
string xml = "<root>";
bool bNewFragment = true;
foreach (XmlNode nodeA in doc.SelectNodes("//root/nodeA")) {
XmlNode nodeImage = nodeA.SelectSingleNode("nodeImage");
if (nodeImage != null) {
xml += "<nodeA>";
var en = nodeA.GetEnumerator();
while (en.MoveNext()) {
XmlNode xn = (XmlNode)en.Current;
if (xn != nodeImage)
xml += xn.OuterXml;
}
xml += "</nodeA></root>";
xmlFrags.Add(xml);
xml = "<root><nodeA>" + nodeImage.OuterXml + "</nodeA></root>";
xmlFrags.Add(xml);
bNewFragment = true;
}
else
{
if (bNewFragment) {
xml = "<root>";
bNewFragment = false;
}
xml += nodeA.OuterXml;
}
}
if (!bNewFragment) {
xml += "</root>";
xmlFrags.Add(xml);
}
//Use the XML fragments as you like
foreach (var xmlFrag in xmlFrags)
Console.WriteLine(xmlFrag + Environment.NewLine);

something like this, using System.Xml.Linq?
var doc = XDocument.Parse(stringxml);
var res = new List<XElement>();
var cur = new XElement("root");
foreach (var node in doc.Element("root").Elements("nodeA"))
{
if (node.Element("nodeImage") == null)
{
cur.Add(node);
}
else
{
res.Add(cur);
res.Add(new XElement("root", node));
cur = new XElement("root");
}
}
res.Add(cur);

Try this:
using System;
using System.Xml;
class Program
{
static void Main(string[] args)
{
// create the XML documents
XmlDocument
doc1 = new XmlDocument(),
doc2 = new XmlDocument(),
doc3 = new XmlDocument();
// load the initial XMl into doc1
doc1.Load("input.xml");
// create the structure of doc2 and doc3
doc2.AppendChild(doc2.ImportNode(doc1.FirstChild, false));
doc3.AppendChild(doc3.ImportNode(doc1.FirstChild, false));
doc2.AppendChild(doc2.ImportNode(doc1.DocumentElement, false));
doc3.AppendChild(doc3.ImportNode(doc1.DocumentElement, false));
// select the nodeImage
var nodeImage = doc1.SelectSingleNode("//nodeImage");
if (nodeImage != null)
{
// append to doc3
var node3 = nodeImage.ParentNode.NextSibling;
var n3 = doc3.ImportNode(node3, true);
doc3.DocumentElement.AppendChild(n3);
// append to doc2
var n2 = doc2.ImportNode(nodeImage.ParentNode, true);
n2.RemoveChild(n2.SelectSingleNode("//nodeImage").PreviousSibling);
doc2.DocumentElement.AppendChild(n2);
// remove from doc1
nodeImage.ParentNode.ParentNode
.RemoveChild(nodeImage.ParentNode.NextSibling);
nodeImage.ParentNode
.RemoveChild(nodeImage);
}
Console.WriteLine(doc1.InnerXml);
Console.WriteLine(doc2.InnerXml);
Console.WriteLine(doc3.InnerXml);
}
}

Related

Working With Specific XML structure

I am trying to get some data from an XML document. I have no control over the schema. If it were up to me I would have chosen another schema. I am using C#'s XPATH library to get the data.
XML DOC
<Journals>
<name>Title of Journal</name>
<totalvolume>2</totalvolume>
<JournalList>
<Volume no="1">
<Journal>
<issue>01</issue>
<Title>Title 1</Title>
<date>1997-03-10</date>
<link>www.somelink.com</link>
</Journal>
<Journal>
<issue>02</issue>
<Title>Title 3</Title>
<date>1997-03-17</date>
<link>www.somelink.com</link>
</Journal>
</Volume>
<Volume no="2">
<Journal>
<issue>01</issue>
<Title>Title 1</Title>
<date>1999-01-01</date>
<link>www.somelink.com</link>
</Journal>
<Journal>
<issue>01</issue>
<Title>Title 2</Title>
<date>1999-01-08</date>
<link>www.somelink.com</link>
</Journal>
</Volume>
</JournalList>
</Journals>
I am trying to get all the data in the Volume 2 node. Here is what I tried so far:
C# Code:
protected void loadXML(string url)
{
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(url);
string strQuery = "Volume[#no='2']";
XmlElement nodeList = xmlDoc.DocumentElement;
XmlNodeList JournalList = nodeList.SelectNodes(strQuery);
foreach (XmlElement Journal in JournalList)
{
XmlElement temp = Journal;
}
}
It seems there are no nodes in JournalList. Anyone? Thanks in advance/
Your code is looking for "Volume" nodes directly under the "Journals" node
Change this:
string strQuery = "Volume[#no='2']";
To this, in order to look for "Volume" nodes under the "JournalList" node:
string strQuery = "JournalList/Volume[#no='2']";
Also, there's a couple typos in your XML:
</Volume no="2"> -> <Volume no="2"> // no open tag found
</Journal> -> </Journals> // expecting end tag </Journals>
From your comment below:
how would I go about access each journal? for example. I want irrate through each "journal" and get the title of the journal?
In order to do that, you could modify your code slightly:
var nodeList = xmlDoc.DocumentElement;
var volume = nodeList.SelectSingleNode(strQuery);
foreach (XmlElement journal in volume.SelectNodes("Journal"))
{
var title = journal.GetElementsByTagName("Title")[0].InnerText;
}
Also you can use Linq to XML:
using System.Xml.Linq;
//...
string path="Path of your xml file"
XDocument doc = XDocument.Load(path);
var volume2= doc.Descendants("Volume").FirstOrDefault(e => e.Attribute("no").Value == "2");

Xpath selection of all attributes

I have the xml below in a c# class.
string xml =#"<?xml version='1.0' encoding='UTF-8'?>
<call method='importCube'>
<credentials login='sampleuser#company.com' password='my_pwd' instanceCode='INSTANCE1'/>
<importDataOptions version='Plan' allowParallel='false' moveBPtr='false'/>
I need to update the XML held within the node attributes i.e.
string xml =#"<?xml version='1.0' encoding='UTF-8'?>
<call method='importCube'>
<credentials login='testuser#test.com' password='userpassword' instanceCode='userinstance'/>
<importDataOptions version='Actual' allowParallel='true' moveBPtr='true'/>
I've written code to do this :
// instantiate XmlDocument and load XML from string
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
// get a list of nodes
XmlNodeList aNodes = doc.SelectNodes("/call/credentials");
// loop through all nodes
foreach (XmlNode aNode in aNodes)
{
// grab the attribute
XmlAttribute idLogin = aNode.Attributes["login"];
XmlAttribute idPass = aNode.Attributes["password"];
XmlAttribute idInstance = aNode.Attributes["instanceCode"];
idLogin.Value = "myemail.com";
idPass.Value = "passtest";
idInstance.Value = "TestInstance";
}
It works but the issue at the minute is that I have to repeat the whole code block for each node path i.e.
XmlNodeList aNodes = doc.SelectNodes("/call/importDataOptions");
....
There has to be a better way. Any ideas how I can rip through the attributes in 1 pass?
Maybe just using a cast to XmlElement could help you to reduce your code:
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlElement credential in doc.SelectNodes("/call/credentials"))
{
credential.SetAttribute("login" , "myemail.com" );
credential.SetAttribute("password" , "passtest" );
credential.SetAttribute("instanceCode", "TestInstance");
}
Another option is to create an object structure which resembles your XML vocabulary and deserialize your input into that objects, but it seems overkill.
EDIT: Per your comment, you could go with:
foreach (XmlElement node in doc.SelectNodes("/call/*")) // it's case sensitive
{
switch(node.Name)
{
case "credentials":
node.SetAttribute("login" , "myemail.com" );
node.SetAttribute("password" , "passtest" );
node.SetAttribute("instanceCode", "TestInstance");
break;
case "importDataOptions":
// ...
break;
default:
throw new ArgumentOutOfRangeException("Unexpected node: "+node.Name);
}
}

C# how to create a custom xml document

I'm really just trying to create a custom xml document for simple configuration processing.
XmlDocument xDoc = new XmlDocument();
string[] NodeArray = { "Fruits|Fruit", "Vegetables|Veggie"};
foreach (string node in NodeArray)
{
XmlNode xmlNode = xDoc.CreateNode(XmlNodeType.Element,node.Split('|')[0],null);
//xmlNode.Value = node.Split('|')[0];
xmlNode.InnerText = node.Split('|')[1];
xDoc.DocumentElement.AppendChild(xmlNode);
}
What i'm trying to get is this.
<?xml version="1.0" encoding="ISO-8859-1"?>
<Fruits>Fruit</Fruits>
<Vegetables>Veggie</Vegetables>
i get not set to value of an object at xDoc.DocumentElement.AppendChild(xmlNode);
Unfortunately, You can't make that XML structure.
All XML documents must have a single root node. You can't have more.
Try something like this
XmlDocument xDoc = new XmlDocument();
xDoc.AppendChild( xDoc.CreateElement("root"));
string[] NodeArray = { "Fruits|Fruit", "Vegetables|Veggie" };
foreach (string node in NodeArray)
{
XmlNode xmlNode = xDoc.CreateNode(XmlNodeType.Element, node.Split('|')[0], null);
//xmlNode.Value = node.Split('|')[0];
xmlNode.InnerText = node.Split('|')[1];
xDoc.DocumentElement.AppendChild(xmlNode);
}
It is not possible to have that XML structure as XML MUST have a single root element. You may want to try:
<?xml version="1.0" encoding="ISO-8859-1"?>
<items>
<Fruits>Fruit</Fruits>
<Vegetables>Veggie</Vegetables>
</items>

How to read single node value from xml file

Hi i am trying to get value from xml but it shows node null.
Here is my xml file.
<?xml version="1.0" encoding="utf-8"?>
<result xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://www.cfhdocmail.com/TestAPI2/Result.xsd https://www.cfhdocmail.com/TestAPI2/Result.xsd" xmlns="https://www.cfhdocmail.com/TestAPI2/Result.xsd">
<data>
<key>MailingGUID</key>
<value>0aa2b2e3-7afa-4002-ab2f-9eb4cbe33ae7</value>
</data>
<data>
<key>OrderRef</key>
<value>52186</value>
</data>
</result>
I want to get "MailingGUID" value.
Here is the code that i have tried:
private void readXML()
{
XmlDocument xml = new XmlDocument();
// You'll need to put the correct path to your xml file here
xml.Load(Server.MapPath("~/XmlFile11.xml"));
// Select a specific node
XmlNode node = xml.SelectSingleNode("result/data/value");
// Get its value
string name = node.InnerText;
}
Please tell me how i can get MailingGUID value.
Thanks
UPDATE:
I think there might be something wrong with your schemas, I removed references to them and your code worked fine. I tried this:
const string str = "<?xml version=\"1.0\" encoding=\"utf-8\"?><result><data><key>MailingGUID</key><value>0aa2b2e3-7afa-4002-ab2f-9eb4cbe33ae7</value></data><data><key>OrderRef</key><value>52186</value></data></result>";
var xml = new XmlDocument();
xml.LoadXml(str);
xml.DocumentElement.SelectSingleNode("/result/data/value").InnerText
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
//Parsing of xml is done here
Document doc = builder.parse(new File("C:\\Users\\User_Name\\Documents\\My Received Files\\PDSL_ABM.xml"));
//Here we get the root element of XML and print out
doc.getDocumentElement().normalize();
System.out.println ("Root element of the doc is " + doc.getDocumentElement().getNodeName());
NodeList list = doc.getElementsByTagName("MailingGUID");
int totalMailingGUID =list.getLength();
System.out.println("Total no of MailingGUID : " + totalSupplierPartID);
//Traversing all the elements from the list and printing out its data
for (int i = 0; i < list.getLength(); i++) {
//Getting one node from the list.
Node childNode = list.item(i);
System.out.println("MailingGUID : " + childNode.getTextContent());
}

Problem in reading XML node with unknown root/parent nodes

I have been trying to read an xml file. I have to extract value of nodes "Date" and "Name", but the problem is, they might appear at any level in XML hierarchy.
So when I try with this code,
XmlDocument doc = new XmlDocument();
doc.Load("test1.xml");
XmlElement root = doc.DocumentElement;
XmlNodeList nodes = root.SelectNodes("//*");
string date;
string name;
foreach (XmlNode node in nodes)
{
date = node["date"].InnerText;
name = node["name"].InnerText;
}
and the XML file is ::
<?xml version="1.0" encoding="utf-8"?>
<root>
<child>
<name>Aravind</name>
<date>12/03/2000</date>
</child>
</root>
the above code errors out, as <name> and <date> are not immediate child Elements of root.
is it possible to assume that parent/root nodes are unknown and just with the name of the nodes, copy the values ??
Depending on the exception you are getting, this may or may not be the exact solution. However, I would definitely check that date and name exist before doing a .InnerText on them.
foreach (XmlNode node in nodes)
{
dateNode = node["date"];
if(dateNode != null)
date = dateNode.InnerText;
// etc.
}
I would read up on XPATH and XPATH for C# to do this more efficiently
http://support.microsoft.com/kb/308333
http://www.w3schools.com/XPath/xpath_syntax.asp
Here's a little method that should allow you to get the innerText easily.
function string GetElementText(string xml, string node)
{
XPathDocument doc = new XPathDocument(xml);
XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr = nav.Compile("//" + node);
XPathNodeIterator iterator = nav.Select(expr);
while (iterator.MoveNext())
{
// return 1st but there could be more
return iterator.Current.Value;
}
}
Try to use LINQ:
string xml = #"<?xml version='1.0' encoding='utf-8'?>
<root>
<date>12/03/2001</date>
<child>
<name>Aravind</name>
<date>12/03/2000</date>
</child>
<name>AS-CII</name>
</root>";
XDocument doc = XDocument.Parse(xml);
foreach (var date in doc.Descendants("date"))
{
Console.WriteLine(date.Value);
}
foreach (var date in doc.Descendants("name"))
{
Console.WriteLine(date.Value);
}
Console.ReadLine();
The Descendants method allows you to get all the elements that have a specified name.

Categories