C# XML parsing: separating InnerXml from InnerText

What I am trying to do is create a nested list (basically a 2D list, or a 2D array if that is better for this task) that would work as follows: ID => 1, Name => Hickory, without explicitly selecting the node.
I could use SelectNodes("Woods/Wood") and then do something like node["ID"].InnerText, but that would require that I know what the node's name is.
Assume that this would read wood.xml even if there were 36 nodes instead of 7, and that I will never know the names of the nodes. I tried using OuterXml/InnerXml, but that gives me too much information.
XmlDocument doc = new XmlDocument();
doc.Load("wood.xml");
// Here is wood.xml
/*<Woods><Wood><ID>1</ID><Name>Hickory</Name><Weight>3</Weight><Thickness>4</Thickness><Density>5</Density><Purity>6</Purity><Age>7</Age></Wood><Wood><ID>2</ID><Name>Soft Maple</Name><Weight>3</Weight><Thickness>4</Thickness><Density>5</Density><Purity>6</Purity><Age>7</Age></Wood><Wood><ID>3</ID><Name>Red Oak</Name><Weight>3</Weight><Thickness>4</Thickness><Density>5</Density><Purity>6</Purity><Age>7</Age></Wood></Woods>*/
// DocumentElement is the root element (<Woods>); FirstChild would be the
// XML declaration if the file started with one.
XmlNode root = doc.DocumentElement;
// Display the contents of the child nodes.
if (root.HasChildNodes)
{
    for (int i = 0; i < root.ChildNodes.Count; i++)
    {
        Console.WriteLine(root.ChildNodes[i].InnerXml);
        Console.WriteLine();
    }
    Console.ReadKey();
}
That would allow me to basically create a wood "buffer", if you will, so I can access these values elsewhere.
Sorry if I was unclear: I want to essentially make this "abstract", for lack of a better word.
That way, if I were someday to change the name of "Weight" to "HowHeavy", or to add an additional element "NumberOfBranches", I would not have to hardcode the structure of the XML file.

Is this what you're after?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        string xml = @"<Woods><Wood><ID>1</ID><Name>Hickory</Name><Weight>3</Weight><Thickness>4</Thickness><Density>5</Density><Purity>6</Purity><Age>7</Age></Wood><Wood><ID>2</ID><Name>Soft Maple</Name><Weight>3</Weight><Thickness>4</Thickness><Density>5</Density><Purity>6</Purity><Age>7</Age></Wood><Wood><ID>3</ID><Name>Red Oak</Name><Weight>3</Weight><Thickness>4</Thickness><Density>5</Density><Purity>6</Purity><Age>7</Age></Wood></Woods>";
        XDocument doc = XDocument.Parse(xml);
        // Get your wood nodes and values in a list; container elements
        // (<Woods>, <Wood>) are skipped so only ID, Name, ... remain
        List<Tuple<string, string>> list = doc.Descendants()
            .Where(a => !a.HasElements)
            .Select(a => new Tuple<string, string>(a.Name.LocalName, a.Value))
            .ToList();
        // Display the list
        foreach (var a in list)
        {
            Console.WriteLine("Node name {0}, Node value {1}", a.Item1, a.Item2);
        }
        Console.Read();
    }
}
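If you want the nested "2D list" described in the question (one entry per Wood, each mapping ID => 1, Name => Hickory and so on), a minimal sketch is to build one dictionary per child of the root element. It assumes every record is a direct child of the root and that field names are unique within a record (ToDictionary throws on a duplicate key); GenericRecords and Load are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class GenericRecords
{
    // One dictionary per child of the root element (one per <Wood>), mapping each
    // field's element name to its text. No element names are hard-coded, so a
    // renamed or added field simply shows up as a different key.
    public static List<Dictionary<string, string>> Load(XDocument doc)
    {
        return doc.Root.Elements()
            .Select(record => record.Elements()
                .ToDictionary(field => field.Name.LocalName, field => field.Value))
            .ToList();
    }
}
```

For the question's file, GenericRecords.Load(XDocument.Load("wood.xml"))[0]["Name"] would be "Hickory".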

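You can use xmlDocument.SelectNodes("//child::node()") to select every node in the document (elements, text nodes and so on) without knowing any element names.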
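A related sketch with XmlDocument, which the question already uses: the narrower expression //*[not(*)] selects only elements that have no child elements, so container nodes such as <Woods> and <Wood> and bare text nodes are skipped. LeafElements and Read are illustrative names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class LeafElements
{
    // Returns (element name, text) for every element that has no child elements.
    // "//*[not(*)]" is a narrower variant of "//child::node()": it skips text
    // nodes and container elements such as <Woods> and <Wood>.
    public static List<KeyValuePair<string, string>> Read(XmlDocument doc)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (XmlNode node in doc.SelectNodes("//*[not(*)]"))
        {
            pairs.Add(new KeyValuePair<string, string>(node.Name, node.InnerText));
        }
        return pairs;
    }
}
```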

Related

Relative references in an XML document

I am trying to pull out data from an XML document that seems to use relative references like this:
<action>
<topic reference="../../action[110]/topic"/>
<context reference="../../../../../../../../../../../../../contexts/items/context[2]"/>
</action>
Two questions:
Is this normal or common?
Is there a way to handle this with LINQ to XML / XDocument, or would I need to manually traverse the document tree?
Edit:
To clarify, the references are to other nodes within the same XML document. The context node above references a list of contexts, and says to get the one at index 2.
The topic node worries me more because it references another action's topic, which could in turn reference a list of topics. If that weren't the case, I would have just loaded the lists of contexts and topics into a cache and looked them up that way.
You can use an XPath query to extract the nodes, and it is very efficient.
Step 1: Load the XML into an XmlDocument.
Step 2: Use node.SelectNodes("//*[@reference]") to select every element that has a reference attribute.
Step 3: After that, you can loop through the XML nodes.
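The steps above can be sketched as follows. This goes one step further than the answer: it assumes each reference value is a relative XPath evaluated from the element that carries the attribute (which is how ../../action[110]/topic reads), so SelectSingleNode can resolve it directly instead of walking the tree by hand. Note that the predicate is [@reference] (an attribute), not [reference] (a child element). ReferenceResolver and ResolveAll are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ReferenceResolver
{
    // Returns each element that carries a "reference" attribute, paired with the
    // node its relative XPath points to (null if nothing matches).
    public static List<KeyValuePair<XmlElement, XmlNode>> ResolveAll(XmlDocument doc)
    {
        var result = new List<KeyValuePair<XmlElement, XmlNode>>();
        // [@reference] tests for an attribute; [reference] would test for a child element.
        foreach (XmlElement element in doc.SelectNodes("//*[@reference]"))
        {
            string path = element.GetAttribute("reference");
            // The reference value is itself a relative XPath (positions are 1-based),
            // so it is evaluated with the referencing element as the context node.
            XmlNode target = element.SelectSingleNode(path);
            result.Add(new KeyValuePair<XmlElement, XmlNode>(element, target));
        }
        return result;
    }
}
```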
I ended up manually traversing the tree. But with extension methods it's all nice and out of the way. In case it might help anyone in the future, this is what I threw together for my use-case:
using System;
using System.Linq;
using System.Xml.Linq;

// Extension methods must live in a static class
public static class XmlReferenceExtensions
{
    public static XElement GetRelativeNode(this XAttribute attribute)
    {
        return attribute.Parent.GetRelativeNode(attribute.Value);
    }

    public static XElement GetRelativeNode(this XElement node, string pathReference)
    {
        if (!pathReference.Contains("..")) return node; // Not a relative reference
        var parts = pathReference.Split(new string[] { "/" }, StringSplitOptions.RemoveEmptyEntries);
        XElement current = node;
        foreach (var part in parts)
        {
            if (part == "..")
            {
                current = current.Parent;
            }
            else if (part.Contains("["))
            {
                // e.g. "action[110]": the 110th <action> child (positions are 1-based);
                // Elements rather than Descendants, so nested namesakes are not counted
                var opening = part.IndexOf("[");
                var targetNodeName = part.Substring(0, opening);
                var ending = part.IndexOf("]");
                var nodeIndex = int.Parse(part.Substring(opening + 1, ending - opening - 1));
                current = current.Elements(targetNodeName).ElementAt(nodeIndex - 1);
            }
            else
            {
                current = current.Element(part);
            }
        }
        return current;
    }
}
And then you'd use it like this (item is an XElement):
item.Element("topic").Attribute("reference").GetRelativeNode().Value

XML: how to select child elements using XPath

I've got the following XML, shown in the following image:
But I can't, for the life of me, get any code to select the House element inside <ArrayOfHouse>.
There will be more than one House element once I've managed to get it to select one. Here's my code so far:
// Parse the data as an XML document
XDocument xmlHouseResults = XDocument.Parse(houseSearchResult);
// Select the House elements
XPathNavigator houseNavigator = xmlHouseResults.CreateNavigator();
XPathNodeIterator nodeIter = houseNavigator.Select("/ArrayOfHouse/House");
// Loop through the selected nodes
while (nodeIter.MoveNext())
{
// Show the House id, as taken from the XML document
    MessageBox.Show(nodeIter.Current.SelectSingleNode("house_id").ToString());
}
I know I'm getting the XML stream, because I have managed to show the data in the MessageBox shown above, but I can't get to the individual houses.
You can select the House nodes like this:
var houses = XDocument.Parse(houseSearchResult).Descendants("House");
foreach(var house in houses)
{
    var id = house.Element("house_id");
    var location = house.Element("location");
}
Or you can use Select to directly get a strongly typed object:
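var houses = XDocument.Parse(houseSearchResult)
    .Descendants("House")
    .Select(x => new House
    {
        // The casts read the element text (assuming string properties);
        // assigning the XElement itself would not fit a string property
        Id = (string)x.Element("house_id"),
        Location = (string)x.Element("location")
    });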
This assumes that there exists a class House with the properties Id and Location.
Also, please be sure to think about the suggestion by Thomas Levesque to use XML serialization.
With XPath you would need to use an XmlNamespaceManager; however, as you have an XDocument, you could simply use the LINQ to XML axis methods, e.g.:
XNamespace df = xmlHouseResults.Root.Name.Namespace;
// XNamespace + string builds a namespace-qualified XName
foreach (XElement house in xmlHouseResults.Descendants(df + "House"))
{
    MessageBox.Show((string)house.Element(df + "house_id"));
}
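For completeness, here is a sketch of the XmlNamespaceManager route mentioned above, using the System.Xml.XPath extension methods on the XDocument. The prefix df is arbitrary; the namespace URI is read from the root element, so the code assumes the document declares a default namespace (which is why the plain /ArrayOfHouse/House query found nothing). HouseQueries and GetHouseIds are hypothetical names, and the namespace URI in the test is made up.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class HouseQueries
{
    public static List<string> GetHouseIds(string houseSearchResult)
    {
        XDocument doc = XDocument.Parse(houseSearchResult);
        // Bind an arbitrary prefix ("df") to the document's default namespace,
        // read from the root element rather than hard-coded.
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("df", doc.Root.Name.NamespaceName);
        // Every step of the path needs the prefix: in XPath 1.0 an unprefixed
        // name always means "no namespace".
        return doc.XPathSelectElements("/df:ArrayOfHouse/df:House", ns)
                  .Select(house => (string)house.XPathSelectElement("df:house_id", ns))
                  .ToList();
    }
}
```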

Single-threaded application shows race-condition-like behaviour

I have a big (~40 MB) collection of XML data, split across many files which are not well-formed on their own, so I merge them, add a root node and load all the XML into an XmlDocument. It's basically a list of 3 different types which can be nested in a few different ways. This example should show most of the cases:
<Root>
<A>
<A>
<A></A>
<A></A>
</A>
</A>
<A />
<B>
<A>
<A>
<A></A>
<A></A>
</A>
</A>
</B>
<C />
</Root>
I'm separating all A, B and C nodes by using XPath expressions on an XmlDocument (//A, //B, //C), converting the resulting node sets to a DataTable and showing a list of all nodes of each node type separately in a DataGridView. This works fine.
But now I'm facing an even bigger file, and as soon as I load it, it shows me only 4 rows. Then I added a breakpoint at the line where the actual XmlDocument.SelectNodes call happens and checked the resulting node set. It shows me about 25,000 entries. After continuing, the program loaded and, whoops, all my 25k rows were shown. I tried it again and I can reproduce it: if I step over XmlDocument.SelectNodes by hand, it works; if I don't break there, it does not. I'm not spawning a single thread in my application.
How can I debug this any further? What should I look for? I have experienced such behaviour with multithreaded libraries such as JSch (SSH), but I don't see why this should happen in my case.
Thank you very much!
// class XmlToDataTable:
private DataTable CreateTable(NamedXPath logType,
                              List<XmlColumn> columns,
                              ITableCreator tableCreator)
{
    // I have to break here -->
    XmlNodeList xmlNodeList = logFile.GetEntries(logType);
    // <-- I have to break here
    DataTable dataTable = tableCreator.CreateTableLayout(columns);
    foreach (XmlNode xmlNode in xmlNodeList)
    {
        DataRow row = dataTable.NewRow();
        tableCreator.PopulateRow(xmlNode, row, columns);
        dataTable.Rows.Add(row);
    }
    return dataTable;
}
// class Logfile:
public XmlNodeList GetEntries(NamedXPath e)
{
    return (_xmlDocument != null && _xmlDocument.HasChildNodes)
        ? _xmlDocument.SelectNodes(e.XPath)
        : new XmlNullObjectNodeList();
}
// _xmlDocument gets loaded here after reading all XML fragments into a string
// (ugly, I know. The // ugly! comment reminds me about that ;))
private void CreateXmlDoc()
{
    _xmlDocument = new XmlDocument();
    _xmlDocument.LoadXml(OPEN_ROOT_ELEMENT + _xmlString +
                         CLOSE_ROOT_ELEMENT);
    if (DataChanged != null)
        DataChanged(this, new EventArgs());
}
// class NamedXPath:
public abstract class NamedXPath
{
    private readonly String _name;
    private readonly String _xPath;
    protected NamedXPath(string name, string xPath)
    {
        _name = name;
        _xPath = xPath;
    }
    public string Name
    {
        get { return _name; }
    }
    public string XPath
    {
        get { return _xPath; }
    }
}
Instead of writing the XPath directly in the code first, I would use a tool such as sketchPath to get the XPath right. You can load either your original XML or a subset of it.
Play with XPath and your XML to see whether the expected nodes are selected before using the XPath in your code.
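In the same spirit, a tiny C# helper can check how many nodes an expression selects against a sample document before it is wired into the application. CountMatches is a hypothetical helper, not a replacement for an interactive tool; for the sample document in the question, //A, //B and //C should select 9, 1 and 1 nodes respectively.

```csharp
using System.Xml;

public static class XPathCheck
{
    // Loads a (sample) XML string and reports how many nodes an XPath selects,
    // so the expression can be verified in isolation before it is used elsewhere.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }
}
```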
Okay, solved it. tableCreator is part of my strategy pattern, which influences the way the table is built. In a certain implementation I do something like this:
XmlNode xn = xmlDocument.SelectSingleNode(fancyXPath);
// if a node has a child <a>, then it's a linked list:
// <a><a><a></a></a></a>
if (xn.SelectSingleNode("a") != null)
    xn.SelectSingleNode("a").InnerText = "<IDs of linked list items CSV like here>";
This means I'm replacing parts of an XML linked list with some text and losing the nested items there.
This bug wouldn't have been a problem to find if the change didn't affect the original XmlDocument, and even then, debugging it should not be too hard. What makes my program behave differently depending on whether I break or not seems to be the following:
Return Value: The first XmlNode that matches the XPath query or null if no matching node is found. The XmlNode should not be expected to be connected "live" to the XML document. That is, changes that appear in the XML document may not appear in the XmlNode, and vice versa. (API description of XmlNode.SelectSingleNode())
If I break there, the changes are written back to the original XmlDocument; if I don't break, they are not written back. I can't really explain that to myself, but without the change to the XmlNode everything works.
Edit:
Now I'm quite sure: I had XmlNodeList.Count in my watches. This means that every time I debugged, VS evaluated the Count property, which not only returns a number but calls ReadUntil(int), which reads the remaining matches into the internal list:
internal int ReadUntil(int index)
{
    int count = this.list.Count;
    while (!this.done && (count <= index))
    {
        if (this.nodeIterator.MoveNext())
        {
            XmlNode item = this.GetNode(this.nodeIterator.Current);
            if (item != null)
            {
                this.list.Add(item);
                count++;
            }
        }
        else
        {
            this.done = true;
            return count;
        }
    }
    return count;
}
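This is most likely what caused the weird behavior: the XmlNodeList returned by SelectNodes is filled lazily, so evaluating Count in the debugger read all matching nodes up front, before PopulateRow started modifying the document. Without the watch, the nodes were read on demand while the document was being modified underneath the iterator, which can cut the iteration short.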
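A defensive pattern that follows from this: copy the result of SelectNodes into an ordinary List<XmlNode> before modifying the document, so the query runs to completion while the document is still intact. This is a sketch, not the author's fix; NodeSnapshot and SelectNodesSnapshot are hypothetical names, and the test reproduces the nested-<A> case in miniature.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class NodeSnapshot
{
    // Copies the lazily evaluated XmlNodeList into an ordinary List<XmlNode>,
    // forcing the XPath query to run to completion before any caller
    // starts modifying the document.
    public static List<XmlNode> SelectNodesSnapshot(XmlNode context, string xpath)
    {
        return context.SelectNodes(xpath).Cast<XmlNode>().ToList();
    }
}
```

In CreateTable, this would mean iterating over NodeSnapshot.SelectNodesSnapshot(_xmlDocument, logType.XPath) instead of the raw XmlNodeList.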

How to read XML attributes in C#?

I have several XML files that I wish to read attributes from. My main objective is to apply syntax highlighting to a RichTextBox.
For example, in one of my XML docs I have: <Keyword name="using">[..]. All the files have the same element: Keyword.
So, how can I get the values of the name attribute and put them in a collection of strings for each XML file?
I am using Visual C# 2008.
The other answers will do the job, but the syntax highlighting and the several XML files you say you have make me think you need something faster. Why not use a lean and mean XmlReader?
private string[] getNames(string fileName)
{
    List<string> names = new List<string>();
    // "using" disposes the reader (and closes the file) when we are done
    using (XmlReader xmlReader = XmlReader.Create(fileName))
    {
        while (xmlReader.Read())
        {
            // keep reading until we see your element
            if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "Keyword")
            {
                // get the attribute from the XML element (null if it is missing)
                string name = xmlReader.GetAttribute("name");
                if (name != null)
                {
                    names.Add(name); // add to the collection, or whatever you need
                }
            }
        }
    }
    return names.ToArray();
}
Another good option would be the XPathNavigator class, which is faster than XmlDocument and lets you use XPath.
Also, I would suggest going with this approach only if you're not happy with the performance after trying the straightforward options.
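A minimal sketch of the XPathNavigator option, assuming the same Keyword/name layout as the question. It loads an XPathDocument (a read-only store built for XPath queries) from a TextReader so it can be fed a string; new XPathDocument(fileName) works the same way with a file path. KeywordNames and FromXml are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class KeywordNames
{
    // Reads every name attribute of every Keyword element using the read-only,
    // XPath-oriented XPathDocument instead of a full XmlDocument.
    public static List<string> FromXml(TextReader input)
    {
        XPathNavigator navigator = new XPathDocument(input).CreateNavigator();
        var names = new List<string>();
        // "//Keyword/@name" selects the attribute nodes themselves.
        XPathNodeIterator it = navigator.Select("//Keyword/@name");
        while (it.MoveNext())
        {
            names.Add(it.Current.Value);
        }
        return names;
    }
}
```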
You could use XPath to get all the elements, then a LINQ query to get the values of all the name attributes you find:
// XPathSelectElements is an extension method from System.Xml.XPath
XDocument doc = yourDocument;
var nodes = from element in doc.XPathSelectElements("//Keyword")
            let att = element.Attribute("name")
            where att != null
            select att.Value;
string[] names = nodes.ToArray();
The //Keyword XPath expression means "all elements in the document named Keyword".
Edit: Just saw that you only want elements named Keyword. Updated the code sample.
Like others, I would suggest using LINQ to XML - but I don't think there's much need to use XPath here. Here's a simple method to return all the keyword names within a file:
static IEnumerable<string> GetKeywordNames(string file)
{
    return XDocument.Load(file)
                    .Descendants("Keyword")
                    .Attributes("name")
                    .Select(attr => attr.Value);
}
Nice and declarative :)
Note that if you're going to want to use the result more than once, you should call ToList() or ToArray() on it; otherwise the query will be re-evaluated over the loaded document every time you iterate it. Of course, you could change the method to return List<string> or string[] by adding the relevant call to the end of the chain of method calls, e.g.
static List<string> GetKeywordNames(string file)
{
    return XDocument.Load(file)
                    .Descendants("Keyword")
                    .Attributes("name")
                    .Select(attr => attr.Value)
                    .ToList();
}
Also note that this just gives you the names - I would have expected you to want the other details of the elements, in which case you'd probably want something slightly different. If it turns out you need more, please let us know.
You could use LINQ to XML.
Example:
var xmlFile = XDocument.Load(someFile);
var query = from item in xmlFile.Descendants("Keyword")
            // the (string) cast yields null when the attribute is missing
            where !String.IsNullOrEmpty((string)item.Attribute("name"))
            select new
            {
                AttributeValue = item.Attribute("name").Value
            };
You'll likely want to use XPath. //Keyword/@name should get you all of the keyword names.
Here's a good introduction: .Net and XML XPath Queries
<Countries>
  <Country name="ANDORRA">
    <state>Andorra (general)</state>
    <state>Andorra</state>
  </Country>
  <Country name="United Arab Emirates">
    <state>Abu Z¸aby</state>
    <state>Umm al Qaywayn</state>
  </Country>
</Countries>
public void datass()
{
    string file = HttpContext.Current.Server.MapPath("~/App_Data/CS.xml");
    if (!System.IO.File.Exists(file))
    {
        return; // nothing to read
    }
    // Load the XML file
    XmlDocument doc = new XmlDocument();
    doc.Load(file);
    // Get the root element
    XmlElement root = doc.DocumentElement;
    XmlNodeList subroot = root.SelectNodes("Country");
    for (int i = 0; i < subroot.Count; i++)
    {
        XmlNode elem = subroot.Item(i);
        string attrVal = elem.Attributes["name"].Value;
        Response.Write(attrVal);
        XmlNodeList sub = elem.SelectNodes("state");
        for (int j = 0; j < sub.Count; j++)
        {
            XmlNode elem1 = sub.Item(j);
            Response.Write(elem1.InnerText);
        }
    }
}

C#: how can I get all element names from an XML file?

I'd like to get all the element names from an XML file. For example, given this XML file:
<BookStore>
  <BookStoreInfo>
    <Address />
    <Tel />
    <Fax />
  </BookStoreInfo>
  <Book>
    <BookName />
    <ISBN />
    <PublishDate />
  </Book>
  <Book>
    ....
  </Book>
</BookStore>
I would like to get the element names "BookName", "ISBN" and "PublishDate", and only those names, not "BookStoreInfo" or the names of its child nodes.
I tried several ways, but none of them worked. How can I do it?
Well, with XDocument and LINQ-to-XML:
foreach (var name in doc.Root.DescendantNodes().OfType<XElement>()
                        .Select(x => x.Name).Distinct())
{
    Console.WriteLine(name);
}
There are lots of similar routes, though.
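To restrict the result to what the question asks for (only the names of the elements inside <Book>, not <BookStoreInfo> or its children), one option is to walk only the children of the Book elements. A sketch, assuming Book elements are direct children of the root; BookFields and GetNames are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookFields
{
    // Distinct local names of the child elements of every <Book> under the root;
    // <BookStoreInfo> and its children are never visited.
    public static List<string> GetNames(XDocument doc)
    {
        return doc.Root.Elements("Book")
            .Elements()
            .Select(e => e.Name.LocalName)
            .Distinct()
            .ToList();
    }
}
```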
Using XPath
XmlDocument xdoc = new XmlDocument();
xdoc.Load(something);
XmlNodeList list = xdoc.SelectNodes("//BookStore");
gives you a list with all nodes in the document named BookStore
I agree with Adam: the ideal situation is to have a schema that defines the content of the XML document. However, sometimes this is not possible. Here is a simple method for iterating over all of the nodes of an XML document and using a dictionary to store the unique local names. I like to keep track of the depth of each local name, so I use a list of int to store the depths. Note that the XmlReader is "easy on the memory", since it does not load the entire document as the XmlDocument does. In some instances this makes little difference because the XML data is small. In the following example, an 18.5 MB file is read with an XmlReader; using an XmlDocument to load this data would have been less efficient than using an XmlReader to read and sample its contents.
string documentPath = @"C:\Docs\cim_schema_2.18.1-Final-XMLAll\all_classes.xml";
Dictionary<string, List<int>> nodeTable = new Dictionary<string, List<int>>();
using (XmlReader reader = XmlReader.Create(documentPath))
{
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            if (!nodeTable.ContainsKey(reader.LocalName))
            {
                nodeTable.Add(reader.LocalName, new List<int>(new int[] { reader.Depth }));
            }
            else if (!nodeTable[reader.LocalName].Contains(reader.Depth))
            {
                nodeTable[reader.LocalName].Add(reader.Depth);
            }
        }
        reader.Read();
    }
}
Console.WriteLine("The node table has {0} items.", nodeTable.Count);
foreach (KeyValuePair<string, List<int>> kv in nodeTable)
{
    Console.WriteLine("{0} [{1}]", kv.Key, kv.Value.Count);
    for (int i = 0; i < kv.Value.Count; i++)
    {
        if (i < kv.Value.Count - 1)
        {
            Console.Write("{0}, ", kv.Value[i]);
        }
        else
        {
            Console.WriteLine(kv.Value[i]);
        }
    }
}
The purist's way of doing this (and, to be fair, the right way) would be to have a schema contract definition and read it in that way. That being said, you could do something like this...
List<string> nodeNames = new List<string>();
foreach (System.Xml.XmlNode node in doc.SelectNodes("BookStore/Book"))
{
    // XmlNode exposes its children through ChildNodes
    foreach (System.Xml.XmlNode child in node.ChildNodes)
    {
        if (!nodeNames.Contains(child.Name)) nodeNames.Add(child.Name);
    }
}
This is, admittedly, a rudimentary method for obtaining the list of distinct node names for the Book node's children, but you didn't specify much else about your environment (if you have .NET 3.5, you could use LINQ to XML to make this a little prettier, for example). This should get the job done regardless of your environment.
If you're using C# 3.0, you can do the following:
var data = XElement.Load("c:/test.xml"); // change this to reflect the location of your XML file
var allElementNames =
    (from e in data.Descendants()
     select e.Name).Distinct();
You can try doing it using XPath.
XmlDocument doc = new XmlDocument();
doc.LoadXml("xml string");
XmlNodeList list = doc.SelectNodes("//BookStore/Book");
If BookStore is your root element, then you can try the following code:
XmlDocument doc = new XmlDocument();
doc.Load(configPath);
XmlNodeList list = doc.DocumentElement.GetElementsByTagName("Book");
if (list.Count != 0)
{
    // The children of the first <Book> are BookName, ISBN and PublishDate
    for (int i = 0; i < list[0].ChildNodes.Count; i++)
    {
        XmlNode child = list[0].ChildNodes[i];
        Console.WriteLine(child.Name);
    }
}
