Issue parsing XML document in C#

I am trying to get the inner text from specific elements in an XML document that is passed in as a string, and I can't work out why it's not finding any nodes.
This code runs fine, but never enters either of the foreach loops because ocNodesCompany and ocNodesOrgs both have zero elements. Why does GetElementsByTagName not find the nodes?
BTW I've also tried:
XmlNodeList ocNodesOrgs = thisXmlDoc.SelectNodes("//OpenCalaisSimple/CalaisSimpleOutputFormat/Company")
Code:
public static ArrayList getTwitterHandles(String ocXML)
{
ArrayList thisList = new ArrayList();
XmlDocument thisXmlDoc = new XmlDocument();
thisXmlDoc.LoadXml(ocXML);
//get Companies
XmlNodeList ocNodesCompany = thisXmlDoc.GetElementsByTagName("Company");
foreach (XmlElement element in ocNodesCompany)
{
thisList.Add(element.InnerText);
}
//Get Organisations
XmlNodeList ocNodesOrgs = thisXmlDoc.GetElementsByTagName("Organization");
foreach (XmlElement element in ocNodesOrgs)
{
thisList.Add(element.InnerText);
}
//Get Organisations
return thisList;
}
My XML String is:
<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.--><!-- Company: BBC,T-mobile,Vodafone,GE, IndustryTerm: open calais services, Organization: Federal Bureau of Investigation,Red Cross,Greenpeace,Royal Navy,-->
<OpenCalaisSimple>
<Description>
<calaisRequestID>38cb8898-48ba-85ff-12e9-f8d629568428</calaisRequestID>
<id>http://id.opencalais.com/lt0Hf8XWIr2DNIJzNlaXlA</id>
<about>http://d.opencalais.com/dochash-1/ff929eb2-de43-3ed1-8ee4-6109abf6bf77</about>
<docTitle/>
<docDate>2011-03-10 06:36:08.646</docDate>
<externalMetadata/>
</Description>
<CalaisSimpleOutputFormat>
<Company count="1" relevance="0.603" normalized="British Broadcasting Corporation">BBC</Company>
<Company count="1" relevance="0.603" normalized="T-MOBILE NETHERLANDS HOLDING B.V.">T-mobile</Company>
<Company count="1" relevance="0.603" normalized="Vodafone Group Plc">Vodafone</Company>
<Company count="1" relevance="0.603" normalized="General Electric Company">GE</Company>
<IndustryTerm count="1" relevance="0.603">open calais services</IndustryTerm>
<Organization count="1" relevance="0.603">Red Cross</Organization>
<Organization count="1" relevance="0.603">Greenpeace</Organization>
<Organization count="1" relevance="0.603">Royal Navy</Organization>
<Topics>
<Topic Taxonomy="Calais" Score="0.899">Human Interest</Topic>
<Topic Taxonomy="Calais" Score="0.694">Technology_Internet</Topic>
</Topics>
</CalaisSimpleOutputFormat>
</OpenCalaisSimple>

Note that Microsoft also recommends using XPath. Here is their help page for the GetElementsByTagName method; note the comment towards the middle recommending the use of SelectNodes (which takes an XPath expression) instead.
http://msdn.microsoft.com/en-us/library/dc0c9ekk.aspx
A variation of your method, written with XPath, would be:
public static ArrayList getTwitterHandles(String ocXML)
{
ArrayList thisList = new ArrayList();
XmlDocument thisXmlDoc = new XmlDocument();
thisXmlDoc.LoadXml(ocXML);
//get Companies
XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("//Company");
foreach (XmlElement element in ocNodesCompany)
{
thisList.Add(element.InnerText);
}
//Get Organisations
XmlNodeList ocNodesOrgs = thisXmlDoc.SelectNodes("//Organization");
foreach (XmlElement element in ocNodesOrgs)
{
thisList.Add(element.InnerText);
}
//Get Organisations
return thisList;
}
Note that the above implements what I believe is the functionality you have in your example, which is not quite the same as the XPath you've tried. In XPath, a leading "//" selects matching nodes at any depth, so "//Company" will pick up ANY descendant of the node you start from that is named Company.
If you only want specific Company nodes, then you can be more specific:
XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("//Company");
becomes
XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("/OpenCalaisSimple/CalaisSimpleOutputFormat/Company");
Note the key difference is that there is only ONE forward slash at the beginning.
I've just tested both variations and they work great.
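To make the difference between the two XPath forms concrete, here is a minimal sketch run against a trimmed-down copy of the sample XML from the question. The class name XPathDemo and the method CountMatches are hypothetical names used only for illustration, and the sketch assumes the document has no XML namespaces.

```csharp
using System.Xml;

public static class XPathDemo
{
    // Trimmed copy of the sample document from the question (assumption: no XML namespaces).
    public const string SampleXml =
        "<OpenCalaisSimple>" +
        "<CalaisSimpleOutputFormat>" +
        "<Company count=\"1\">BBC</Company>" +
        "<Company count=\"1\">GE</Company>" +
        "<Organization count=\"1\">Red Cross</Organization>" +
        "</CalaisSimpleOutputFormat>" +
        "</OpenCalaisSimple>";

    // Returns how many nodes the given XPath expression selects in the given XML string.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }
}
```

On this sample, CountMatches(XPathDemo.SampleXml, "//Company") and CountMatches(XPathDemo.SampleXml, "/OpenCalaisSimple/CalaisSimpleOutputFormat/Company") both return 2, because every Company element happens to sit at that one location. The two expressions would only differ if a Company element appeared somewhere else in the tree.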
If you're handling XML files, I would strongly recommend reading up on XPath and getting really comfortable with it. It's exceptionally handy for quickly writing code that parses XML and picks out precisely what you need (though I should add it's not the only way to do it, and it is certainly not appropriate for all circumstances :) )
Hope this helps.

It seems you should use an XPath query to get the elements you want.

You could also use XDocument from System.Xml.Linq namespace. The following snippet is almost equivalent to your code. The return type is List<string> instead of ArrayList.
public static List<string> getTwitterHandles(String ocXml)
{
var xml = XDocument.Parse(ocXml);
var list = xml.Descendants("Company")
.Concat(xml.Descendants("Organization"))
.Select(element => element.Value)
.ToList();
return list;
}

Related

How can I find a specific XML element programmatically?

I have this chunk of XML
<EnvelopeStatus>
<CustomFields>
<CustomField>
<Name>Matter ID</Name>
<Show>True</Show>
<Required>True</Required>
<Value>3</Value>
</CustomField>
<CustomField>
<Name>AccountId</Name>
<Show>false</Show>
<Required>false</Required>
<Value>10804813</Value>
<CustomFieldType>Text</CustomFieldType>
</CustomField>
I have this code below:
// TODO find these programmatically rather than a strict path.
var accountId = envelopeStatus.SelectSingleNode("./a:CustomFields", mgr).ChildNodes[1].ChildNodes[3].InnerText;
var matterId = envelopeStatus.SelectSingleNode("./a:CustomFields", mgr).ChildNodes[0].ChildNodes[3].InnerText;
The problem is, sometimes the CustomField with 'Matter ID' might not be there. So I need a way to find the element based on what its Name is, i.e. a programmatic way of finding it. I can't rely on the indexes being accurate.
You can use this code to read the inner text from a specific element:
XmlDocument doc = new XmlDocument();
doc.Load("your.xml");
// Select every CustomField under CustomFields (assumes the elements are not in an XML namespace)
XmlNodeList nodes = doc.SelectNodes("/EnvelopeStatus/CustomFields/CustomField");
if (nodes != null && nodes.Count > 0)
{
    foreach (XmlNode field in nodes)
    {
        // Match on the Name child instead of relying on a fixed index
        if (field["Name"] != null && field["Name"].InnerText == "Matter ID")
        {
            string text = field["Value"]?.InnerText;
        }
    }
}
You can find almost anything in an XML document by using the XPath support that is built into the .NET Framework.
One option is to create a small XPath helper class:
public class EnvelopeStatusParser
{
public XmlNodeList GetNodesWithName(XmlDocument doc, string name)
{
return doc.SelectNodes($"//CustomField[Name[text()='{name}']]");
}
}
and then call it like below to get all CustomFields which have a Name that equals what you need to search for
// Creating the XML Document in some form - here reading from file
XmlDocument doc = new XmlDocument();
doc.Load(@"envelopestatus.xml");
var parser = new EnvelopeStatusParser();
var matchingNodes = parser.GetNodesWithName(doc, "Matter ID");
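// Print how many CustomField nodes matched (printing the list itself only shows its type name)
Console.WriteLine(matchingNodes.Count);
matchingNodes = parser.GetNodesWithName(doc, "NotHere");
Console.WriteLine(matchingNodes.Count);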
Numerous XPath cheat sheets exist, such as the xpath-cheatsheet from LaCoupa, which can be quite helpful for making full use of XPath on XML structures.
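For the original goal of reading a field's Value by its Name, and coping with the field being absent, a predicate on the Name child can be combined with a null check. The following is only a sketch: CustomFieldLookup and GetValueByName are hypothetical names, and it assumes the elements are not in an XML namespace. If the real document uses one, as the a: prefix in the question suggests, the element names in the expression would need that prefix and the XmlNamespaceManager would have to be passed as the second argument to SelectSingleNode.

```csharp
using System.Xml;

public static class CustomFieldLookup
{
    // Returns the <Value> text of the CustomField whose <Name> equals fieldName,
    // or null when no such field exists.
    // Assumptions: no XML namespace on the elements, and fieldName contains no apostrophe.
    public static string GetValueByName(XmlNode root, string fieldName)
    {
        XmlNode valueNode = root.SelectSingleNode(
            ".//CustomField[Name='" + fieldName + "']/Value");

        // SelectSingleNode returns null when nothing matches, so guard before reading InnerText.
        return valueNode?.InnerText;
    }
}
```

Because the lookup is by name, it no longer matters whether 'Matter ID' is the first field, the second, or missing entirely; a missing field simply yields null.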

How to get nodes from XML file without its attribute and put to a List of string

I would like to display the tag names of child nodes without its attributes. Then those tag names (nodes) should be put in a List of string. Here's example of my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<CAR>
<ID>21</ID>
<MANUFACTURER>Ford</MANUFACTURER>
<MODEL>Fiesta</MODEL>
</CAR>
<CAR>
<ID>22</ID>
<MANUFACTURER>Peugeot</MANUFACTURER>
<MODEL>508</MODEL>
</CAR>
</ROOT>
So, the effect I want to get in a console output is shown below:
ID
MANUFACTURER
MODEL
Then I would like to store that ID, MANUFACTURER and MODEL tag names in a List of strings.
This is the code that I tried so far:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.PreserveWhitespace = true;
try
{
xmlDocument.Load("XMLFile.xml");
}
catch (FileNotFoundException ex)
{
Console.WriteLine(ex);
}
Console.WriteLine(xmlDocument.OuterXml);
XmlNodeList nodeList = xmlDocument.SelectNodes("ROOT/CAR");
foreach(XmlNode node in nodeList)
{
Console.WriteLine(node.ChildNodes);
xmlNodes.Add(node.ChildNodes.ToString());
}
The problem is that it's not displaying the way I want to. As a result I only get two System.Xml.XmlChildNodes which seems to be corresponding to two <CAR> nodes, instead of its three child nodes, such as ID, MANUFACTURER and MODEL.
System.Xml.XmlChildNodes
System.Xml.XmlChildNodes
Adding items to a List basically adds the same thing as shown above.
What am I doing wrong?
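If you have to use XmlDocument, then you can do this (note that XPath is case-sensitive, so the path must match the ROOT/CAR casing used in the file):
List<string> elements = new List<string>();
XmlNodeList carNodes = xmlDocument.SelectNodes("ROOT/CAR");
foreach (XmlNode c in carNodes)
{
    foreach (XmlNode n in c.ChildNodes)
    {
        // Skip whitespace and text nodes; keep each element name only once
        if (n.NodeType == XmlNodeType.Element && !elements.Contains(n.Name))
        {
            elements.Add(n.Name);
        }
    }
}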
But I find XDocument much simpler and more readable.
XDocument xdoc = XDocument.Parse(yourXmlString);
List<string> elements = xdoc.Descendants("CAR")
.DescendantNodes().OfType<XElement>()
.Select(x => x.Name.LocalName).Distinct().ToList();
And that's all you'll need. It is easy to read as well: get all the descendants of the "CAR" nodes and collect the distinct names of the XElements within them. Name is an XName, so LocalName is used to get the plain string.
Another way to do it -
List<string> elements = xdoc.Descendants("CAR").First()
.DescendantNodes().OfType<XElement>()
.Select(x => x.Name.LocalName).ToList();
In this case I have removed the "Distinct" and instead taken the first CAR node ONLY. You can see the difference: if some other CAR node has an extra element, you'll miss that information by doing it this way.
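The trade-off between the two LINQ to XML variants can be seen with a small sketch. The sample data below is made up for illustration: the second CAR carries an extra COLOR element that is not in the original file, and TagNameDemo, AllCars and FirstCarOnly are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagNameDemo
{
    // Made-up sample: the second CAR has an extra COLOR element that the first one lacks.
    public const string SampleXml =
        "<ROOT>" +
        "<CAR><ID>21</ID><MANUFACTURER>Ford</MANUFACTURER><MODEL>Fiesta</MODEL></CAR>" +
        "<CAR><ID>22</ID><MANUFACTURER>Peugeot</MANUFACTURER><MODEL>508</MODEL><COLOR>Blue</COLOR></CAR>" +
        "</ROOT>";

    // Distinct child tag names collected across every CAR element.
    public static List<string> AllCars(string xml) =>
        XDocument.Parse(xml).Descendants("CAR")
            .Elements()
            .Select(e => e.Name.LocalName)
            .Distinct()
            .ToList();

    // Child tag names of the first CAR element only.
    public static List<string> FirstCarOnly(string xml) =>
        XDocument.Parse(xml).Descendants("CAR").First()
            .Elements()
            .Select(e => e.Name.LocalName)
            .ToList();
}
```

With this sample, AllCars returns ID, MANUFACTURER, MODEL and COLOR, while FirstCarOnly returns only ID, MANUFACTURER and MODEL, since COLOR never appears in the first CAR.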
You could also loop through the child nodes:
1 - Define xmlNodes as a HashSet so that repeated tag names are stored only once:
HashSet<string> xmlNodes = new HashSet<string>();
2 - Change the code slightly:
....
XmlNodeList nodeList = xmlDocument.SelectNodes("ROOT/CAR");
foreach (XmlNode node in nodeList)
{
foreach(XmlNode element in node.ChildNodes)
{
if (element.NodeType == XmlNodeType.Element)
xmlNodes.Add(element.Name);
}
}
Demo
Console.WriteLine(string.Join(", ", xmlNodes));
Result
ID, MANUFACTURER, MODEL
I hope you find this helpful.

Confused in getting nested elements and values via XmlDocument

I am trying to read a deeply nested XML file, but I got confused and just brute-forced my way to an element in order to get the elements within it. My problem is that it does not work (Object reference not set to an instance of an object), and my way of navigating the XML looks very inefficient.
My goal is to grab all the <ramStick> elements via a foreach loop, since an account can have one or more ramSticks.
Here is part of my C# code that I am having an issue with:
XmlDocument doc = new XmlDocument();
string xmlFilePath = @"C:\xampp\htdocs\userInfo.xml";
doc.Load(xmlFilePath);
XmlNodeList accountList = doc.GetElementsByTagName("account");
foreach (XmlNode node in accountList)
{
XmlElement accountElement = (XmlElement)node;
// I got inside this loop at this point and can get an accountElement with values
String hostname = accountElement.GetElementsByTagName("user")[0].InnerText;
// This is where I get confused and don't know what I am really doing and I just experiment
XmlNode accountRoot = accountElement.GetElementsByTagName("systemInfo")[0];
XmlElement ramNode = (XmlElement)accountRoot;
XmlNode ramInfo = ramNode.GetElementsByTagName("ramInfo")[0];
XmlElement ramList = (XmlElement)ramInfo;
XmlNodeList ramStick = ramList.GetElementsByTagName("ramStick");
// I want to run a foreach loop on each ramStick to get the values
foreach (XmlNode ramNodeForLoop in ramStick)
{
XmlElement ramData = (XmlElement)ramNodeForLoop;
String partNumber = ramData.GetElementsByTagName("partNumber")[0].InnerText;
}
}
I have quite a big XML file which contains multiple <account> here is a sample of it:
<account>
<user>OYSTER-PC</user>
<systemInfo>
<dskInfo>
<dskInterface>
<deviceID>C:</deviceID><description>Local Fixed Disk</description><size>500000878592</size><freeSpace>377396776960</freeSpace><fileSystem>NTFS</fileSystem><volumeSerialNumber>4C922158</volumeSerialNumber>
</dskInterface>
<dskInterface>
<deviceID>D:</deviceID><description>CD-ROM Disc</description><size/><freeSpace/><fileSystem/><volumeSerialNumber/>
</dskInterface>
<dskInterface>
<deviceID>E:</deviceID><description>CD-ROM Disc</description><size/><freeSpace/><fileSystem/><volumeSerialNumber/>
</dskInterface>
</dskInfo>
<hddInfo>
<hddInterface>
<model>ST9500325AS ATA Device</model><interfaceType>IDE</interfaceType><name>\\.\PHYSICALDRIVE0</name><partitions>1</partitions><serialNumber>2020202020202020202020205636384547535146</serialNumber><status>OK</status>
</hddInterface>
</hddInfo>
<nicInfo>
<nicInterface>
<macAddress>48:5D:60:03:88:04</macAddress><description>Atheros AR9285 Wireless Network Adapter</description><ipAddress>192.168.1.10</ipAddress><ipSubnet>255.255.255.0</ipSubnet><defaultIpGateway>192.168.1.1</defaultIpGateway><dhcpServer>192.168.1.1</dhcpServer>
</nicInterface>
<nicInterface>
<macAddress>74:F0:6D:A8:4E:32</macAddress><description>Bluetooth Device (Personal Area Network)</description><ipAddress/><ipSubnet/><defaultIpGateway/><dhcpServer/></nicInterface>
<nicInterface>
<macAddress>20:41:53:59:4E:FF</macAddress><description>RAS Async Adapter</description><ipAddress/><ipSubnet/><defaultIpGateway/><dhcpServer/>
</nicInterface>
<nicInterface>
<macAddress>20:CF:30:55:0C:EF</macAddress><description>JMicron PCI Express Gigabit Ethernet Adapter</description><ipAddress/><ipSubnet/><defaultIpGateway/><dhcpServer/>
</nicInterface>
</nicInfo>
<ramInfo>
<ramStick>
<partNumber>M471B5273DH0-CK0 </partNumber><serialNumber>E2CE33AF</serialNumber><capacity>4294967296</capacity>
</ramStick>
<ramStick>
<partNumber>M471B5273DH0-CK0 </partNumber><serialNumber>630155D0</serialNumber><capacity>4294967296</capacity>
</ramStick>
</ramInfo>
</systemInfo>
</account>
You can use xpath like this:
foreach (XmlElement element in doc.SelectNodes("//account/systemInfo/ramInfo/ramStick"))
{
string partNumber = element["partNumber"].InnerText;
}
Note that since this call to SelectNodes returns only XmlElement, I can use XmlElement as type for the loop variable. Otherwise I'd have to use XmlNode.
After much recoding, here is what I finally did:
XmlNode systemInfo = node.SelectSingleNode("systemInfo");
XmlNode ramInfo = systemInfo.SelectSingleNode("ramInfo");
XmlNodeList ramList = ramInfo.SelectNodes("ramStick");
foreach (XmlElement ramStick in ramList)
{
// add code here
}
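The per-account loop from the question can also be kept while replacing the chain of GetElementsByTagName calls with one relative XPath. This is a sketch under the assumption that the account elements sit somewhere below a single root element; RamStickReader and GetPartNumbers are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RamStickReader
{
    // Collects the trimmed partNumber text of every ramStick under each account.
    public static List<string> GetPartNumbers(XmlDocument doc)
    {
        var partNumbers = new List<string>();
        foreach (XmlElement account in doc.SelectNodes("//account"))
        {
            // Relative path: evaluated from the current account element.
            // SelectNodes returns an empty list (not null) when an account has no ramInfo.
            foreach (XmlElement ramStick in account.SelectNodes("systemInfo/ramInfo/ramStick"))
            {
                // The indexer returns null if the child element is missing.
                XmlElement partNumber = ramStick["partNumber"];
                if (partNumber != null)
                {
                    partNumbers.Add(partNumber.InnerText.Trim());
                }
            }
        }
        return partNumbers;
    }
}
```

An account without a systemInfo or ramInfo element simply contributes nothing here, which avoids the null reference that indexing [0] on an empty node list produces.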

How to find particular node in XML and all of its child nodes?

This is my XML:
<?xml version="1.0"?>
<formatlist>
<format>
<formatName>WHC format</formatName>
<delCol>ID</delCol>
<delCol>CDRID</delCol>
<delCol>TGIN</delCol>
<delCol>IPIn</delCol>
<delCol>TGOUT</delCol>
<delCol>IPOut</delCol>
<srcNum>SRCNum</srcNum>
<distNum>DSTNum</distNum>
<connectTime>ConnectTime</connectTime>
<duration>Duration</duration>
</format>
<format>
<formatName existCombineCol="1">Umobile format</formatName> //this format
<delCol>billing_operator</delCol>
<hideCol>event_start_date</hideCol>
<hideCol>event_start_time</hideCol>
<afCombineName dateType="DateTime" format="dd/MM/yyyy HH:mm:ss"> //node i want
<name>ConnectdateTimeAFcombine</name>
<combineDate>event_start_date</combineDate>
<combineTime>event_start_time</combineTime>
</afCombineName>
<afCombineName dateType="DateTime" format="dd/MM/yyyy HH:mm:ss"> //node i want
<name>aaa</name>
<combineDate>bbb</combineDate>
<combineTime>ccc</combineTime>
</afCombineName>
<modifyPerfixCol action="add" perfix="60">bnum</modifyPerfixCol>
<srcNum>anum</srcNum>
<distNum>bnum</distNum>
<connectTime>ConnectdateTimeAFcombine</connectTime>
<duration>event_duration</duration>
</format>
</formatlist>
I want to find format with Umobile format then iterate over those two nodes.
<afCombineName dateType="DateTime" format="dd/MM/yyyy HH:mm:ss"> //node i want
<name>ConnectdateTimeAFcombine</name>
<combineDate>event_start_date</combineDate>
<combineTime>event_start_time</combineTime>
</afCombineName>
<afCombineName dateType="DateTime" format="dd/MM/yyyy HH:mm:ss"> //node i want
<name>aaa</name>
<combineDate>bbb</combineDate>
<combineTime>ccc</combineTime>
</afCombineName>
and list all the child nodes of those two nodes. The result should look like this:
ConnectdateTimeAFcombine,event_start_date,event_start_time.
aaa,bbb,ccc
How can I do this?
foreach(var children in format.Descendants())
{
//Do something with the child nodes of format.
}
For all XML-related traversal, you should get used to using XPath expressions. They are very useful. Even if there is an easier way in your specific case, it is good practice to use XPath: if your schema changes at some point, you just update your XPath expression and your code will be up and running again.
For a complete example, you can have a look at this article.
You can use the System.Xml namespace APIs along with System.Xml.XPath namespace API. Here is a quick algorithm that will help you do your task:
Fetch the formatName element whose text is Umobile format using the XPath below:
XmlNode umobileFormatNameNode = document.SelectSingleNode("//formatName[text()='Umobile format']");
Now the parent of umobileFormatNameNode will be the node that you are interested in:
XmlNode formatNode = umobileFormatNameNode.ParentNode;
Now get the children for this node:
XmlNodeList afCombineFormatNodes = formatNode.SelectNodes("afCombineName");
You can now process the list of afCombineFormatNodes
foreach (XmlNode xmlNode in afCombineFormatNodes)
{
//process nodes
}
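The steps above can also be folded into a single XPath expression with a predicate on formatName. The sketch below builds the comma-separated lines the question asks for; CombineNameReader and GetCombineLines are hypothetical names, and it assumes the format name contains no apostrophe.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class CombineNameReader
{
    // Returns one comma-joined line per afCombineName under the format with the given name.
    // Assumption: formatName contains no apostrophe, since it is embedded in the XPath string.
    public static List<string> GetCombineLines(XmlDocument doc, string formatName)
    {
        var lines = new List<string>();
        string xpath = "//format[formatName='" + formatName + "']/afCombineName";
        foreach (XmlElement combine in doc.SelectNodes(xpath))
        {
            // Keep element children only, skipping whitespace, text and comment nodes.
            IEnumerable<string> values = combine.ChildNodes
                .OfType<XmlElement>()
                .Select(child => child.InnerText);
            lines.Add(string.Join(",", values));
        }
        return lines;
    }
}
```

For the XML in the question, GetCombineLines(document, "Umobile format") yields the two lines ConnectdateTimeAFcombine,event_start_date,event_start_time and aaa,bbb,ccc.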
This way you can access those elements:
var doc = System.Xml.Linq.XDocument.Load("PATH TO YOUR XML FILE");
var result = doc.Descendants("format")
.Where(x => (string)x.Element("formatName") == "Umobile format")
// SelectMany returns every afCombineName, not just the first one in each format
.SelectMany(x => x.Elements("afCombineName"));
Then you can iterate the result this way:
foreach (var item in result)
{
// Value is already a string, so no ToString() call is needed
string format = item.Attribute("format").Value;
string name = item.Element("name").Value;
string combineDate = item.Element("combineDate").Value;
string combineTime = item.Element("combineTime").Value;
}

Xdocument, get each element(s) value

I have xml as follows:
<Reports>
<report>
<name>By Book</name>
<report_type>book</report_type>
<Object>Count Change</Object>
<Slicers detail="detail">
<Namespace>EOD</Namespace>
<BookNode>HighLevel</BookNode>
<DateFrom>T-2</DateFrom>
<DateTo>T-1</DateTo>
<System>NewSystem</System>
</Slicers>
</report>
</Reports>
I simply want to loop through the value of each element of the XDocument (preferably any element under Slicers), but to start with, just all elements.
When I run the following:
var slicers = from c in config.Elements("Reports")
select c.Value ;
foreach (var xe in slicers)
{
Console.WriteLine(xe);
}
The output is a single line concatenating all the values together.
"By BookbookCount ChangeEODHighLevelT-2T-1NewSystem"
I want to loop through them one at a time, 'By Book' first, run some code then book etc etc.
I am sure this is simple, but I can't get my head around it. I have tried foreach (XElement in query), but I get the same result.
I would do it something like this:
XmlDocument doc = new XmlDocument();
// load your XML here
doc.LoadXml(xml);
// to get just the slicers you could use "Reports/report/Slicers"
XmlNodeList xnList = doc.SelectNodes("nodeYou'reLookingFor");
foreach (XmlNode node in xnList)
{
    // "namespace" is a reserved word in C#, so the variable needs a different name
    string ns = node["Namespace"].InnerText;
    // go through all your nodes here
}
You're creating an XmlDocument, loading your XML into it, creating a list that holds each node found at the specified XPath, and then looping through each one. In the loop you can do whatever you want by referencing
node["nodenamehere"].InnerText
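Since the question uses XDocument, the same idea can be sketched with LINQ to XML. The single concatenated line in the question comes from reading Value on the Reports root element, which joins the text of all its descendants; selecting the individual child elements first gives one value per element. SlicerReader and GetSlicerValues are hypothetical names, and the sketch assumes the XML is available as a string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SlicerReader
{
    // Returns a name/value pair for each direct child element of every Slicers element.
    public static List<KeyValuePair<string, string>> GetSlicerValues(string xml)
    {
        XDocument config = XDocument.Parse(xml);
        return config.Descendants("Slicers")
            .Elements()
            .Select(e => new KeyValuePair<string, string>(e.Name.LocalName, e.Value))
            .ToList();
    }
}
```

Iterating the returned list with foreach gives Namespace, BookNode, DateFrom, DateTo and System one at a time, each with its own value. To visit every leaf element in the document instead, config.Descendants().Where(e => !e.HasElements) could be used in place of the Slicers selection.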
