I have a list of XML files, and I need to extract three values from each one.
The XML looks something like this:
<ClinicalDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" moodCode="EVN" xmlns="urn:hl7-org:v3">
<title>Summary</title>
<recordTarget>
<patientRole>
<patient>
<name>
<given>John</given>
<given>S</given>
<family>Doe</family>
</name>
<birthTime value="19480503" />
I'm trying to extract the given name, family name and birth time.
As a first step, I'm trying to print the values using:
XmlDocument doc2 = new XmlDocument();
doc2.Load(@"Z:\DATA\file.XML");
XmlElement root = doc2.DocumentElement;
XmlNodeList list = root.GetElementsByTagName("name");
for (int i = 0; i < list.Count; i++)
{
Console.WriteLine(list.Item(i).Value);
}
Nothing is printed, but when I debug and inspect "list" I can see the data I need inside that tag.
How can I extract the needed information?
Your code and all the other answers ignore the default namespace xmlns="urn:hl7-org:v3".
I find LINQ to XML easier to use, so I'll post an answer using it.
var xDoc = XDocument.Load(filename);
var @namespace = "urn:hl7-org:v3";
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(xDoc.CreateNavigator().NameTable);
namespaceManager.AddNamespace("ns", @namespace);
XNamespace ns = @namespace;
var names = xDoc.XPathSelectElements("//ns:patient/ns:name", namespaceManager).ToList();
var list = names.Select(p => new
{
Given = string.Join(", ", p.Elements(ns + "given").Select(x => (string)x)),
Family = (string)p.Element(ns + "family"),
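// The sample value "19480503" is a yyyyMMdd date, not a count of seconds since 1970 (assumes all files use this format)
BirthTime = DateTime.ParseExact((string)p.Parent.Element(ns + "birthTime").Attribute("value"), "yyyyMMdd", System.Globalization.CultureInfo.InvariantCulture)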
})
.ToList();
Try this instead:
XmlDocument doc2 = new XmlDocument();
doc2.Load(@"Path\To\XmlFile.xml");
XmlElement root = doc2.DocumentElement;
XmlNodeList list = root.GetElementsByTagName("name");
var names = list[0].ChildNodes;
for (int i = 0; i < names.Count; i++)
{
Console.WriteLine(names[i].InnerText);
}
Output:
John
S
Doe
There are two issues with your code:
First, you were iterating over the list of name elements, which has a Count of only 1 (there is only one <name>). That's why I used list[0].ChildNodes, to get all the children of the name element (given, given and family).
Second, to retrieve the text inside each element ("John", "S", "Doe"), you should use InnerText instead of Value.
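The difference between Value and InnerText can be seen in isolation with a minimal sketch. The class and method names here are hypothetical, and the XML is trimmed from the question's sample with the namespace left out for brevity:

```csharp
using System.Xml;

static class ValueVsInnerText
{
    // Returns { element.Value, element.InnerText, textNode.Value }
    // for the first <given> element of a small sample document.
    public static string[] Inspect()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<name><given>John</given><given>S</given><family>Doe</family></name>");

        XmlNode given = doc.DocumentElement.FirstChild; // the first <given> element

        return new[]
        {
            given.Value,            // null: an element node has no Value of its own
            given.InnerText,        // "John": concatenated text of all descendants
            given.FirstChild.Value  // "John": the child text node does carry a Value
        };
    }
}
```

This is why the original loop printed empty lines: Console.WriteLine was given null for every element.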
It's not clear from your example XML if there is only ever one <name> element or if there could be multiple. The following assumes there might be multiple. It also grabs the birthdate.
for (int i = 0; i < list.Count; i++)
{
var xmlNode = list.Item(i).FirstChild;
while (xmlNode != null)
{
Console.WriteLine(xmlNode.InnerText);
xmlNode = xmlNode.NextSibling;
}
}
XmlNodeList birthDates = root.GetElementsByTagName("birthTime");
for (int i = 0; i < birthDates.Count; i++)
{
Console.WriteLine(birthDates[i].Attributes["value"].Value);
}
If there are multiple <patient> elements in your XML, you could do:
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
class Program
{
static void Main()
{
var doc = XDocument.Load("a.xml");
var nsm = new XmlNamespaceManager(new NameTable());
nsm.AddNamespace("x", "urn:hl7-org:v3");
var patients = doc.XPathSelectElements("//x:patient", nsm);
foreach (var patient in patients)
{
Console.WriteLine(patient.XPathSelectElement("./x:name/x:given[1]", nsm).Value);
Console.WriteLine(patient.XPathSelectElement("./x:name/x:given[2]", nsm).Value);
Console.WriteLine(patient.XPathSelectElement("./x:name/x:family", nsm).Value);
Console.WriteLine(patient.XPathSelectElement("./x:birthTime", nsm).Attribute("value").Value);
}
}
}
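Why do you need to add the namespace explicitly even if it's the default namespace in the XML? Because an unprefixed name in an XPath 1.0 expression always refers to an element in no namespace, so the default namespace has to be bound to a prefix in the query. See: this answer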
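A small sketch of that behaviour, assuming a trimmed-down version of the question's document (the class name, method name and the "x" prefix are arbitrary choices for illustration):

```csharp
using System.Xml;

static class DefaultNamespaceDemo
{
    // Trimmed sample: every element is in the default namespace urn:hl7-org:v3.
    const string Xml =
        "<ClinicalDocument xmlns=\"urn:hl7-org:v3\"><patient><name>" +
        "<given>John</given><family>Doe</family></name></patient></ClinicalDocument>";

    // Counts <name> elements with either an unprefixed or a prefixed XPath query.
    public static int CountNames(bool usePrefix)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        if (!usePrefix)
        {
            // "name" here means "name in no namespace", so nothing matches.
            return doc.SelectNodes("//name").Count;
        }

        // Bind any prefix to the document's default namespace URI.
        var nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("x", "urn:hl7-org:v3");
        return doc.SelectNodes("//x:name", nsm).Count;
    }
}
```

With this sample, the unprefixed query finds no nodes while the prefixed one finds the single <name> element.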
I'm trying to get the names of all the distinct child nodes (not starting from the root) of a loaded XML document into a list of strings. I have done this using the System.Xml library, but I want to write the same code with LINQ to XML too.
I found some code that helped me a lot, but it starts from the root. Here it is:
List<string> nodesNames = new List<string>();
XDocument xdoc1 = XDocument.Load(destinationPath);
XElement root = xdoc1.Document.Root;
foreach (var name in root.DescendantNodes().OfType<XElement>()
.Select(x => x.Name).Distinct())
{
if (!nodesNames.Contains(name.ToString()))
nodesNames.Add(name.ToString());
}
With this, I get the list of all child nodes plus the parent node too. I don't want to use .FirstChild or delete the parent from the list manually, because I want the code to be totally dynamic, and the parent node name is passed in by the user.
To make this clearer, here is the code that works for me, but with System.Xml:
List<string> nodesNames = new List<string>();
XmlDocument doc = new XmlDocument();
doc.Load(destinationPath);
XmlNodeList elemList = doc.GetElementsByTagName(inputParentNode);
for (int i = 0; i < elemList.Count; i++)
{
XmlNodeList cnList = (elemList[i].ChildNodes);
for (int j = 0; j < cnList.Count; j++)
{
string name = cnList[j].Name;
if (!nodesNames.Contains(name))
nodesNames.Add(name);
}
}
And this is a simple sample of the XML:
<?xml version='1.0' encoding='UTF-8'?>
<parentlist>
<parent>
<firstchild>someValue</firstchild>
<secondchild>someValue</secondchild>
</parent>
<parent>
<firstchild>someValue</firstchild>
<secondchild>someValue</secondchild>
<thirdchild>someValue</thirdchild>
</parent>
</parentlist>
To summarize:
in the first case I obtain nodesNames = ["parent", "firstchild", "secondchild", "thirdchild"]
in the second case I obtain nodesNames = ["firstchild", "secondchild", "thirdchild"]
I just want to fix the first so that it gives the same result as the second.
Try the following:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication2
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XDocument doc = XDocument.Load(FILENAME);
XElement parentlist= doc.Root;
List<string> descendents = parentlist.Descendants().Where(x => x.HasElements).Select(x => string.Join(",", x.Name.LocalName, string.Join(",", x.Elements().Select(y => y.Name.LocalName)))).ToList();
}
}
}
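Note that the snippet above builds one comma-joined string per parent element rather than a flat list of distinct names. If the goal is the same list the System.Xml version produces, a sketch along the following lines may be closer. The class and method names are hypothetical, and it assumes the elements are not in a namespace, as in the sample:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ChildNameCollector
{
    // Collects the distinct names of the direct children of every element
    // called parentName, wherever those elements occur in the document.
    public static List<string> DistinctChildNames(XDocument doc, string parentName)
    {
        return doc.Descendants(parentName)          // every <parent>, at any depth
                  .Elements()                       // only their direct children
                  .Select(e => e.Name.LocalName)
                  .Distinct()
                  .ToList();
    }
}
```

Called as ChildNameCollector.DistinctChildNames(XDocument.Load(destinationPath), inputParentNode), this should yield firstchild, secondchild and thirdchild for the sample, without the parent name itself.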
I get a string which contains CDATA sections and I want to remove them.
Input : "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>"
Output I want : <text>Hello</text>
<text>World</text>
I want to take all the data between <text> and </text> and add it to a list.
The code I tried is:
private List<XElement> Foo(string input)
{
string pattern = "<text>(.*?)</text>";
input = "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>"; // For testing
var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
var a = matches.Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToArray();
List<XElement> li = new List<XElement>();
XElement xText;
for (int i = 0; i < a.Length; i++)
{
xText = new XElement("text");
xText.Add(System.Net.WebUtility.HtmlDecode(a[i]));
li.Add(xText);
}
return li;
}
But here I get this output:
<text><![CDATA[Hello]]></text>
<text><![CDATA[World]]></text>
Can anyone please help me out?
It seems to me that you shouldn't be using a regular expression at all. Instead, construct a valid XML document by wrapping it all in a root element, then parse it and extract the elements you want.
You also want to replace all CDATA nodes with their equivalent text nodes. You can do that before or after you extract the elements into a list, but I've chosen to do it before:
using System;
using System.Linq;
using System.Xml.Linq;
class Test
{
static void Main()
{
string input = "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>";
string xml = "<root>" + input + "</root>";
var doc = XDocument.Parse(xml);
var nodes = doc.DescendantNodes().OfType<XCData>().ToList();
foreach (var node in nodes)
{
node.ReplaceWith(new XText(node.Value));
}
var elements = doc.Root.Elements().ToList();
elements.ForEach(Console.WriteLine);
}
}
I would use XDocument instead of Regex:
var value = "<root><Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text></root>";
var doc = XDocument.Parse(value);
Console.WriteLine (doc.Root.Elements().ElementAt(0).Value);
Console.WriteLine (doc.Root.Elements().ElementAt(1).Value);
Output:
Hello
World
I have the XML content below, and I am trying to get the values of the <text> and <content .. /> tags that are inside the <navMap> ... </navMap> tag only.
I am using XmlDocument from the Windows.Data.Xml.Dom namespace.
I have worked with XmlDocument before, but this type of XML content is quite different, and I don't know which property to use to get the value of a tag nested within another tag.
<docTitle>
<text>XXXXXXX</text>
</docTitle>
<navMap>
<navPoint id="navpoint-1" playOrder="1">
<navLabel>
<text>Title Page</text>
</navLabel>
<content src="000.html" />
</navPoint>
<navPoint id="navpoint-2" playOrder="2">
<navLabel>
<text>Main Text</text>
</navLabel>
<content src="01M.html" />
</navPoint>
</navMap>
I am working on Windows Store apps using C#.
I tried it like this:
using Windows.Data.Xml.Dom;
---------------
---------------
---------------
StorageFile tocFile = await finalfolder.GetFileAsync(tocFileValue);
string fileContents1 = await FileIO.ReadTextAsync(tocFile);
string encodedContent1 = fileContents1.Replace(" ", " ");
tocDocument.LoadXml(encodedContent1,loadSettings1);
XmlNodeList tocNodeList = tocDocument.GetElementsByTagName("navMap");
foreach (XmlElement Element in tocNodeList)
{
//Element is showing as null..
}
If anyone is familiar with XmlDocument from the Windows.Data.Xml.Dom namespace, please give me a suggestion.
Thanks
You can simply do this:
XmlDocument xml = new XmlDocument();
xml.LoadXml(urXml);
XmlNodeList textlist = xml.GetElementsByTagName("text");
XmlNodeList contentList = xml.GetElementsByTagName("content");
for (int i = 0; i < textlist.Count; i++)
{
string s1 = textlist[i].InnerText; //
}
for (int j = 0; j < contentList.Count; j++)
{
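// <content> is an empty element, so its InnerText is empty; the file name is in the src attribute
string s2 = contentList[j].Attributes.GetNamedItem("src").InnerText;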
}
You can get the text this way. The string variables are only there to show that you can read the inner text. If you want to store all the values under the text tag, use a list and add each InnerText to it, like this:
List<string> str = new List<string>(); // declared once, before the loop
for (int i = 0; i < textlist.Count; i++)
{
str.Add(textlist[i].InnerText);
}
The same applies to the content tag.
Hope this helps.
With XmlDocument you could do the following...
XmlNodeList xnList = xd.SelectNodes("navMap/navPoint"); //xd being your xmldocument. returns all "navPoint" nodes under navMap and navMap is your root node
foreach (XmlNode node in xnList)
{
string retText = node["navLabel"]["text"].InnerText; // navLabel/text
string retContentAtt = node["content"].Attributes["src"].Value; // navPoint/content src="
}
I think this is what you are looking for. Hope it helps
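For comparison, the same pairing of label text and src can be sketched with LINQ to XML. This is an illustration under assumptions, not the asker's Windows.Data.Xml.Dom setup: the class and method names are hypothetical, the fragment is assumed to sit under a single root element, and no XML namespace is assumed. If the real file declares a default namespace, the element names would need an XNamespace prefix.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NavMapReader
{
    // Returns one (label text, content src) pair per <navPoint> under <navMap>.
    // <text> elements outside <navMap>, such as the one in <docTitle>, are ignored.
    public static List<KeyValuePair<string, string>> ReadNavPoints(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Descendants("navMap")
                  .Elements("navPoint")
                  .Select(p => new KeyValuePair<string, string>(
                      (string)p.Element("navLabel")?.Element("text"),  // null if missing
                      (string)p.Element("content")?.Attribute("src"))) // null if missing
                  .ToList();
    }
}
```

Restricting the query to descendants of navMap is what keeps the docTitle text out of the result, which a document-wide GetElementsByTagName("text") would not do.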
I'm trying to build an XML document from scratch using LINQ to XML.
XElement root = new XElement("RootNode");
XDocument doc = new XDocument(
new XDeclaration("1.0", "utf-8", ""), root
);
for (int j = 0; j < 10; j++)
{
XElement element = new XElement("SetGrid");
element.SetElementValue("ID", j);
root.Add(element);
}
var reader = doc.CreateReader();//doc has 10 elements inside root element
string result = reader.ReadInnerXml();//always empty string
How can I get string from XDocument?
Just use string result = doc.ToString() or
var wr = new StringWriter();
doc.Save(wr);
string result = wr.ToString();
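The two options are not quite equivalent: ToString() leaves out the XML declaration, while Save to a StringWriter writes one. A minimal sketch of the difference, with hypothetical class and method names and a shortened version of the question's document:

```csharp
using System.IO;
using System.Xml.Linq;

static class XDocumentToString
{
    // Builds a small document similar to the one in the question.
    public static XDocument Build()
    {
        var root = new XElement("RootNode");
        for (int j = 0; j < 3; j++)
        {
            var element = new XElement("SetGrid");
            element.SetElementValue("ID", j);
            root.Add(element);
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", null), root);
    }

    // ToString() serializes the nodes only; the declaration is not included.
    public static string WithoutDeclaration(XDocument doc)
    {
        return doc.ToString();
    }

    // Save() writes the declaration too. With a StringWriter the declared
    // encoding follows the writer (UTF-16), not the XDeclaration.
    public static string WithDeclaration(XDocument doc)
    {
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Which one fits depends on whether the consumer of the string expects the <?xml ...?> line.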
One possible reason for the empty string, as per the documentation.
XmlReader.ReadInnerXml returns:
All the XML content, including markup, in the current node. If the
current node has no children, an empty string is returned. If the
current node is neither an element nor attribute, an empty string is
returned.
Try:
XmlReader reader = doc.CreateReader();
reader.Read();
string result = reader.ReadInnerXml();
// or, alternatively, save the document to a StringWriter:
var wr = new StringWriter();
doc.Save(wr);
var xmlString = wr.GetStringBuilder().ToString();
There's a full answer here.
Long story short, you're missing reader.MoveToContent();
That is, it should be:
var reader = root.CreateReader();
reader.MoveToContent(); // <- the missing line
string result = reader.ReadInnerXml();
This way the result won't be empty, and you don't even have to create an XDocument.
So the full code from the original question plus the fix is:
XElement root = new XElement("RootNode");
for (int j = 0; j < 10; j++)
{
XElement element = new XElement("SetGrid");
element.SetElementValue("ID", j);
root.Add(element);
}
var reader = root.CreateReader();// root has 10 elements
reader.MoveToContent(); // <-- missing line
string result = reader.ReadOuterXml(); // now it returns non-empty string
Output:
<RootNode><SetGrid><ID>0</ID></SetGrid><SetGrid><ID>1</ID></SetGrid><SetGrid><ID>2</ID></SetGrid><SetGrid><ID>3</ID></SetGrid><SetGrid><ID>4</ID></SetGrid><SetGrid><ID>5</ID></SetGrid><SetGrid><ID>6</ID></SetGrid><SetGrid><ID>7</ID></SetGrid><SetGrid><ID>8</ID></SetGrid><SetGrid><ID>9</ID></SetGrid></RootNode>
Note: The code was tested in Visual Studio 2013 / .NET Framework 4.5
MSDN reference: XmlReader.ReadOuterXml
<Sections>
<Classes>
<Class>VI</Class>
<Class>VII</Class>
</Classes>
<Students>
<Student>abc</Student>
<Student>def</Student>
</Students>
</Sections>
I have to loop through 'Classes' to get each 'Class' into an array of strings. I also have to loop through 'Students' to get each 'Student' into an array of strings.
XDocument doc = XDocument.Load("File.xml");
string str1;
foreach(XElement mainLoop in doc.Descendants("Sections"))
{
foreach(XElement classLoop in mainLoop.Descendants("Classes"))
str1 = classLoop.Element("Class").Value +",";
//Also get Student value
}
This does not get all the classes. Also, I need to rewrite this without using LINQ to XML, i.e. using XmlNodeList and XmlNode.
XmlDocument doc1 = new XmlDocument();
doc1.Load("File.xml");
foreach(XmlNode mainLoop in doc1.SelectNodes("Sections")) ??
I'm not sure how to go about it.
The XPath is straightforward. To get the results into an array you can either use LINQ or a regular loop.
var classNodes = doc.SelectNodes("/Sections/Classes/Class");
// LINQ approach
string[] classes = classNodes.Cast<XmlNode>()
.Select(n => n.InnerText)
.ToArray();
var studentNodes = doc.SelectNodes("/Sections/Students/Student");
// traditional approach
string[] students = new string[studentNodes.Count];
for (int i = 0; i < studentNodes.Count; i++)
{
students[i] = studentNodes[i].InnerText;
}
Not sure about rewriting it for XmlNodes, but for your Classes and Students you can simply do:
XDocument doc = XDocument.Load("File.xml");
foreach(XElement c in doc.Descendants("Class"))
{
// do something with c.Value;
}
foreach(XElement s in doc.Descendants("Student"))
{
// do something with s.Value;
}
With LINQ to XML:
XDocument doc = XDocument.Load("file.xml");
var classNodes = doc.Elements("Sections").Elements("Classes").Elements("Class");
StringBuilder result = new StringBuilder();
foreach( var c in classNodes )
result.Append(c.Value).Append(",");
With XPath:
XmlDocument doc = new XmlDocument();
doc.Load("file.xml");
var classNodes = doc.SelectNodes("/Sections/Classes/Class/text()");
StringBuilder result = new StringBuilder();
foreach( XmlNode c in classNodes )
result.Append(c.Value).Append(",");