Remove CDATA from the input - c#

I get a string which has CDATA and I want to remove that.
Input : "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>"
Output I want : <text>Hello</text>
<text>World</text>
I want to take all data between <text> and </text> and add it to a list.
The code I try is :
private List<XElement> Foo(string input)
{
string pattern = "<text>(.*?)</text>";
input = "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>" //For Testing
var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
var a = matches.Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToArray();
List<XElement> li = new List<XElement>();
XElement xText;
for (int i = 0; i < a.Length; i++)
{
xText = new XElement("text");
xText.Add(System.Net.WebUtility.HtmlDecode(a[i]));
li.Add(xText);
}
return li;
}
But, Here I get output as :
<text><![CDATA[Hello]]></text>
<text><![CDATA[World]]></text>
Can anyone please help me up.

It seems to me that you shouldn't be using a regular expression at all. Instead, construct a valid XML document be wrapping it all in a root element, then parse it and extract the elements you want.
You also want to replace all CDATA nodes with their equivalent text nodes. You can do that before or after you extract the elements into a list, but I've chosen to do it before:
using System;
using System.Linq;
using System.Xml.Linq;
class Test
{
static void Main()
{
string input = "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>";
string xml = "<root>" + input + "</root>";
var doc = XDocument.Parse(xml);
var nodes = doc.DescendantNodes().OfType<XCData>().ToList();
foreach (var node in nodes)
{
node.ReplaceWith(new XText(node.Value));
}
var elements = doc.Root.Elements().ToList();
elements.ForEach(Console.WriteLine);
}
}

I would use XDocument instead of Regex:
var value = "<root><Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text></root>";
var doc = XDocument.Parse(value);
Console.WriteLine (doc.Root.Elements().ElementAt(0).Value);
Console.WriteLine (doc.Root.Elements().ElementAt(1).Value);
Ouput:
Hello
World

Related

Retrieve list of xml tag of child nodes in linQ to xml

I'm trying to get all the list of the different child nodes (not starting from root) of a loaded XML into a list of strings, I had done using System.Xml library but I want to write the same code with LINQ to XML too.
I had found a code that helped me a lot but it starts from Root, here is the code:
List<string> nodesNames = new List<string>();
XDocument xdoc1 = XDocument.Load(destinationPath);
XElement root = xdoc1.Document.Root;
foreach (var name in root.DescendantNodes().OfType<XElement>()
.Select(x => x.Name).Distinct())
{
if (!nodesNames.Contains(name.ToString()))
nodesNames.Add(name.ToString());
}
With this, I get the list of all child nodes + the parent node too, which I don't want to use.FirstChild or to delete manually from the list because I want a TOTALLY dynamic code and I have in input the parent node passed by the user.
For a better comprehension, this is the code that working for me but with System.Xml:
List<string> nodesNames = new List<string>();
XmlDocument doc = new XmlDocument();
doc.Load(destinationPath);
XmlNodeList elemList = doc.GetElementsByTagName(inputParentNode);
for (int i = 0; i < elemList.Count; i++)
{
XmlNodeList cnList = (elemList[i].ChildNodes);
for (int j = 0; j < cnList.Count; j++)
{
string name = cnList[j].Name;
if (!nodesNames.Contains(name))
nodesNames.Add(name);
}
}
And this is an easy sample of the XML:
<?xml version='1.0' encoding='UTF-8'?>
<parentlist>
<parent>
<firstchild>someValue</firstchild>
<secondchild>someValue</secondchild>
</parent>
<parent>
<firstchild>someValue</firstchild>
<secondchild>someValue</secondchild>
<thirdchild>someValue</thirdchild>
</parent>
</parentlist>
To resume:
in the first case i obtain nodesNames = ["parent", "firstchild", "secondchild", "thirdchild"]
in the second case i obtain nodesNames = ["firstchild", "secondchild", "thirdchild"]
I just want to fix the first to obtain the same result as the second.
Try following :
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication2
{
class Program
{
const string FILENAME = #"c:\temp\test.xml";
static void Main(string[] args)
{
XDocument doc = XDocument.Load(FILENAME);
XElement parentlist= doc.Root;
List<string> descendents = parentlist.Descendants().Where(x => x.HasElements).Select(x => string.Join(",", x.Name.LocalName, string.Join(",", x.Elements().Select(y => y.Name.LocalName)))).ToList();
}
}
}

Extracting XML tags values

I have a list of XML files that I need to extract 3 values from each file.
The XML looks somewhat like :
<ClinicalDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" moodCode="EVN" xmlns="urn:hl7-org:v3">
<title>Summary</title>
<recordTarget>
<patientRole>
<patient>
<name>
<given>John</given>
<given>S</given>
<family>Doe</family>
</name>
<birthTime value="19480503" />
I'm trying to extract given name, family name and birth time.
Initially I'm trying to print out the values using:
XmlDocument doc2 = new XmlDocument();
doc2.Load(#"Z:\\DATA\\file.XML");
XmlElement root = doc2.DocumentElement;
XmlNodeList list = root.GetElementsByTagName("name");
for (int i = 0; i < list.Count; i++)
{
Console.WriteLine(list.Item(i).Value);
}
I'm not getting any value printed, but when I debug and check the inner values of "list" I can see what I need from that tag.
How can I extract the needed information?
Your code and all other answers ignore the default namespace xmlns="urn:hl7-org:v3"
I find Linq2Xml easier to use, so I'll post an answer using it..
var xDoc = XDocument.Load(filename);
var #namespace = "urn:hl7-org:v3";
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(xDoc.CreateNavigator().NameTable);
namespaceManager.AddNamespace("ns", #namespace);
XNamespace ns = #namespace;
var names = xDoc.XPathSelectElements("//ns:patient/ns:name", namespaceManager).ToList();
var list = names.Select(p => new
{
Given = string.Join(", ", p.Elements(ns + "given").Select(x => (string)x)),
Family = (string)p.Element(ns + "family"),
BirthTime = new DateTime(1970,1,1).AddSeconds( (int)p.Parent.Element(ns + "birthTime").Attribute("value"))
})
.ToList();
Try this instead:
XmlDocument doc2 = new XmlDocument();
doc2.Load(#"Path\To\XmlFile.xml");
XmlElement root = doc2.DocumentElement;
XmlNodeList list = root.GetElementsByTagName("name");
var names = list[0].ChildNodes;
for (int i = 0; i < names.Count; i++)
{
Console.WriteLine(names[i].InnerText);
}
Output:
John
S
Doe
There are 2 issues with your code:
The first being that you were iterating around the name element, which only has a Count of 1 (as there is only one of these). That's why I included list[0],ChildNodes, to get all the children of the name element (given, given and family).
To retrieve the text inside each element, ("John", "S", "Doe"), you should use InnerText instead of Value
It's not clear from your example XML if there is only ever one <name> element or if there could be multiple. The following assumes there might be multiple. It also grabs the birthdate.
for (int i = 0; i < list.Count; i++)
{
var xmlNode = list.Item(i).FirstChild;
while (xmlNode != null)
{
Console.WriteLine(xmlNode.InnerText);
xmlNode = xmlNode.NextSibling;
}
}
XmlNodeList birthDates = root.GetElementsByTagName("birthTime");
for (int i = 0; i < list.Count; i++)
{
Console.WriteLine(birthDates[i].Attributes["value"].Value);
}
If there are multiple <patient> elements in your xml you could do:
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
class Program
{
static void Main()
{
var doc = XDocument.Load("a.xml");
var nsm = new XmlNamespaceManager(new NameTable());
nsm.AddNamespace("x", "urn:hl7-org:v3");
var patients = doc.XPathSelectElements("//x:patient", nsm);
foreach (var patient in patients)
{
Console.WriteLine(patient.XPathSelectElement("./x:name/x:given[1]", nsm).Value);
Console.WriteLine(patient.XPathSelectElement("./x:name/x:given[2]", nsm).Value);
Console.WriteLine(patient.XPathSelectElement("./x:name/x:family", nsm).Value);
Console.WriteLine(patient.XPathSelectElement("./x:birthTime", nsm).Attribute("value").Value);
}
}
}
Why do you need to add the name space explicitly even if it's a default name space in the xml? see: this answer

XDocument XPath + Regex element locator

Is there an equivalent XmlDocument node locator for Elements in XDocument?
private const string InvalidDateTest =
"[text() = \"0000-00-00\" or text() = \" - - \" or text() = \"- - \"]";
xmlDocument.SelectNodes("//DeterminedDate/Value" + InvalidDateTest);
sauced it out myself:
using System.Xml.XPath;
xDocument.XPathSelectElements("//DeterminedDate/Value" + InvalidDateTest);
XDocument + Xpath sample :
using(var stream = new StringReader(xml))
{
XDocument xmlFile = XDocument.Load(stream);
var query = (IEnumerable) xmlFile.XPathEvaluate("/REETA/AFFIDAVIT/COUNTY_NAME");
foreach(var band in query.Cast < XElement > ())
{
Console.WriteLine(band.Value);
}
xmlFile.Save("books.xml");
}
To discriminate the nodes you need to simply chain extensions to the descendant call which will filter out what is needed:
string Data = #"<?xml version=""1.0""?>
<Notifications>
<Alerts>
<Alert>1</Alert>
<Alert>2</Alert>
<Alert>3</Alert>
</Alerts>
</Notifications>";
XDocument.Parse(Data)
.Descendants("Alert")
.Where (node => int.Parse(node.Value) > 1 && int.Parse(node.Value) < 3)
.ToList()
.ForEach(al => Console.WriteLine ( al.Value ) ); // 2 is the result

Loop through multiple subnodes in XML

<Sections>
<Classes>
<Class>VI</Class>
<Class>VII</Class>
</Classes>
<Students>
<Student>abc</Student>
<Student>def</Student>
</Students>
</Sections>
I have to loop through Classes to get 'Class' into an array of strings. I have to also loop through 'Students' to get 'Student' put into an array of strings.
XDocument doc.Load("File.xml");
string str1;
foreach(XElement mainLoop in doc.Descendants("Sections"))
{
foreach(XElement classLoop in mainLoop.Descendants("Classes"))
str1 = classLoop.Element("Class").Value +",";
//Also get Student value
}
is not working to get all the classes. Also, I need to rewrite this without using LINQ to XML, i.e using XmlNodeList and XmlNodes.
XmlDocument doc1 = new XmlDocument();
doc1.Load("File.xml");
foreach(XmlNode mainLoop in doc.SelectNodes("Sections")) ??
Not sure how to go about it.
The XPath is straightforward. To get the results into an array you can either use LINQ or a regular loop.
var classNodes = doc.SelectNodes("/Sections/Classes/Class");
// LINQ approach
string[] classes = classNodes.Cast<XmlNode>()
.Select(n => n.InnerText)
.ToArray();
var studentNodes = doc.SelectNodes("/Sections/Students/Student");
// traditional approach
string[] students = new string[studentNodes.Count];
for (int i = 0; i < studentNodes.Count; i++)
{
students[i] = studentNodes[i].InnerText;
}
Not sure about rewriting it for XmlNodes but for your Classes and Students you can simply:
XDocument doc.Load("File.xml");
foreach(XElement c in doc.Descendants("Class"))
{
// do something with c.Value;
}
foreach(XElement s in doc.Descendants("Student"))
{
// do something with s.Value;
}
With LINQ to XML:
XDocument doc = XDocument.Load("file.xml");
var classNodes = doc.Elements("Sections").Elements("Classes").Elements("Class");
StringBuilder result = new StringBuilder();
foreach( var c in classNodes )
result.Append(c.Value).Append(",");
With XPath:
XmlDocument doc = new XmlDocument();
doc.Load("file.xml");
var classNodes = doc.SelectNodes("/Sections/Classes/Class/text()");
StringBuilder result = new StringBuilder();
foreach( XmlNode c in classNodes )
result.Append(c.Value).Append(",");

LINQ to read XML

I am using this XML file:
<root>
<level1 name="A">
<level2 name="A1" />
<level2 name="A2" />
</level1>
<level1 name="B">
<level2 name="B1" />
<level2 name="B2" />
</level1>
<level1 name="C" />
</root>
Could someone give me a C# code using LINQ, the simplest way to print this result:
(Note the extra space if it is a level2 node)
A
A1
A2
B
B1
B2
C
Currently I have written this code:
XDocument xdoc = XDocument.Load("data.xml"));
var lv1s = from lv1 in xdoc.Descendants("level1")
select lv1.Attribute("name").Value;
foreach (var lv1 in lv1s)
{
result.AppendLine(lv1);
var lv2s = from lv2 in xdoc...???
}
Try this.
using System.Xml.Linq;
void Main()
{
StringBuilder result = new StringBuilder();
//Load xml
XDocument xdoc = XDocument.Load("data.xml");
//Run query
var lv1s = from lv1 in xdoc.Descendants("level1")
select new {
Header = lv1.Attribute("name").Value,
Children = lv1.Descendants("level2")
};
//Loop through results
foreach (var lv1 in lv1s){
result.AppendLine(lv1.Header);
foreach(var lv2 in lv1.Children)
result.AppendLine(" " + lv2.Attribute("name").Value);
}
Console.WriteLine(result);
}
Or, if you want a more general approach - i.e. for nesting up to "levelN":
void Main()
{
XElement rootElement = XElement.Load(#"c:\events\test.xml");
Console.WriteLine(GetOutline(0, rootElement));
}
private string GetOutline(int indentLevel, XElement element)
{
StringBuilder result = new StringBuilder();
if (element.Attribute("name") != null)
{
result = result.AppendLine(new string(' ', indentLevel * 2) + element.Attribute("name").Value);
}
foreach (XElement childElement in element.Elements())
{
result.Append(GetOutline(indentLevel + 1, childElement));
}
return result.ToString();
}
A couple of plain old foreach loops provides a clean solution:
foreach (XElement level1Element in XElement.Load("data.xml").Elements("level1"))
{
result.AppendLine(level1Element.Attribute("name").Value);
foreach (XElement level2Element in level1Element.Elements("level2"))
{
result.AppendLine(" " + level2Element.Attribute("name").Value);
}
}
Here are a couple of complete working examples that build on the #bendewey & #dommer examples. I needed to tweak each one a bit to get it to work, but in case another LINQ noob is looking for working examples, here you go:
//bendewey's example using data.xml from OP
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
class loadXMLToLINQ1
{
static void Main( )
{
//Load xml
XDocument xdoc = XDocument.Load(#"c:\\data.xml"); //you'll have to edit your path
//Run query
var lv1s = from lv1 in xdoc.Descendants("level1")
select new
{
Header = lv1.Attribute("name").Value,
Children = lv1.Descendants("level2")
};
StringBuilder result = new StringBuilder(); //had to add this to make the result work
//Loop through results
foreach (var lv1 in lv1s)
{
result.AppendLine(" " + lv1.Header);
foreach(var lv2 in lv1.Children)
result.AppendLine(" " + lv2.Attribute("name").Value);
}
Console.WriteLine(result.ToString()); //added this so you could see the output on the console
}
}
And next:
//Dommer's example, using data.xml from OP
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
class loadXMLToLINQ
{
static void Main( )
{
XElement rootElement = XElement.Load(#"c:\\data.xml"); //you'll have to edit your path
Console.WriteLine(GetOutline(0, rootElement));
}
static private string GetOutline(int indentLevel, XElement element)
{
StringBuilder result = new StringBuilder();
if (element.Attribute("name") != null)
{
result = result.AppendLine(new string(' ', indentLevel * 2) + element.Attribute("name").Value);
}
foreach (XElement childElement in element.Elements())
{
result.Append(GetOutline(indentLevel + 1, childElement));
}
return result.ToString();
}
}
These both compile & work in VS2010 using csc.exe version 4.0.30319.1 and give the exact same output. Hopefully these help someone else who's looking for working examples of code.
EDIT: added #eglasius' example as well since it became useful to me:
//#eglasius example, still using data.xml from OP
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
class loadXMLToLINQ2
{
static void Main( )
{
StringBuilder result = new StringBuilder(); //needed for result below
XDocument xdoc = XDocument.Load(#"c:\\deg\\data.xml"); //you'll have to edit your path
var lv1s = xdoc.Root.Descendants("level1");
var lvs = lv1s.SelectMany(l=>
new string[]{ l.Attribute("name").Value }
.Union(
l.Descendants("level2")
.Select(l2=>" " + l2.Attribute("name").Value)
)
);
foreach (var lv in lvs)
{
result.AppendLine(lv);
}
Console.WriteLine(result);//added this so you could see the result
}
}
XDocument xdoc = XDocument.Load("data.xml");
var lv1s = xdoc.Root.Descendants("level1");
var lvs = lv1s.SelectMany(l=>
new string[]{ l.Attribute("name").Value }
.Union(
l.Descendants("level2")
.Select(l2=>" " + l2.Attribute("name").Value)
)
);
foreach (var lv in lvs)
{
result.AppendLine(lv);
}
Ps. You have to use .Root on any of these versions.
Asynchronous loading of the XML file can improve performance, especially if the file is large or if it takes a long time to load. In this example, we use the XDocument.LoadAsync method to load and parse the XML file asynchronously, which can help to prevent the application from becoming unresponsive while the file is being loaded.
Demo: https://dotnetfiddle.net/PGFE7c (using XML string parsing)
Implementation:
XDocument doc;
// Open the XML file using File.OpenRead and pass the stream to
// XDocument.LoadAsync to load and parse the XML asynchronously
using (var stream = File.OpenRead("data.xml"))
{
doc = await XDocument.LoadAsync(stream, LoadOptions.None, CancellationToken.None);
}
// Select the level1 elements from the document and create an anonymous object for each element
// with a Name property containing the value of the "name" attribute and a Children property
// containing a collection of the names of the level2 elements
var results = doc.Descendants("level1")
.Select(level1 => new
{
Name = level1.Attribute("name").Value,
Children = level1.Descendants("level2")
.Select(level2 => level2.Attribute("name").Value)
});
foreach (var result in results)
{
Console.WriteLine(result.Name);
foreach (var child in result.Children)
Console.WriteLine(" " + child);
}
Result:
A
  A1
  A2
B
  B1
  B2
C

Categories