Extract XML from CData - c#

I need to write a C# application that reads a given XML file and prepares its data for loading into a database. I have no influence over the structure of the XML, and the main data is placed inside CDATA sections. I have verified that the content of each CDATA section is itself well-formed XML.
I've searched hundreds of posts and couldn't find a solution in any of them. Below is the XML file whose CDATA sections I need to extract the data from. Can anyone help?
<Docs>
<Doc>
<Content>
<![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="1" Description="Desc1"></Header>
<Poss>
<Pos Id="1" Name="Pos1"></Pos>
<Pos Id="2" Name="Pos2"></Pos>
</Poss>
</Doc>
]]>
</Content>
</Doc>
<Doc>
<Content>
<![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="2" Description="Desc2"></Header>
<Poss>
<Pos Id="3" Name="Pos3"></Pos>
<Pos Id="4" Name="Pos4"></Pos>
</Poss>
</Doc>
]]>
</Content>
</Doc>
</Docs>
The most important fields for me are those inside the Content element; I have to load them into the database.

To extract the data from the CDATA part:
1. Construct the classes.
public class Doc
{
public Header Header { get; set; }
[XmlArrayItem(typeof(Pos), ElementName = "Pos")]
public List<Pos> Poss { get; set; }
}
public class Header
{
[XmlAttribute]
public int DocNumber { get; set; }
[XmlAttribute]
public string Description { get; set; }
}
public class Pos
{
[XmlAttribute]
public int Id { get; set; }
[XmlAttribute]
public string Name { get; set; }
}
2. Implement the extraction logic.
2.1. Read the XML string as an XDocument via XDocument.Parse().
2.2. Select the elements matching the XPath "/Docs/Doc/Content" and take their DescendantNodes().
2.3. Filter the nodes to the XCData type.
2.4. Use XmlSerializer to deserialize the value of each XCData node to the Doc type.
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Serialization;
using System.IO;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(Doc));
XDocument xDoc = XDocument.Parse(xml);
var cdataSections = xDoc.XPathSelectElements("/Docs/Doc/Content")
.DescendantNodes()
.OfType<XCData>()
.Select(x => (Doc)xmlSerializer.Deserialize(new StringReader(x.Value)))
.ToList();
Demo @ .NET Fiddle
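If the goal is only to get flat rows for a database load, a lighter variant is possible without serializer classes. The following is a minimal sketch, not the answer's method: it parses each CDATA payload with XElement.Parse and projects one row per Pos. The names PosRow and CDataExtractor are hypothetical, and the sketch assumes every inner Doc has a Header element with both attributes present.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical flat row shape: one instance per <Pos>, ready for a table insert.
public class PosRow
{
    public int DocNumber { get; set; }
    public string Description { get; set; }
    public int PosId { get; set; }
    public string PosName { get; set; }
}

public static class CDataExtractor
{
    public static List<PosRow> Extract(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Content")
            // Each <Content> holds a CDATA node whose text is itself XML.
            .SelectMany(content => content.Nodes().OfType<XCData>())
            .Select(cdata => XElement.Parse(cdata.Value))
            .SelectMany(doc =>
            {
                // Assumption: the inner <Doc> always has a <Header> child.
                XElement header = doc.Element("Header");
                return doc.Descendants("Pos").Select(pos => new PosRow
                {
                    DocNumber = (int)header.Attribute("DocNumber"),
                    Description = (string)header.Attribute("Description"),
                    PosId = (int)pos.Attribute("Id"),
                    PosName = (string)pos.Attribute("Name")
                });
            })
            .ToList();
    }
}
```

Calling CDataExtractor.Extract(xml) on the XML from the question should give four rows, two per document, each carrying its document's header values.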

Here is an implementation based on a stored procedure with an XML parameter, as described in my comments.
I had to remove the <Poss> XML element to make the CDATA section well-formed XML.
SQL
DECLARE @xml XML =
N'<Docs>
<Doc>
<Content><![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="1" Description="Desc1"></Header>
<Pos Id="1" Name="Pos1"></Pos>
<Pos Id="2" Name="Pos2"></Pos>
</Doc>
]]>
</Content>
</Doc>
<Doc>
<Content><![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="2" Description="Desc2"></Header>
<Pos Id="3" Name="Pos3"></Pos>
<Pos Id="4" Name="Pos4"></Pos>
</Doc>
]]>
</Content>
</Doc>
</Docs>';
--INSERT INTO <targetTable>
SELECT h.value('(Header/@DocNumber)[1]', 'INT') AS DocNumber
, h.value('(Header/@Description)[1]', 'VARCHAR(256)') AS DocDescription
, d.value('@Id', 'INT') AS posID
, d.value('@Name', 'VARCHAR(256)') AS posName
FROM @xml.nodes('/Docs/Doc/Content/text()') AS t(c)
CROSS APPLY (SELECT TRY_CAST(c.query('.').value('.', 'NVARCHAR(MAX)') AS XML)) AS t1(x)
CROSS APPLY x.nodes('/Doc') AS t2(h)
CROSS APPLY h.nodes('Pos') AS t3(d);
Output
DocNumber | DocDescription | posID | posName
2         | Desc2          | 3     | Pos3
2         | Desc2          | 4     | Pos4
1         | Desc1          | 1     | Pos1
1         | Desc1          | 2     | Pos2

Related

Iterate through nested XML nodes and save it to a list in asp.net c#

I have an XML file with nested nodes. I need to loop through the nodes and save their inner text to a list. The XML format is like below:
<xml>
<record>
</record>
<Actions>
<Action Name= "name1">
<screen>
<subAction></SubAction>
</screen>
</Action>
<Action Name = "name2">
<screen Description= "info">
<subAction>
<object>
<name> user </name>
<type> string </type>
<variable> ram </variable>
</object>
<object>
<name> user1 </name>
<type> string1 </type>
<variable> ram1 </variable>
</object>
</subAction>
</screen>
<Screen Description= "info1">
<subAction>
<object>
</object>
</subAction>
</Screen>....goes on
</Action>
</Actions>
</xml>
I need to find the Action whose Name is name2, loop through it, and collect all of its object entries in a list. I am not able to reach the nested nodes.
Below is the code I have tried:
XmlNodeList NodesPro = xmlDoc.SelectNodes("/xml/Actions/Action");
foreach (XmlNode pronode in NodesPro)
{
bool flag = false;
if (pronode.Attributes["Name"].Value == "name2")
{
// Don't know how to proceed. Please help.
}
}
It is better to use the LINQ to XML API. It has been available in the .NET Framework for more than a decade.
The XML provided is not well-formed. I had to fix it.
It is not clear what your desired output is.
I copied @Sajid's class as a placeholder for the data.
c#
void Main()
{
XDocument xsdoc = XDocument.Parse(@"<xml>
<record>
</record>
<Actions>
<Action Name='name1'>
<screen>
<subAction></subAction>
</screen>
</Action>
<Action Name='name2'>
<screen Description='info'>
<subAction>
<object>
<name>user</name>
<type>string</type>
<variable>ram</variable>
</object>
<object>
<name>user1</name>
<type>string1</type>
<variable>ram1</variable>
</object>
</subAction>
</screen>
<Screen Description='info1'>
<subAction>
<object>
</object>
</subAction>
</Screen>....goes on</Action>
</Actions>
</xml>");
List<ActionObject> objects3 = new List<ActionObject>();
// The string cast yields null instead of throwing when an Action has no Name attribute.
foreach (var el in xsdoc.Descendants("Action")
.Where(x => (string)x.Attribute("Name") == "name2"))
{
// AddRange keeps the results of every matching Action instead of overwriting the list.
objects3.AddRange(el.Descendants("object")
.Select(p => new ActionObject()
{
Name = p.Element("name")?.Value,
Type = p.Element("type")?.Value,
Variable = p.Element("variable")?.Value
}));
}
}
public class ActionObject
{
public string Name { get; set; }
public string Type { get; set; }
public string Variable { get; set; }
}
I prefer @Yitzhak's solution, but if you want to use XmlDocument, you could try the following:
1 - Create class ActionObject:
public class ActionObject
{
public string Name { get; set; }
public string Type { get; set; }
public string Variable { get; set; }
}
2 - Xml
string xml2 = @"
<xml>
<record>
</record>
<Actions>
<Action Name= 'name1'>
<screen></screen>
<subAction></subAction>
</Action>
<Action Name = 'name2'>
<screen Description= 'info'>
<subAction>
<object>
<name> user </name>
<type> string </type>
<variable> ram </variable>
</object>
<object>
<name> user1 </name>
<type> string1 </type>
<variable> ram1 </variable>
</object>
</subAction>
</screen>
<Screen Description= 'info1'>
<subAction>
<object></object>
</subAction>
</Screen>
</Action>
</Actions>
</xml>";
3 - Code that gets the objects from the XML:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml2);
List<ActionObject> objects = new List<ActionObject>();
XmlNodeList actions = xmlDocument.DocumentElement.SelectNodes("/xml/Actions/Action");
foreach (XmlElement actionNode in actions)
{
if (actionNode.Attributes["Name"].Value != "name2")
continue;
foreach(XmlNode objectNode in actionNode.SelectNodes("./screen/subAction/object"))
{
ActionObject actionObject = new ActionObject
{
Name = objectNode.SelectSingleNode("name").InnerText.Trim(),
Type = objectNode.SelectSingleNode("type").InnerText.Trim(),
Variable = objectNode.SelectSingleNode("variable").InnerText.Trim(),
};
objects.Add(actionObject);
}
}
I hope you find this helpful.

How to get elements without child elements?

Here is my xml:
<Root>
<FirstChild id="1" att="a">
<SecondChild id="11" att="aa">
<ThirdChild>123</ThirdChild>
<ThirdChild>456</ThirdChild>
<ThirdChild>789</ThirdChild>
</SecondChild>
<SecondChild id="12" att="ab">12</SecondChild>
<SecondChild id="13" att="ac">13</SecondChild>
</FirstChild>
<FirstChild id="2" att="b">2</FirstChild>
<FirstChild id="3" att="c">3</FirstChild>
</Root>
This XML document is very big and may be 1 GB in size or more. For better query performance, I want to read the document step by step. So, in the first step I want to read only the "FirstChild" elements and their attributes, like below:
<FirstChild id="1" att="a"></FirstChild>
<FirstChild id="2" att="b">2</FirstChild>
<FirstChild id="3" att="c">3</FirstChild>
After that, I may want to get the "SecondChild" elements by the id of their parent, and so on:
<SecondChild id="11" att="aa"></SecondChild>
<SecondChild id="12" att="ab">12</SecondChild>
<SecondChild id="13" att="ac">13</SecondChild>
How can I do it?
Note: XDoc.Descendants() or XDoc.Elements() load all specific elements with all child elements!
Provided that you have memory available to hold the file, I suggest treating each search step as an item in the outer collection of a PLINQ pipeline.
I would start with an XName collection for the node collections that you want to retrieve. By nesting queries within XElement constructors, you can return new instances of your target nodes, with only name and attribute information.
With a .Where(...) statement or two, you could also filter the attributes being kept, allow for some child nodes to be retained, etc.
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
namespace LinqToXmlExample
{
public class Program
{
public static void Main(string[] args)
{
XElement root = XElement.Load("[your file path here]");
// XML names are case-sensitive, so these must match the document exactly.
XName[] names = new XName[] { "FirstChild", "SecondChild", "ThirdChild" };
IEnumerable<XElement> elements =
names.AsParallel()
.Select(
name =>
new XElement(
$"result_{name}",
root.Descendants(name)
.AsParallel()
.Select(
x => new XElement(name, x.Attributes()))))
.ToArray();
}
}
}
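I suggest creating a new element and copying the attributes.
// Get <FirstChild id="1" att="a">...</FirstChild> by looping, XPath or any other method.
// The line below is just one possibility; 'root' is assumed to be the loaded <Root> element.
var sourceElement = root.Elements("FirstChild").First();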
var element = new XElement(sourceElement.Name);
foreach( var attribute in sourceElement.Attributes()){
element.Add(new XAttribute(attribute.Name, attribute.Value));
}
In VB you could do this to get a list of FirstChild elements:
'Dim yourpath As String = "your path here"
Dim xe As XElement
'to load from a file
'xe = XElement.Load(yourpath)
'for testing
xe = <Root>
<FirstChild id="1" att="a">
<SecondChild id="11" att="aa">
<ThirdChild>123</ThirdChild>
<ThirdChild>456</ThirdChild>
<ThirdChild>789</ThirdChild>
</SecondChild>
<SecondChild id="12" att="ab">12</SecondChild>
<SecondChild id="13" att="ac">13</SecondChild>
</FirstChild>
<FirstChild id="2" att="b">2</FirstChild>
<FirstChild id="3" att="c">3</FirstChild>
</Root>
Dim ie As IEnumerable(Of XElement)
ie = xe...<FirstChild>.Select(Function(el)
'create a copy
Dim foo As New XElement(el)
foo.RemoveNodes()
Return foo
End Function)
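The approaches above all start from a tree that is already loaded. If a 1 GB file does not fit comfortably in memory, a streaming pass with XmlReader is one alternative. The following is only a sketch under stated assumptions: ShallowReader and ReadShallow are hypothetical names, the document uses no XML namespaces, and only attributes are copied (text content is dropped).

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class ShallowReader
{
    // Streams through the document and yields attribute-only copies of every
    // element with the given name. Child nodes are read past, never materialized.
    public static IEnumerable<XElement> ReadShallow(XmlReader reader, string elementName)
    {
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element || reader.LocalName != elementName)
                continue;

            var copy = new XElement(reader.LocalName);
            if (reader.MoveToFirstAttribute())
            {
                do
                {
                    // Assumption: attributes are not namespace-qualified.
                    copy.SetAttributeValue(reader.LocalName, reader.Value);
                } while (reader.MoveToNextAttribute());
                reader.MoveToElement(); // return from the attributes to the element node
            }
            yield return copy;
        }
    }
}
```

A caller would wrap this in something like `using (var reader = XmlReader.Create(path))` and enumerate `ShallowReader.ReadShallow(reader, "FirstChild")`. Because the method is an iterator, the reader must stay open until enumeration finishes.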

Reading child tags via readXML

DataSet ds = new DataSet();
ds.ReadXml("path of the xml");
However, ds ends up with multiple tables, and I would like to fetch the Details information of Header and Footer separately:
XElement xElement = XElement.Parse(ds.GetXml().ToString());
var items =xElement
.Descendants("Header");
The code above doesn't give me any results; items comes back empty.
How can I get the Name and Number entries of the Details tags per Header and per Footer? Can I create two datasets using ds.ReadXml separately?
Here is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<Mapping>
<Header>
<Row>
<Details>
<Name>Name</Name>
<DataType>string</DataType>
<Value>Mark</Value>
</Details>
<Details>
<Name>Number</Name>
<DataType>int</DataType>
<Value>1</Value>
</Details>
</Row>
</Header>
<Footer>
<Row>
<Details>
<Name>Name</Name>
<DataType>string</DataType>
<Value>John</Value>
</Details>
<Details>
<Name>Number</Name>
<DataType>int</DataType>
<Value>2</Value>
</Details>
</Row>
</Footer>
</Mapping>
Dataset 1: Header info, so that I can loop through the rows
Dataset 2: Footer info, so that I can loop through the rows
Or is there another approach to fetch the name and number separately? The objective is to fetch the data and build C# classes like:
public class Header
{
public Header(){}
public string Name;
public int Number;
}
public class Footer
{
public Footer(){}
public string Name;
public int Number;
}
XElement mapping = XElement.Parse(ds.GetXml().ToString());
var query = from d in mapping.Descendants("Details")
select new {
Name = (string)d.Element("Name"),
Number = (int)d.Element("Number")
};
If you want to build your class instances:
var headers = from d in mapping.Element("Header")
.Element("Row").Elements("Details")
select new Header {
Name = (string)d.Element("Name"),
Number = (int)d.Element("Number")
};
Getting footers is same, except you should select "Footer" element first.
Also you can use XPath for selecting elements:
var headers = from d in mapping.XPathSelectElements("Header/Row/Details")
select new Header {
Name = (string)d.Element("Name"),
Number = (int)d.Element("Number")
};
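In the XML shown in the question, each Details element carries Name, DataType and Value children, so the field name ("Name" or "Number") is the content of the Name element rather than an element name. A sketch that pivots on that might look like the following. It assumes the Mapping element is parsed directly from the file (for example with XElement.Load) rather than taken from DataSet.GetXml(), that every Row defines both a "Name" and a "Number" entry, and it uses a hypothetical Section class in place of Header and Footer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the Header and Footer classes from the question.
public class Section
{
    public string Name { get; set; }
    public int Number { get; set; }
}

public static class MappingReader
{
    // sectionName is "Header" or "Footer"; one Section is returned per <Row>.
    public static List<Section> ReadSection(XElement mapping, string sectionName)
    {
        return mapping.Elements(sectionName)
            .Elements("Row")
            .Select(row =>
            {
                // Turn the row's <Details> entries into a field-name -> value lookup.
                Dictionary<string, string> values = row.Elements("Details")
                    .ToDictionary(d => (string)d.Element("Name"),
                                  d => (string)d.Element("Value"));
                return new Section
                {
                    Name = values["Name"],
                    Number = int.Parse(values["Number"])
                };
            })
            .ToList();
    }
}
```

With the sample file, `MappingReader.ReadSection(mapping, "Header")` should return a single entry with Name "Mark" and Number 1, and the "Footer" call an entry with Name "John" and Number 2.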

Convert XML File to XML Flatten using C# [closed]

I need to convert an XML file to make it readable in Excel. The idea is to flatten the XML file. I am using C#; most of the resources I found cover SQL, which I do not need.
Any help would be appreciated.
Source:
<root>
<product>
<screen>
Samsung
</screen>
<screen>
Mecer
</screen>
</product>
<product>
<phone>
Sony
</phone>
<phone>
Nokia
</phone>
</product>
</root>
Expected Result
<dataSet>
<row>
<column>
screen
</column>
<column>
phone
</column>
</row>
<row>
<column>
Samsung
</column>
<column>
Sony
</column>
</row>
<row>
<column>
Mecer
</column>
<column>
Nokia
</column>
</row>
</dataSet>
A better way to do this is to create a new class structure:
[Serializable]
[XmlRoot(ElementName = "dataSet")]
public class RootDataSet
{
[XmlElement(ElementName = "row")]
public List<Rows> Rows { get; set; }
}
[Serializable]
public class Rows
{
[XmlElement(ElementName = "column")]
public List<string> column { get; set; }
}
After you have done that, you can put this code in a method to generate the file.
static void Main(string[] args)
{
// FileMode.Create truncates an existing file, so no stale bytes remain after the new XML.
using (Stream fileStream = new FileStream(@"C:\Nova pasta\file.xml", FileMode.Create))
{
RootDataSet dataset = new RootDataSet();
dataset.Rows = new List<Rows>();
Rows Rows1 = new Rows();
Rows1.column = new List<string>();
Rows1.column.Add("teste1");
Rows1.column.Add("teste2");
dataset.Rows.Add(Rows1);
//use reflection to get the properties names of the class
//get the values of the class
XmlSerializer xmlSerializer = new XmlSerializer(dataset.GetType());
xmlSerializer.Serialize(fileStream, dataset);
}
}
The returned file will be:
<?xml version="1.0"?>
<dataSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<row>
<column>teste1</column>
<column>teste2</column>
</row>
</dataSet>
As @June Paik mentioned, you could go down the XSLT route.
This would allow you to configure how the XML is transformed without having to recompile the application every time (you can just modify the XSLT and run the application again).
Here is a starting point for the XSLT:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="node() | @*">
<xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
</xsl:template>
<!-- Name of the element you wish to find here (names are case-sensitive) -->
<xsl:template match="product">
<!-- What you want to change it to here -->
<row><xsl:apply-templates select="@* | node()"/></row>
</xsl:template>
</xsl:stylesheet>
The XSLT changes elements named "product" to "row".
Save this to a file (e.g. productTransform.xslt) on C:\
Then use the XSLT in C# by writing the following 3 lines:
XslCompiledTransform Trans = new XslCompiledTransform();
Trans.Load(@"C:\productTransform.xslt");
Trans.Transform("products.xml", "transformedProducts.xml");
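For comparison, the flattening itself can also be sketched directly in C# with LINQ to XML. This is not part of either answer and rests on assumptions read off the sample: each product holds children of a single element name, that name becomes the column header, and the n-th child of each product goes into row n. XmlFlattener and Flatten are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlFlattener
{
    public static XElement Flatten(XElement root)
    {
        // One column per non-empty <product>: the list of its child elements.
        List<List<XElement>> columns = root.Elements("product")
            .Select(p => p.Elements().ToList())
            .Where(items => items.Count > 0)
            .ToList();

        // Header row: the element name used inside each product.
        var header = new XElement("row",
            columns.Select(c => new XElement("column", c[0].Name.LocalName)));

        // Data rows: the i-th value of every column; shorter columns are padded with "".
        int rowCount = columns.Count == 0 ? 0 : columns.Max(c => c.Count);
        IEnumerable<XElement> rows = Enumerable.Range(0, rowCount)
            .Select(i => new XElement("row",
                columns.Select(c =>
                    new XElement("column", i < c.Count ? c[i].Value.Trim() : ""))));

        return new XElement("dataSet", header, rows);
    }
}
```

Calling `XmlFlattener.Flatten(XElement.Load(path))` on the source from the question should give a header row (screen, phone) followed by the Samsung/Sony and Mecer/Nokia rows.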

how can I turn a set of csv lines into xml?

Is there a way to take a spreadsheet and turn it into xml file below?
<?xml version="1.0"?>
<ArrayOfBusiness xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Business>
<Name>Company Name 1</Name>
<AddressList>
<Address>
<AddressLine>123 Main St.</AddressLine>
</Address>
</AddressList>
</Business>
<Business>
<Name>Company Name 2</Name>
<AddressList>
<Address>
<AddressLine>1 Elm St.</AddressLine>
</Address>
<Address>
<AddressLine>2 Elm St.</AddressLine>
</Address>
</AddressList>
</Business>
</ArrayOfBusiness>
I put this in LinqPad and it did what you needed. If you've never used LinqPad... well now's a great time to start.
var csvs = new List<string>();
csvs.Add( "Company Name 1,123 Main St.");
csvs.Add("Company Name 2,1 Elm St.,2 Elm St.");
var xml =
new XElement("ArrayOfBusiness", // single root wrapping every Business
from c in csvs
let split = c.Split(',')
select
new XElement("Business",
new XElement("Name", split[0] ),
new XElement("AddressList",
new XElement("Address"
,
(from s in split.Skip(1) // skip the first split (the company name)
select
new XElement("AddressLine", s))
)
))); // <-- is it LISP?
xml.Dump();
Results:
<ArrayOfBusiness>
<Business>
<Name>Company Name 1</Name>
<AddressList>
<Address>
<AddressLine>123 Main St.</AddressLine>
</Address>
</AddressList>
</Business>
<Business>
<Name>Company Name 2</Name>
<AddressList>
<Address>
<AddressLine>1 Elm St.</AddressLine>
<AddressLine>2 Elm St.</AddressLine>
</Address>
</AddressList>
</Business>
</ArrayOfBusiness>
It isn't exactly what you wanted, but looks functionally equivalent to me. Might need a bit of tweaking in the LINQ.
Write to file with: File.WriteAllText(@"c:\temp\addresses.xml", xml.ToString());
Parsing the .csv file into 'Business' objects should be straightforward.
It's then a simple case of using the XmlSerializer class to generate the XML.
I would say yes, but without seeing the CSV file it is hard to say.
If your CSV was something like this:
Name, Address1, Address2
Company Name 1,123 Main St.,
Company Name 2,1 Elm St.,1 Elm St.
you could easily parse this into a class.
class Business
{
public string Name { get; set; }
public List<Address> AddressList { get; set; }
}
class Address
{
public string AddressLine { get; set; }
}
(untested)
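Putting the two suggestions together, a minimal sketch could parse each line into Business objects and hand the list to XmlSerializer. Several things here are assumptions: the classes are made public (XmlSerializer requires that), the input has no header line and no quoted fields containing commas, and CsvToXml.ToXml is a hypothetical helper name. Serializing a List<Business> produces an ArrayOfBusiness root by default.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public class Business
{
    public string Name { get; set; }
    public List<Address> AddressList { get; set; }
}

public class Address
{
    public string AddressLine { get; set; }
}

public static class CsvToXml
{
    public static string ToXml(IEnumerable<string> csvLines)
    {
        List<Business> businesses = csvLines
            .Select(line => line.Split(','))   // naive split: no quoted-field support
            .Select(parts => new Business
            {
                Name = parts[0].Trim(),
                // Every remaining non-empty field becomes its own Address.
                AddressList = parts.Skip(1)
                    .Where(p => !string.IsNullOrWhiteSpace(p))
                    .Select(p => new Address { AddressLine = p.Trim() })
                    .ToList()
            })
            .ToList();

        var serializer = new XmlSerializer(typeof(List<Business>));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, businesses);
            return writer.ToString();
        }
    }
}
```

Passing the data lines of the CSV (skipping the header row if there is one) to `CsvToXml.ToXml` returns the XML as a string, which can then be written out with File.WriteAllText.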
