How to load and merge a dataset of XML docs - c#

I would like to consume a dataset of XML documents, and merge them into a single document containing only distinct elements.
To illustrate, I have a dataset as:
r, x
-- -------------------------------
1, <root><a>111</a></root>
2, <root><a>222</a><b>222</b></root>
3, <root><c>333</c></root>
would result in:
<a>111</a><b>222</b><c>333</c>
The <a> element from r=2 is not merged since we already have an element = <a> from r=1. I need only merge new elements, starting with r=1 going forward.
I am able to iterate over the list, but I am having difficulty comparing and merging. The code below fails to identify <a>222</a> as a duplicate. Is it possibly comparing the element values as well?
using (SqlDataReader dsReader = cmd.ExecuteReader())
{
XDocument baseDoc = new XDocument();
XDocument childDoc = new XDocument();
while (dsReader.Read())
{
// this is the base doc, merge forward from here
if (dsReader["r"].ToString() == "1")
{
baseDoc = XDocument.Parse(dsReader["x"].ToString());
SqlContext.Pipe.Send("start:" + baseDoc.ToString());
}
// this is a child doc, do merge operation
else
{
childDoc = XDocument.Parse(dsReader["x"].ToString());
// find elements only present in child
var childOnly = (childDoc.Descendants("root").Elements()).Except(baseDoc.Descendants("root").Elements());
foreach (var e in childOnly)
{
baseDoc.Root.Add(e);
}
}
}
}

I am a bit confused about how baseDoc and childDoc are used in your code, but I hope I have understood your question correctly. Here is my proposal:
using (SqlDataReader dsReader = cmd.ExecuteReader())
{
XElement result = new XElement("root");
while (dsReader.Read())
{
// Read source
XDocument srcDoc = XDocument.Parse(dsReader["x"].ToString());
// Construct result element
foreach (XElement baseElement in srcDoc.Descendants("root").Elements())
if (result.Element(baseElement.Name) == null) // skip already added nodes
result.Add(new XElement(baseElement.Name, baseElement.Value));
}
// Construct result string from sub-elements (to avoid "<root>..</root>" in output)
string str = "";
foreach (XElement element in result.Elements())
str += element.ToString();
// send the result
SqlContext.Pipe.Send("start:" + str);
}
Note that my code ignores the r numbering and processes rows in the order the SQL data reader returns them. If the rows are not already sorted by r, sort them first (for example with ORDER BY r in the query).
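On the original question of why Except did not filter out <a>222</a>: XElement does not override Equals, so Except with the default comparer compares object references, and two elements parsed from different documents are never equal. Below is a minimal sketch of the same merge using Except with a name-based comparer. ElementNameComparer and MergeDemo are hypothetical names introduced for illustration, and it assumes that matching on element name alone is what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical comparer: two elements count as equal when their names match.
public sealed class ElementNameComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Name == y.Name;
    }

    public int GetHashCode(XElement e)
    {
        return e == null ? 0 : e.Name.GetHashCode();
    }
}

public static class MergeDemo
{
    // Merges the children of each <root> into one <root>,
    // keeping the first element seen for each name.
    public static XElement Merge(IEnumerable<string> xmlDocs)
    {
        var result = new XElement("root");
        var comparer = new ElementNameComparer();
        foreach (string xml in xmlDocs)
        {
            XElement child = XDocument.Parse(xml).Root;
            // ToList() materializes the query before result is modified.
            List<XElement> childOnly = child.Elements()
                .Except(result.Elements(), comparer)
                .ToList();
            foreach (XElement e in childOnly)
                result.Add(new XElement(e)); // copy, so the source document is left untouched
        }
        return result;
    }

    public static void Main()
    {
        var docs = new[]
        {
            "<root><a>111</a></root>",
            "<root><a>222</a><b>222</b></root>",
            "<root><c>333</c></root>"
        };
        // Concatenate the children to avoid the <root> wrapper in the output.
        Console.WriteLine(string.Concat(Merge(docs).Elements()));
    }
}
```

Copying with new XElement(e) also keeps attributes and nested children, which new XElement(name, value) would flatten to text.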

Related

Read xml file containing multiple tags with same attribute name and replace the value based on user's input

Below is my sample XML file, stored on the server:
<exam>
<name>Maths</name>
<percentage>100</percentage>
</exam>
<exam>
<name>Physics</name>
<percentage>70</percentage>
</exam>
<exam>
<name>Chemistry</name>
<percentage>70</percentage>
</exam>
I have another table as mentioned below
Name of Exam Percentage
Maths 50
Physics 60
Chemistry 70
I need to read this XML file and replace each percentage value in it with the value from the table. The file has more than 75 exam tags.
I have used the logic below, which hardcodes everything, but I am not sure it is a good approach.
public static void Changepercentage()
{
try{
string xmlpercentage= Loaddefault.xmlpercentage;
string f = xml
List<string> lines = new List<string>();
// 2
// Use using StreamReader for disposing.
using (StreamReader r = new StreamReader(f, System.Text.Encoding.Default))
{
// 3
// Use while != null pattern for loop
string line;
while ((line = r.ReadLine()) != null)
{
if (System.Text.RegularExpressions.Regex.IsMatch(line, "<exam>Maths</exam>"))
{
lines.Add(@"" + line + "");
line = "<percentage>50</percentage>";
}
}
}
System.IO.File.WriteAllLines(xmlpercentage, lines.ToArray());
Logger.Instance.InfoLog("Successfully updated the percentage.xml file");
}
catch (Exception ex)
{
Logger.Instance.ErrorLog("Problem in updating the percentage.xml file :"+ ex.Message);
throw new Exception("Problem in updating the percentage.xml file");
}
}
You can follow the approach from the documentation:
// Make sure that the project references the System.Xml assembly.
// Add a using directive for the System.Xml namespace.
using System.Xml;
// Create a new XmlDocument and use the Load method to load the file.
XmlDocument myXmlDocument = new XmlDocument();
myXmlDocument.Load("test.xml");
// Iterate through the children of the document element, find all the "percentage" nodes, and update them.
foreach (XmlNode node1 in myXmlDocument.DocumentElement.ChildNodes)
foreach (XmlNode node2 in node1.ChildNodes)
if (node2.Name == "percentage")
{
decimal percentage = 60; // your new percentage
node2.InnerText = percentage.ToString(); // InnerText is a string
}
//Use the Save method of the XmlDocument class to save the altered XML to a new file that is named test1.xml.
myXmlDocument.Save("test1.xml");
Iterate over all <exam> nodes of your XML and read the child node <name>. Use the text of <name> to query the database, then write the database result into the <percentage> node.
Something like this should do:
var doc = XDocument.Parse(yourXml);
foreach(XElement exam in doc.Descendants("exam"))
{
var examName = exam.Descendants("name").First().Value;
var newPercentage = GetPercentage(examName);
exam.Descendants("percentage").First().Value = newPercentage;
}
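The snippet above leaves GetPercentage undefined. Here is a minimal sketch of one way to fill that gap, assuming the table has already been read into a dictionary keyed by exam name. PercentageUpdater and the <exams> wrapper element are my own assumptions (a well-formed XML file needs a single root), not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class PercentageUpdater
{
    // Sets <percentage> for every <exam> whose <name> appears in newValues.
    // Exams that are not in the dictionary are left unchanged.
    public static void Update(XDocument doc, IDictionary<string, int> newValues)
    {
        foreach (XElement exam in doc.Descendants("exam"))
        {
            string name = (string)exam.Element("name"); // null if <name> is missing
            int value;
            if (name != null && newValues.TryGetValue(name, out value))
                exam.SetElementValue("percentage", value);
        }
    }

    public static void Main()
    {
        // Assumed wrapper element <exams> around the sample from the question.
        XDocument doc = XDocument.Parse(
            "<exams>" +
            "<exam><name>Maths</name><percentage>100</percentage></exam>" +
            "<exam><name>Physics</name><percentage>70</percentage></exam>" +
            "</exams>");

        var table = new Dictionary<string, int> { { "Maths", 50 }, { "Physics", 60 } };
        Update(doc, table);
        Console.WriteLine(doc);
    }
}
```

With 75 or more exams, a single dictionary lookup per element avoids both the hardcoded comparisons and one database query per exam.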

How can I remove duplicate, invalid, child nodes from an XML document using Linq to XML?

I'm creating XML from JSON retrieved from an HttpWebRequest call, using JsonConvert. The JSON I'm getting back sometimes has duplicate nodes, creating duplicate nodes in the XML after conversion, which I then have to remove.
The processing of the JSON to XML conversion is being done in a generic service call wrapper that has no knowledge of the underlying data structure and so can't do any XPath queries based on a named node. The duplicates could be at any level within the XML.
I've got to the stage where I have a list of the names of duplicate nodes at each level but am not sure of the Linq query to use this to remove all but the first node with that name.
My code:
protected virtual void RemoveDuplicateChildren(XmlNode node)
{
if (node.NodeType != XmlNodeType.Element || !node.HasChildNodes)
{
return;
}
var xNode = XElement.Load(node.CreateNavigator().ReadSubtree());
var duplicateNames = new List<string>();
foreach (XmlNode child in node.ChildNodes)
{
var isBottom = this.IsBottomElement(child); // Has no XmlNodeType.Element type children
if (!isBottom)
{
this.RemoveDuplicateChildren(child);
}
else
{
var count = xNode.Elements(child.Name).Count();
if (count > 1 && !duplicateNames.Contains(child.Name))
{
duplicateNames.Add(child.Name);
}
}
}
if (duplicateNames.Count > 0)
{
foreach (var duplicate in duplicateNames)
{
xNode.Elements(duplicate).SelectMany(d => d.Skip(1)).Remove();
}
}
}
The final line of code obviously isn't correct but I can't find an example of how to rework it to retrieve and then remove all but the first matching element.
UPDATE:
I have found two ways of doing this now, one using the XElement and one the XmlNode, but neither actually removes the nodes.
Method 1:-
foreach (var duplicate in duplicateNames)
{
xNode.Elements(duplicate).Skip(1).Remove();
}
Method 2:-
foreach (var duplicate in duplicateNames)
{
var nodeList = node.SelectNodes(duplicate);
if (nodeList.Count > 1)
{
for (int i=1; i<nodeList.Count; i++)
{
node.RemoveChild(nodeList[i]);
}
}
}
What am I missing?
If you don't want any duplicate names: (assuming no namespaces)
XElement root = XElement.Load(file); // .Parse(string)
List<string> names = root.Descendants().Select(x => x.Name.LocalName).Distinct().ToList();
names.ForEach(name => root.Descendants(name).Skip(1).Remove());
root.Save(file); // or root.ToString()
You might be trying to solve the problem at the wrong level. In XML it is perfectly valid to have multiple nodes with the same name. JSON structures with duplicate property names should be treated as invalid. You should try to do this sanitization at the JSON level, not after the data has already been transformed to XML.
For the xml cleanup this might be a starting point:
foreach (XmlNode child
in node.ChildNodes.Distinct(custom comparer that looks on node names))
{
.....
}
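As a rough LINQ to XML sketch of the same idea, the following keeps the first child of each name under every parent and removes the later siblings, at any depth. DuplicateRemover is a hypothetical name. Note the assumptions: it ignores the "bottom element only" rule from the question, and it works on an XElement directly rather than on a copy of an XmlNode, so the removals actually affect the tree you hold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateRemover
{
    // For every parent element, keeps the first child of each name
    // and removes later siblings that share that name.
    public static void RemoveDuplicateSiblings(XElement root)
    {
        List<XElement> duplicates = root
            .DescendantsAndSelf()
            .SelectMany(parent => parent.Elements()
                .GroupBy(child => child.Name)
                .SelectMany(group => group.Skip(1)))
            .ToList(); // snapshot first, so the tree is not modified while it is being enumerated

        foreach (XElement duplicate in duplicates)
            duplicate.Remove();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<r><a>1</a><a>2</a><b><c>x</c><c>y</c></b></r>");
        RemoveDuplicateSiblings(root);
        Console.WriteLine(root);
    }
}
```

Grouping per parent means two elements with the same name under different parents are both kept, unlike the Descendants(name).Skip(1) approach, which keeps only one per document.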

C# reading XML , how to get the value

I am new to XML. Need some help.
I can get pro NAME fine but
How do I get the value of scode? JDK...blah
<pro NAME="JK1233k">
<scode ID="A">JDK-ORPLL-PDILL</scode>
</pro>
XmlReader reader = XmlReader.Create("file.xml");
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "pro"))
{
Console.WriteLine(reader["NAME"]);
}
else if((reader.NodeType == XmlNodeType.Element) && (reader.Name == "scode"))
{
Console.WriteLine(reader["ID"]);
//what do I put here to get the value????
}
}
reader.Close();
What you're looking for is:
Console.WriteLine(reader.ReadInnerXml());
I personally prefer LINQ to XML. If you haven't looked into it, you should. You can achieve the same thing in a cleaner manner.
At the start of your C# file, put the following:
using System.Linq;
using System.Xml.Linq; // loads the linq to xml part
Most XML files are much bigger than just one element. So for that, your code would be something like this:
// Load the XML file and project every <pro> element into a queryable sequence.
// Descendants finds <pro> at any depth, including below a wrapper root element.
var xmlDoc = XDocument.Load(file)
.Descendants("pro")
.Select(pro => new
{
Name = pro.Attribute("NAME").Value,
Scode = pro.Elements("scode").Select(scode => new
{
ID = scode.Attribute("ID").Value,
Val = scode.Value
})
});
// loop through each <pro> element
foreach (var pro in xmlDoc)
{
// Get Pro Name
Console.WriteLine(pro.Name);
// loop through each <scode> element inside <pro>
foreach(var scode in pro.Scode)
{
// Get Scode ID:
Console.WriteLine(scode.ID);
// Get Scode Value:
Console.WriteLine(scode.Val);
}
}
If your XML is only a SINGLE element, you can do this:
// Load XML file:
var pro = XElement.Load("file.xml");
// Get Pro Name
string name = pro.Attribute("NAME").Value;
// Get Scode ID:
string scodeId = pro.Element("scode").Attribute("ID").Value;
// Get Scode Value:
string scodeValue = pro.Element("scode").Value;
Consider the following code snippet...
XDocument doc = XDocument.Load("file.xml");
foreach (XElement element in doc.Descendants("pro"))
{
Console.WriteLine(element.Attribute("NAME").Value);
}
foreach (XElement element in doc.Descendants("scode"))
{
Console.WriteLine(element.Value);
}
Good Luck!

Get XmlNodeList if a particular element value or its attribute value is present in a given list of strings

I would like to get XmlNodeList from a huge XML file.
Conditions:
I have a List of unique ID values, say IDList
Case I: Collect all the nodes where element called ID has value from IDList.
Case II: Collect all nodes where one of the attribute called idName of element ID has value from IDList.
In short, extract only the nodes which match with the values given in the IDList.
I have done this with loops: I load the XML into an XmlDocument and iterate over all nodes, checking each ID value. What I am looking for is a more direct and faster way to do it, because looping like this does not scale to a large XML file.
My try:
try
{
using (XmlReader reader = XmlReader.Create(URL))
{
XmlDocument doc = new XmlDocument();
doc.Load(reader);
XmlNodeList nodeList = doc.GetElementsByTagName("idgroup");
foreach (XmlNode xn in nodeList)
{
string id = xn.Attributes["id"].Value;
string value = string.Empty;
if (IDList.Contains(id))
{
value = xn.ChildNodes[1].ChildNodes[1].InnerText; // <value>
if (!string.IsNullOrEmpty(value))
{
listValueCollection.Add(value);
}
}
}
}
}
catch
{}
XML (XLIFF) structure:
<XLIFF>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2">
<file date="2013-07-17">
<body>
<id idName="test_001" >
<desc-group name="test_001">
<desc type="text"/>
</desc-group>
<result-unit idName="test_001_text">
<source>abcd</source>
<result>xyz</result>
</result-unit>
</id>
</body>
</file>
</xliff>
Collect all the nodes like above where idName matches.
EDIT
This is a test that can parse the example you are giving. It attempts to reach the result node directly, so that it stays as efficient as possible.
[Test]
public void TestXPathExpression()
{
var idList = new List<string> { "test_001" };
var resultsList = new List<string>();
// Replace with appropriate method to open your URL.
using (var reader = new XmlTextReader(File.OpenRead("fixtures\\XLIFF_sample_01.xlf")))
{
var doc = new XmlDocument();
doc.Load(reader);
var root = doc.DocumentElement;
// This is necessary, since your example is namespaced.
var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("x", "urn:oasis:names:tc:xliff:document:1.2");
// Go directly to the node from which you want the result to come from.
foreach (var nodes in idList
.Select(id => root.SelectNodes("//x:file/x:body/x:id[@idName='" + id + "']/x:result-unit/x:result", nsmgr))
.Where(nodes => nodes != null && nodes.Count > 0))
resultsList.AddRange(nodes.Cast<XmlNode>().Select(node => node.InnerText));
}
// Print the resulting list.
resultsList.ForEach(Console.WriteLine);
}
You can extract only those nodes you need by using an XPath query. A brief example of how you'd go about it:
using (XmlReader reader = XmlReader.Create(URL))
{
XmlDocument doc = new XmlDocument();
doc.Load(reader);
foreach(var id in IDList) {
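var nodes = doc.SelectNodes("//xliff/file/body/id[@idName='" + id + "']");
// XmlNodeList is not generic, so Cast<XmlNode>() is needed before LINQ operators (requires using System.Linq).
foreach(var node in nodes.Cast<XmlNode>().Where(x => !string.IsNullOrEmpty(x.ChildNodes[1].ChildNodes[1].InnerText)))
listValueCollection.Add(node.ChildNodes[1].ChildNodes[1].InnerText);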
}
}
The xpath expression is of course an example. If you want, you can post an example of your XML so I can give you something more accurate.
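For comparison, here is a LINQ to XML sketch that walks the document once and checks each idName against a HashSet, instead of running one XPath query per ID. XliffQuery and ResultsFor are hypothetical names. It assumes the namespaced structure shown in the question, with <result> inside <result-unit> inside <id>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XliffQuery
{
    // Namespace taken from the sample XLIFF in the question.
    static readonly XNamespace Ns = "urn:oasis:names:tc:xliff:document:1.2";

    // Returns the non-empty <result> text of every <id> whose idName is in ids.
    public static List<string> ResultsFor(XElement root, HashSet<string> ids)
    {
        return root
            .Descendants(Ns + "id")
            .Where(id => ids.Contains((string)id.Attribute("idName")))
            .Select(id => (string)id.Element(Ns + "result-unit")?.Element(Ns + "result"))
            .Where(text => !string.IsNullOrEmpty(text))
            .ToList();
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<XLIFF><xliff xmlns='urn:oasis:names:tc:xliff:document:1.2' version='1.2'>" +
            "<file><body>" +
            "<id idName='test_001'><result-unit idName='test_001_text'>" +
            "<source>abcd</source><result>xyz</result></result-unit></id>" +
            "<id idName='test_002'><result-unit idName='test_002_text'>" +
            "<source>efgh</source><result>uvw</result></result-unit></id>" +
            "</body></file></xliff></XLIFF>");

        var ids = new HashSet<string> { "test_001" };
        foreach (string result in ResultsFor(root, ids))
            Console.WriteLine(result);
    }
}
```

The whole document is still loaded into memory, so for a truly huge file a streaming XmlReader may be the better fit. This sketch only avoids the repeated queries.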

How to parse XML in a string in .NET?

Hi Fellow StackOverflowers,
I am receiving a string in one of my .NET function. The string when viewed from the XML Visualizer looks like this:
- <root>
- <Table>
<ID>ABC-123</ID>
<CAT>Housekeeping</CAT>
<DATE>21-JUN-2009</DATE>
<REP_BY>John</REP_BY>
<LOCATION>Head Office</LOCATION>
</Table>
- <Table>
<ID>ABC-124</ID>
<CAT>Environment</CAT>
<DATE>23-JUN-2009</DATE>
<REP_BY>Michelle</REP_BY>
<LOCATION>Block C</LOCATION>
</Table>
- <Table>
<ID>ABC-125</ID>
<CAT>Staging</CAT>
<DATE>21-JUN-2009</DATE>
<REP_BY>George</REP_BY>
<LOCATION>Head Office</LOCATION>
</Table>
- <Table>
<ID>ABC-123</ID>
<CAT>Housekeeping</CAT>
<DATE>21-JUN-2009</DATE>
<REP_BY>John</REP_BY>
<LOCATION space="preserve" xmlns="http://www.w3.org/XML/1998/namespace" />
</Table>
</root>
I need to parse this string so that I could write the data into a datatable whose columns are the xml tags for each data.
For the text above, I would then have a datatable with 5 columns, named ID, CAT, DATE, REP_BY and LOCATION, containing 4 rows of data.
In the fourth <Table> element, notice that <LOCATION> does not have any data but is instead marked space="preserve". This means the value I place in my datatable should be blank for the LOCATION column of the fourth row.
How can I achieve this? Sample code would be highly appreciated. Thanks.
Using the XmlReader class. This class is fast and does not use a lot of memory but reading the xml can be difficult.
using (StringReader strReader = new StringReader(yourXMLString))
{
using (XmlReader reader = XmlReader.Create(strReader))
{
while (reader.Read())
{
if(reader.Name == "Table" && reader.NodeType == XmlNodeType.Element)
{
using(XmlReader tableReader = reader.ReadSubtree())
{
ReadTableNode(tableReader);
}
}
}
}
}
private void ReadTableNode(XmlReader reader)
{
while (reader.Read())
{
if(reader.Name == "ID" && reader.NodeType == XmlNodeType.Element)
{
//do something
}
else if(reader.Name == "CAT" && reader.NodeType == XmlNodeType.Element)
{
//do something
}
//and continue....
}
}
To get an attribute of the current node you use:
string value = reader.GetAttribute(name_of_attribute);
To get the inner text of an element:
string innerText = reader.ReadString();
Using the XmlDocument class. This class is slow but manipulating and reading the xml is very easy because the entire xml is loaded.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(yourXMLString);
//do something
Using the XDocument class. The advantage of XDocument is that elements can be accessed directly by name, and the class also uses the power of LINQ to query the XML document.
using(StringReader tr = new StringReader(yourXMLString))
{
XDocument doc = XDocument.Load(tr);
//do something
}
This is probably the simplest solution to get the XML into table form. Throwing the attributes out using regular expressions is neither smart nor safe, but I don't like the System.Xml API, and LINQ to XML is not an option in .NET 2.0.
using System;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;
namespace GeneralTestApplication
{
class Program
{
private static void Main()
{
String input = @"<root><Table> [...] </root>";
input = Regex.Replace(input, @" [a-zA-Z]+=""[^""]*""", String.Empty);
DataSet dataSet = new DataSet();
dataSet.ReadXml(new StringReader(input));
foreach (DataRow row in dataSet.Tables[0].Rows)
{
foreach (DataColumn column in dataSet.Tables[0].Columns)
{
Console.Write(row[column] + " | ");
}
Console.WriteLine();
}
Console.ReadLine();
}
}
}
UPDATE
Or get rid of the attribute using System.Xml.
XmlDocument doc = new XmlDocument();
doc.Load(new StringReader(input));
foreach (XmlNode node in doc.SelectNodes("descendant-or-self::*"))
{
node.Attributes.RemoveAll();
}
input = doc.OuterXml;
But this doesn't work, because the XML namespace on the last LOCATION element remains and DataSet.ReadXml() complains that there cannot be two columns named LOCATION.
Don't use string parsing. Try using some xml library (Linq has some objects that might help you). You will probably do that much more easily.
I believe that you can simply use the ADO.NET DataSet class's ReadXml method to read an XML document in that format, and it will create the DataTable, DataColumn, and DataRow objects for you. You'll need to write a little conversion method if you want to subsequently turn the DATE column's data type to DateTime. But other than that, you shouldn't have to screw around with XML at all.
Edit
I see from Daniel Bruckner's post that the LOCATION elements in the odd namespace pose a problem. Well, that's easy enough to fix:
XmlDocument d = new XmlDocument();
d.LoadXml(xml);
XmlNamespaceManager ns = new XmlNamespaceManager(d.NameTable);
ns.AddNamespace("n", "http://www.w3.org/XML/1998/namespace");
foreach (XmlNode n in d.SelectNodes("/root/Table/n:LOCATION", ns))
{
XmlElement loc = d.CreateElement("LOCATION");
n.ParentNode.AppendChild(loc);
n.ParentNode.RemoveChild(n);
}
DataSet ds = new DataSet();
using (StringReader sr = new StringReader(d.OuterXml))
{
ds.ReadXml(sr);
}
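If LINQ to XML is available (.NET 3.5 or later), the DataTable can also be built by hand, which sidesteps the duplicate-column problem because columns are keyed by local name only. This is just a sketch: TableLoader is a hypothetical name, every column is treated as a string, and the sample in Main uses a plain empty <LOCATION /> rather than the namespaced one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Xml.Linq;

public static class TableLoader
{
    // Builds a DataTable with one row per <Table> element and
    // one string column per distinct child element name.
    public static DataTable ToDataTable(string xml)
    {
        var table = new DataTable("Table");
        List<XElement> rows = XDocument.Parse(xml).Root
            .Elements("Table")
            .ToList();

        // First pass: create the columns, ignoring any namespace on the element.
        foreach (string name in rows
            .SelectMany(r => r.Elements())
            .Select(e => e.Name.LocalName)
            .Distinct())
        {
            table.Columns.Add(name, typeof(string));
        }

        // Second pass: fill the rows. An empty element yields an empty string.
        foreach (XElement rowElement in rows)
        {
            DataRow row = table.NewRow();
            foreach (XElement cell in rowElement.Elements())
                row[cell.Name.LocalName] = cell.Value;
            table.Rows.Add(row);
        }
        return table;
    }

    public static void Main()
    {
        string xml =
            "<root>" +
            "<Table><ID>ABC-123</ID><CAT>Housekeeping</CAT><LOCATION>Head Office</LOCATION></Table>" +
            "<Table><ID>ABC-124</ID><CAT>Environment</CAT><LOCATION /></Table>" +
            "</root>";

        DataTable table = ToDataTable(xml);
        foreach (DataRow row in table.Rows)
            Console.WriteLine(string.Join(" | ", row.ItemArray));
    }
}
```

Converting the DATE column to DateTime would still need a separate step, as mentioned above.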
I'm not a huge fan of xml myself, I need to use it as the datasource of a grid to visualize it.
I get some output from our FileNet imaging server in xml format and I need to get pieces of it out to populate a database.
Here's what I'm doing, HTH:
Dim dsXML As DataSet
Dim drXML As DataRow
Dim rdr As System.IO.StringReader
Dim docs() As String
Dim SQL As String
Dim xml As String
Dim fnID As String
docs = _fnP8Dev.getDocumentsXML(_credToken, _docObjectStoreName, _docClass, "ReferenceNumber=" & fnID, "")
xml = docs(0)
If (InStr(xml, "<z:row") > 0) Then
RaiseEvent msg("Inserting images for reference number " & fnID)
rdr = New System.IO.StringReader(xml)
dsXML = New DataSet
dsXML.ReadXml(rdr)
For Each drXML In dsXML.Tables(dsXML.Tables.Count - 1).Rows
SQL = "Insert into fnImageP8 values ("
SQL = SQL & "'" & drXML("Id") & "', "
Try
SQL = SQL & "'" & drXML("DocumentTitle") & "', "
Catch ex As Exception
SQL = SQL & "null, "
End Try
