Removing duplicate elements in XML


My project needs to convert an input XML file into a DataTable.
I am using the following code to do that:
DataSet ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
dataSourceFileStream.Seek(0, SeekOrigin.Begin);
ds.ReadXml(dataSourceFileStream);
dt = ds.Tables[0];
This works fine unless the input XML has duplicate elements, for example:
<?xml version="1.0" encoding="iso-8859-1"?>
<DocumentElement>
<data>
<DATE>27 September 2013</DATE>
<SCHEME>Test Scheme Name</SCHEME>
<NAME>Mr John</NAME>
<SCHEME>Test Scheme Name</SCHEME>
<TYPE>1</TYPE>
</data>
</DocumentElement>
As you can see above, the element SCHEME appears twice. When the XML file looks like this, ds.ReadXml(dataSourceFileStream); fails to return the right DataTable.
Is there a better way to handle this?

It looks like you have to fix the XML first. You can do this with XDocument and the related LINQ to XML classes. But first you need to create an IEqualityComparer<XElement> that compares two XElements by name:
public class MyEqualityComparer : IEqualityComparer<XElement>
{
public bool Equals(XElement x, XElement y)
{
return x.Name == y.Name;
}
public int GetHashCode(XElement obj)
{
return obj.Name.GetHashCode();
}
}
Now try this:
var comparer = new MyEqualityComparer();
dataSourceFileStream.Seek(0, SeekOrigin.Begin);
var doc = XDocument.Load(dataSourceFileStream);
var dataElements = doc.Element("DocumentElement").Elements("data");
foreach (var dataElement in dataElements)
{
var childElements = dataElement.Elements();
var distinctElements = childElements.Distinct(comparer).ToArray();
if (distinctElements.Length != childElements.Count())
{
// Remove all children, then re-add only the first element of each name.
dataElement.Elements().Remove();
foreach (var item in distinctElements)
dataElement.Add(item);
}
}
using (var stream = new MemoryStream())
{
// Save directly to the stream so no output is left buffered in an unflushed writer.
doc.Save(stream);
stream.Seek(0, SeekOrigin.Begin);
var ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
ds.ReadXml(stream);
var dt = ds.Tables[0];
}
That is a quick workaround for your problem, but I strongly suggest asking the data provider to fix the XML.
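For comparison, here is a minimal self-contained sketch of the same clean-up. It swaps the custom comparer for GroupBy on the element name (keeping the first element of each name) and hands the cleaned tree to DataSet.ReadXml through XDocument.CreateReader() instead of a temporary stream. It assumes .NET 6 or later with top-level statements; DedupeAndLoad is a hypothetical helper name.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Sample input copied from the question: SCHEME appears twice inside <data>.
const string xml = @"<DocumentElement>
  <data>
    <DATE>27 September 2013</DATE>
    <SCHEME>Test Scheme Name</SCHEME>
    <NAME>Mr John</NAME>
    <SCHEME>Test Scheme Name</SCHEME>
    <TYPE>1</TYPE>
  </data>
</DocumentElement>";

// Hypothetical helper: remove repeated child names per record, then load into a DataSet.
DataTable DedupeAndLoad(string input)
{
    var doc = XDocument.Parse(input);
    foreach (var record in doc.Root!.Elements("data"))
    {
        // Group the record's children by element name; everything after the first is a duplicate.
        var duplicates = record.Elements()
            .GroupBy(e => e.Name)
            .SelectMany(g => g.Skip(1))
            .ToList(); // materialize before removing so the query is not evaluated mid-mutation
        foreach (var duplicate in duplicates)
            duplicate.Remove();
    }

    // Read the cleaned tree through an XmlReader; no intermediate MemoryStream is needed.
    var ds = new DataSet { Locale = CultureInfo.InvariantCulture };
    using var reader = doc.CreateReader();
    ds.ReadXml(reader);
    return ds.Tables[0];
}

DataTable table = DedupeAndLoad(xml);
Console.WriteLine(string.Join(", ", table.Columns.Cast<DataColumn>().Select(c => c.ColumnName)));
```

Like the comparer version, this silently keeps the first occurrence and drops later ones even if their values differ, so it is still a workaround rather than a fix of the source data.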

Okay. As stated in my previous comment, you can create your own XmlTextReader that patches (ignores) some elements. The idea is that this reader checks whether it has already read an element with the same name at the same depth. If it has, it skips ahead to that element's end tag.
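class MyXmlReaderPatcher : XmlTextReader
{
// Names of the child elements already seen in the current record (the depth-1 element).
private readonly HashSet<string> _currentNodeElementNames = new HashSet<string>();
public MyXmlReaderPatcher(TextReader reader) : base(reader)
{ }
public MyXmlReaderPatcher(Stream stream) : base(stream)
{ }
public override bool Read()
{
var result = base.Read();
if (this.Depth == 1)
{
// Entering or leaving a record: start a fresh set of names.
_currentNodeElementNames.Clear();
}
else if (this.Depth==2 && this.NodeType == XmlNodeType.Element)
{
if (_currentNodeElementNames.Contains(this.Name))
{
// An empty duplicate such as <SCHEME/> has no end tag, so just move on.
if (this.IsEmptyElement)
return this.Read();
var name = this.Name;
// Advance until the matching end tag of this duplicate element.
do {
result = base.Read();
if (result == false)
return false;
} while (!(this.NodeType == XmlNodeType.EndElement && this.Depth == 2 && this.Name == name));
result = this.Read();
}
else
{
_currentNodeElementNames.Add(this.Name);
}
}
return result;
}
}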
All you have to do is plug the new reader in between your file stream and ds.ReadXml():
var myReader = new MyXmlReaderPatcher(dataSourceFileStream);
var ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
var mode = ds.ReadXml(myReader);
var dt = ds.Tables[0];

Related

C# WCF Client - DataTable to XmlElement

I have an old WCF SOAP service on my server, a .NET Framework application, and a .NET Framework library.
I want to upgrade the library to netstandard2.0 first.
Everything works well, and I can regenerate the WCF client files.
However, the DataTable types have changed to ...TableResult with an XmlElement.
I know how to convert an XmlElement to a DataTable, but how do I convert a DataTable to an XmlElement?
public static class Transform
{
public static DataTable ToDataTable(XmlElement xmlElement)
{
using var reader = new XmlNodeReader(xmlElement);
var datatable = new DataTable();
datatable.ReadXml(reader);
return datatable;
}
public static XmlElement ToXmlElement(DataTable datatable)
{
throw new NotImplementedException();
}
}

You have to use GroupBy to group the rows, then select the parts you want into XElements.
Here is an example:
var xml = new XElement(table.TableName, table.Rows.Cast<DataRow>()
.GroupBy(row => (string)row[0])
.Select(g =>
new XElement(table.Columns[0].ColumnName,
new XElement("label", g.Key),
g.GroupBy(row => (string)row[1])
.Select(g1 =>
new XElement(table.Columns[1].ColumnName,
new XElement("label", g1.Key),
new XElement(table.Columns[2].ColumnName,
g1.Select(row =>
new XElement("label", (string)row[2])
)
)
)
)
)
)
);
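Or you can use a DataSet:
DataSet ds = new DataSet();
ds.Tables.Add(table); // throws if table already belongs to another DataSet; add table.Copy() instead in that case
XmlDocument XMLDoc = new XmlDocument();
XMLDoc.LoadXml(ds.GetXml()); // without this the document is empty and DocumentElement is null
// In your case:
return XMLDoc.DocumentElement;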
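One caveat with ds.GetXml() is that it writes the data without a schema, and DataTable.ReadXml (used by ToDataTable in the question) generally needs a schema to rebuild the columns. The sketch below is one possible ToXmlElement that writes the table with XmlWriteMode.WriteSchema and then round-trips it through the question's ToDataTable logic. It assumes .NET 6 or later with top-level statements; the "Items" table and its rows are made-up sample data, and the table must have a name for WriteXml to succeed.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

// Serialize a DataTable, including its inline XSD schema, and return the root XmlElement.
XmlElement ToXmlElement(DataTable table)
{
    var doc = new XmlDocument();
    using (var writer = new StringWriter())
    {
        // WriteSchema embeds the schema so DataTable.ReadXml can recreate column names and types.
        table.WriteXml(writer, XmlWriteMode.WriteSchema);
        doc.LoadXml(writer.ToString());
    }
    return doc.DocumentElement!;
}

// Same approach as Transform.ToDataTable in the question.
DataTable ToDataTable(XmlElement element)
{
    using var reader = new XmlNodeReader(element);
    var table = new DataTable();
    table.ReadXml(reader);
    return table;
}

// Hypothetical sample table.
var source = new DataTable("Items");
source.Columns.Add("Id", typeof(int));
source.Columns.Add("Name", typeof(string));
source.Rows.Add(1, "first");
source.Rows.Add(2, "second");

XmlElement element = ToXmlElement(source);
DataTable copy = ToDataTable(element);
Console.WriteLine($"{element.Name}: {copy.Rows.Count} rows");
```

Whether the generated WCF client expects exactly this shape (schema plus rows) depends on the service contract, so it is worth comparing against an XmlElement the old service actually returns.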

You can use ds.WriteXml(), which writes the output to a Stream. If you need a string instead, try the method below:
using System.Data;
using System.IO;
using System.Text;
using System.Xml.Serialization;
public static class Extensions
{
public static string ToXml(this DataSet ds)
{
using (var memoryStream = new MemoryStream())
{
using (TextWriter streamWriter = new StreamWriter(memoryStream))
{
var xmlSerializer = new XmlSerializer(typeof(DataSet));
xmlSerializer.Serialize(streamWriter, ds);
// Flush before reading the buffer; otherwise the end of the XML may still be in the writer.
streamWriter.Flush();
return Encoding.UTF8.GetString(memoryStream.ToArray());
}
}
}
}
USAGE:
var xmlString = ds.ToXml();
Response.Write(ds.ToXml());
See the DataSet.WriteXml documentation for more details.

Need help retrieving XML data using Linq

I am trying to retrieve data from an XML file and return the parsed data in a list. Depending on what I use to access the data (Element or Attributes) I either get null (in case of Element) or something I cannot decipher (in case of Attributes).
The XML looks like this:
<DATA_RESPONSE>
<HEADER>
<MSGID>IS20101P:091317125610:98::34:0</MSGID>
</HEADER>
<DATA>
<ROW ID='IS20101P' PE_NAME='APP-029' PE_ID='4' CODE='4829' DATA='5,1,500,1' />
<ROW ID='IS20101P' PE_NAME='APPS-029' PE_ID='4' CODE='4829' DATA='4,1,500,1' />
...
</DATA>
<SUMMARY>
</SUMMARY>
<ERRORS>
</ERRORS>
</DATA_RESPONSE>
I am using the following code to get the data. I read the file, store the XML in a string, and call a method with this string as the argument:
public static Hashtable GetIDSData(string sXMLString)
{
Hashtable result = new Hashtable();
result.Add("Success", false);
result.Add("ErrorMessage", "");
result.Add("ID", "");
result.Add("PE_NAME", "");
result.Add("PE_ID", "");
result.Add("CODE", "");
result.Add("DATA", "");
xmlDoc.InnerXml = sXMLString;
XmlElement root = xmlDoc.DocumentElement;
XDocument doc = XDocument.Parse(sXMLString);
XmlNode node = xmlDoc.SelectSingleNode("DATA_RESPONSE/DATA");
if (node != null)
{
var AddressInfoList = doc.Root.Descendants("ROW").Select(Address => new
{
ID = Address.Attributes("ID")?.ToString(),
PEName = Address.Attributes("PE_NAME")?.ToString(),
PEID = Address.Attributes("PE_ID")?.ToString(),
Code = Address.Attributes("CODE")?.ToString(),
Data = Address.Attributes("DATA")?.ToString(),
}).ToList();
foreach (var AddressInfo in AddressInfoList)
{
if (string.IsNullOrEmpty(AddressInfo.Code))
{
result["Success"] = false;
result["ErrorMessage"] = "Invalid Code; code is empty.";
}
else
{
result["Success"] = true;
result["ErrorMessage"] = "";
result["ID"] = AddressInfo.ID;
result["PE_NAME"] = AddressInfo.PEName;
result["PE_ID"] = AddressInfo.PEID;
result["CODE"] = AddressInfo.Code;
result["DATA"] = AddressInfo.Data;
}
}
return result;
}
In the LINQ section, if I use Address.Element("ID").Value, I get null.
The XML does not use any namespace.

First off, the GetIDSData() method does not compile as is, because at the line xmlDoc.InnerXml = sXMLString, xmlDoc has not been defined.
I'm assuming you want xmlDoc to be an XmlDocument loaded with the contents of the sXMLString parameter, so I'm changing that line to:
XmlDocument xmlDoc = new XmlDocument {InnerXml = sXMLString};
Also, your root variable is never used, so I removed it for clarity.
Now, as for the main part of your question: with your current syntax you are calling .ToString() on a collection of attributes, which is not what you want. (Element("ID") returns null because ID is an attribute of ROW, not a child element.) To fix this, when you build the AddressInfoList in the Select projection, fetch the attribute values like this:
ID = Address.Attributes("ID").Single().Value
or
ID = Address.Attribute("ID")?.Value
...rather than Address.Attributes("ID")?.ToString() as you have above. Note that Single() throws if the attribute is missing, while Attribute("ID")?.Value just gives null.
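A small self-contained sketch of the difference, using two ROW elements from the question (the HEADER, SUMMARY and ERRORS parts are left out). It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

const string xml = @"<DATA_RESPONSE>
  <DATA>
    <ROW ID='IS20101P' PE_NAME='APP-029' PE_ID='4' CODE='4829' DATA='5,1,500,1' />
    <ROW ID='IS20101P' PE_NAME='APPS-029' PE_ID='4' CODE='4829' DATA='4,1,500,1' />
  </DATA>
</DATA_RESPONSE>";

var doc = XDocument.Parse(xml);
var firstRow = doc.Root!.Descendants("ROW").First();

// ID is an attribute of <ROW>, not a child element, so Element("ID") finds nothing (null).
var asElement = firstRow.Element("ID");

// Attribute("ID") returns the XAttribute; .Value is its text. ?. yields null if it is missing.
var id = firstRow.Attribute("ID")?.Value;

// Attributes("ID") is a sequence; ToString() on it returns a type name, not the attribute value.
var confusing = firstRow.Attributes("ID").ToString();

// Projection in the style of the question, but reading attribute values.
var rows = doc.Root.Descendants("ROW")
    .Select(r => new
    {
        PEName = r.Attribute("PE_NAME")?.Value,
        Data = (string)r.Attribute("DATA"), // explicit conversion, also null when missing
    })
    .ToList();

Console.WriteLine($"{asElement == null} {id} {rows.Count}");
```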

You are not selecting the values of the attributes; your code selects the attributes themselves. I'm not sure what you are trying to achieve, but here is a modified version of your code that loads all ROW elements into a DataTable:
public static DataTable GetIDSData(string sXMLString)
{
DataTable result = new DataTable();
result.Columns.Add("Success");
result.Columns.Add("ErrorMessage");
result.Columns.Add("ID");
result.Columns.Add("PE_NAME");
result.Columns.Add("PE_ID");
result.Columns.Add("CODE");
result.Columns.Add("DATA");
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.InnerXml = sXMLString;
XmlElement root = xmlDoc.DocumentElement;
XDocument doc = XDocument.Parse(sXMLString);
XmlNode node = xmlDoc.SelectSingleNode("DATA_RESPONSE/DATA");
if (node != null)
{
var AddressInfoList = doc.Root.Descendants("ROW").Select(Address => new
{
ID = Address.Attributes("ID").Select(i=>i.Value) ,
PEName = Address.Attributes("PE_NAME").Select(i=>i.Value),
PEID = Address.Attributes("PE_ID").Select(i=>i.Value),
Code = Address.Attributes("CODE").Select(i=>i.Value),
Data = Address.Attributes("DATA").Select(i=>i.Value),
}).ToList();
AddressInfoList.ForEach(e =>
{
e.Code.ToList().ForEach(c =>
{
DataRow row = result.NewRow();
if (!string.IsNullOrEmpty(c))
{
row["Success"] = true;
row["ErrorMessage"] = "";
row["ID"] = e.ID.First();
row["PE_NAME"] = e.PEName.First();
row["PE_ID"] = e.PEID.First();
row["CODE"] = e.Code.First();
row["DATA"] = e.Data.First();
}
else
{
row["Success"] = false;
row["ErrorMessage"] = "Invalid Code; code is empty.";
}
result.Rows.Add(row);
});});
// result.Dump(); // LINQPad-only helper for displaying the table; remove outside LINQPad
return result;
}
return result;
}
And this is the result that you will get in your datatable.

ID = Address.Attributes("ID")?.ToString(),
Instead of the line above, you want to use Attribute(name) (without the s):
ID = Address.Attribute("ID")?.Value,

C# Reading CSV file with SQL conditions

I am using the CsvHelper library to read a CSV file, and I can read the file successfully. However, I cannot use an SQL condition to filter the values. How can I do that without using SQL Server? I am really stuck on this.
This was very easy with the pandas and pandasql libraries in Python, but it is proving much harder in C#.
My code:
public static void Main(string[] args)
{
var fileInfo = new FileInfo(@"filePath");
using (TextReader reader = fileInfo.OpenText())
using (var csvReader = new CsvReader(reader))
{
csvReader.Configuration.Delimiter = ",";
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.IgnoreQuotes = true;
csvReader.Configuration.TrimFields = true;
csvReader.Configuration.WillThrowOnMissingField = false;
while (csvReader.Read())
{
var myStrinVar = csvReader.GetField<string>(0);
Console.Write(myStrinVar); //SELECT * FROM table...
}
}
}

I would suggest using LINQ to filter your results.
https://msdn.microsoft.com/en-us/library/bb397906.aspx
Say you have some class MyClass that you can deserialize the lines in your file into.
For example:
public class MyClass
{
public int ID { get; set; }
}
var records = csv.GetRecords<MyClass>().ToList();
var filtered = records.Where(r => r.ID >= 10);
That example is a bit contrived but you can use any boolean expression you like in the where clause.
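CsvHelper's GetRecords<T>() returns a lazily evaluated IEnumerable<T>, so the Where can be applied without ToList() and the file is then read one record at a time. The sketch below shows that streaming "WHERE ... LIMIT" pattern without CsvHelper, so it runs on its own: a hypothetical ReadRows iterator with a naive Split(',') (no quoted-field handling) stands in for GetRecords<MyClass>(), and the three-column sample data is made up. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sample data standing in for the CSV file (no header row, as in the question).
const string csv = "1,alpha,10\n2,beta,25\n3,gamma,40\n4,delta,55\n";

// Lazily yields one parsed line at a time; only the current line is held in memory.
// Naive split: does NOT handle quotes or escaped commas (CsvHelper does).
IEnumerable<(int Id, string Name, int Amount)> ReadRows(TextReader source)
{
    string? line;
    while ((line = source.ReadLine()) != null)
    {
        if (line.Length == 0) continue;
        var fields = line.Split(',');
        yield return (int.Parse(fields[0]), fields[1].Trim(), int.Parse(fields[2]));
    }
}

// Roughly: SELECT * FROM file WHERE Amount > 20 LIMIT 2
// Take(2) stops reading as soon as two matches have been found.
using var reader = new StringReader(csv);
var top = ReadRows(reader).Where(r => r.Amount > 20).Take(2).ToList();

foreach (var row in top)
    Console.WriteLine($"{row.Id} {row.Name} {row.Amount}");
```

With CsvHelper the equivalent chain would be csv.GetRecords<MyClass>().Where(r => r.ID >= 10).Take(n), keeping ToList() only at the very end.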

I know this is too late for the OP, but the issue with the accepted answer is that it reads the entire result set into memory, which may not be feasible for large files. Also, you can extend the code below to get the top N rows without reading the entire CSV if matches are found early in the file.
public static void Main(string[] args)
{
var fileInfo = new FileInfo(@"filePath");
var where = ""; //Code to set up where clause part of query goes here
using (TextReader reader = fileInfo.OpenText())
using (var csvReader = new CsvReader(reader))
{
csvReader.Configuration.Delimiter = ",";
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.IgnoreQuotes = true;
csvReader.Configuration.TrimFields = true;
csvReader.Configuration.WillThrowOnMissingField = false;
DataTable dt = null;
while (csvReader.Read())
{
//Use the first row to initialize the columns.
if (dt == null)
{
dt = new DataTable();
for (var i = 0; i < csvReader.FieldCount; i++)
{
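// Assumes IDataReader-style members (FieldCount, GetName, GetFieldType), as on CsvHelper's CsvDataReader.
var fieldType = csvReader.GetFieldType(i);
DataColumn dc;
// A nullable field type (e.g. int?) maps to its underlying type with AllowDBNull.
var underlyingType = Nullable.GetUnderlyingType(fieldType);
if (underlyingType != null)
{
dc = new DataColumn(csvReader.GetName(i), underlyingType);
dc.AllowDBNull = true;
}
else
dc = new DataColumn(csvReader.GetName(i), fieldType);
dt.Columns.Add(dc);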
}
}
//Map DataReader to DataRow
var newRow = dt.Rows.Add();
foreach(DataColumn col in dt.Columns)
{
newRow[col.ColumnName] = csvReader[col.ColumnName];
}
//Create a temporary DataView and filter it with the where clause.
DataView dv = new DataView(dt);
dv.RowFilter = where;
var data = dv.Count > 0 ? dv[0] : null;
if(data != null)
{
//Row in here matches your where clause.
//Code to read this row or do something with it.
}
//Empty the temporary data table.
dt.Rows.Clear();
}
}
}
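The filtering above relies on DataView.RowFilter, which accepts a SQL-like WHERE expression (the DataColumn.Expression syntax: comparisons, AND/OR, LIKE with % wildcards, IN, and so on), not full SQL. A minimal standalone sketch of that syntax, plus DataTable.Select, which takes the same filter and an optional sort; the columns and rows are hypothetical stand-ins for CSV fields. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Data;

// Hypothetical columns standing in for the CSV fields.
var table = new DataTable("csv");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Rows.Add(1, "alpha");
table.Rows.Add(12, "beta");
table.Rows.Add(30, "alpine");

// RowFilter is the WHERE clause only: no SELECT list, no JOIN.
var view = new DataView(table) { RowFilter = "Id >= 10 AND Name LIKE 'al%'" };

foreach (DataRowView row in view)
    Console.WriteLine($"{row["Id"]} {row["Name"]}");

// DataTable.Select takes the same filter syntax plus an ORDER BY-style sort string.
DataRow[] selected = table.Select("Id < 20", "Id DESC");
```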

C# Deserialize XML with multiple XML declarations

A third-party supplier has given us XML that is not well formed. It has multiple XML declarations:
<?xml version="1.0" encoding="utf-16"?><!DOCTYPE bob />
<?xml version="1.0"?><!DOCTYPE jim>
<elements>
...
</elements>
My deserialization code:
var serializer = new XmlSerializer(response.GetType());
reader = new XmlTextReader(stream) {XmlResolver = null};
result = (IResponse) serializer.Deserialize(reader);
The problem I am having is that the deserializer complains about the multiple XML declarations. Is there any way I can strip the declarations off so that the XML deserializes successfully?

You could write a wrapper around an XmlReader that filters out repeated XML declarations and DOCTYPEs.
public class XmlFilteringReader : XmlReader
{
private readonly XmlReader _source;
private bool _gotXmlDeclaration = false;
private bool _gotDoctype = false;
public XmlFilteringReader(XmlReader source)
{
_source = source;
}
public override bool Read()
{
var ok = _source.Read();
// XmlReader reports <?xml ...?> as XmlDeclaration, not as a ProcessingInstruction.
if (ok && _source.NodeType == XmlNodeType.XmlDeclaration)
{
if (_gotXmlDeclaration) return Read(); // Recursive
_gotXmlDeclaration = true;
}
else if (ok && _source.NodeType == XmlNodeType.DocumentType)
{
if (_gotDoctype) return Read(); // Recursive
_gotDoctype = true;
}
return ok;
}
// Implementation of other methods and properties
// by calling the same method or property on _source
}
var serializer = new XmlSerializer(response.GetType());
var reader = new XmlFilteringReader(new XmlTextReader(stream) {XmlResolver = null});
var result = (IResponse) serializer.Deserialize(reader);
The implementation could be simplified by using XmlWrappingReader from the Mvp.Xml library. There is also a blog post about this.
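A different, text-level option (a swapped-in technique, not the wrapper above): read the payload into a string, strip every XML declaration and simple DOCTYPE with a regular expression, and deserialize from the cleaned string. This matters because, as far as I know, XmlTextReader itself may reject a second declaration ("Unexpected XML declaration") before any wrapper sees the node. The sketch assumes the payload fits in memory, that declarations and DOCTYPEs only appear as prologue junk (not inside CDATA or comments), and that DOCTYPEs have no internal subset. StripPrologue is a hypothetical helper and the <element> content is invented; .NET 6 or later with top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Sample payload modeled on the question; the <element> content is invented.
const string raw = "<?xml version=\"1.0\" encoding=\"utf-16\"?><!DOCTYPE bob />\n" +
                   "<?xml version=\"1.0\"?><!DOCTYPE jim>\n" +
                   "<elements><element>1</element></elements>";

// Hypothetical helper: remove all XML declarations and simple DOCTYPEs.
string StripPrologue(string text)
{
    text = Regex.Replace(text, @"<\?xml\s[^?]*\?>", string.Empty); // <?xml ... ?>
    text = Regex.Replace(text, @"<!DOCTYPE[^>\[]*>", string.Empty); // <!DOCTYPE name> without [ ... ]
    return text.Trim();
}

string cleaned = StripPrologue(raw);

// The cleaned text is now a single well-formed element, so a normal parser accepts it.
var doc = XDocument.Parse(cleaned);
Console.WriteLine(doc.Root!.Name);
```

In the question's code the raw text would come from new StreamReader(stream).ReadToEnd(), and the cleaned string would go to serializer.Deserialize(new StringReader(cleaned)).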

Retrieving Data From XML File

I'm having a problem retrieving XML values with C#, which I know is due to my very limited knowledge of C# and XML.
I was given the following XML file:
<PowerBuilderRunTimes>
<PowerBuilderRunTime>
<Version>12</Version>
<Files>
<File>EasySoap110.dll</File>
<File>exPat110.dll</File>
<File>pbacc110.dll</File>
</Files>
</PowerBuilderRunTime>
</PowerBuilderRunTimes>
I need to process the XML file and make sure that each of the listed files exists in the folder (that's the easy part). It's processing the XML file that I'm having a hard time with. Here is what I have so far:
var runtimeXml = File.ReadAllText(string.Format("{0}\\{1}", configPath, Resource.PBRuntimes));
var doc = XDocument.Parse(runtimeXml);
var topElement = doc.Element("PowerBuilderRunTimes");
var elements = topElement.Elements("PowerBuilderRunTime");
foreach (XElement section in elements)
{
//pbVersion is grabbed earlier. It is the version of PowerBuilder
if( section.Element("Version").Value.Equals(string.Format("{0}", pbVersion ) ) )
{
var files = section.Elements("Files");
var fileList = new List<string>();
foreach (XElement area in files)
{
fileList.Add(area.Element("File").Value);
}
}
}
My issue is that the String List is only ever populated with one value, "EasySoap110.dll", and everything else is ignored. Can someone please help me, as I am at a loss.

Look at this bit:
var files = section.Elements("Files");
var fileList = new List<string>();
foreach (XElement area in files)
{
fileList.Add(area.Element("File").Value);
}
You're iterating over each Files element, and then finding the first File element within it. There's only one Files element - you need to be iterating over the File elements within that.
However, there are definitely better ways of doing this. For example:
var doc = XDocument.Load(Path.Combine(configPath, Resource.PBRuntimes));
var fileList = (from runtime in doc.Root.Elements("PowerBuilderRunTime")
where (int) runtime.Element("Version") == pbVersion
from file in runtime.Element("Files").Elements("File")
select file.Value)
.ToList();
Note that if there are multiple matching PowerBuilderRunTime elements, that will create a list with all the files of all those elements. That may not be what you want. For example, you might want:
var doc = XDocument.Load(Path.Combine(configPath, Resource.PBRuntimes));
var runtime = doc.Root
.Elements("PowerBuilderRunTime")
.Where(r => (int) r.Element("Version") == pbVersion)
.Single();
var fileList = runtime.Element("Files")
.Elements("File")
.Select(x => x.Value)
.ToList();
That will validate that there's exactly one matching runtime.

The problem is that there's only one Files element in your XML, with multiple children. Your foreach loop only executes once, for that single element, not for its children.
Do something like this:
var fileSet = files.Elements("File");
foreach (var file in fileSet) {
fileList.Add(file.Value);
}
which loops over all the child elements.
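To see the difference on the question's sample, here is a minimal sketch that runs both versions side by side: taking Element("File") from each Files element (what the original loop does) versus iterating the File children. It assumes .NET 6 or later with top-level statements, and uses the sample XML with the closing tag written as </Files>.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

const string xml = @"<PowerBuilderRunTimes>
  <PowerBuilderRunTime>
    <Version>12</Version>
    <Files>
      <File>EasySoap110.dll</File>
      <File>exPat110.dll</File>
      <File>pbacc110.dll</File>
    </Files>
  </PowerBuilderRunTime>
</PowerBuilderRunTimes>";

var doc = XDocument.Parse(xml);
var filesElements = doc.Root!.Element("PowerBuilderRunTime")!.Elements("Files");

// The question's approach: one <Files> element, and Element("File") returns only its first child.
var firstOnly = filesElements.Select(f => f.Element("File")!.Value).ToList();

// Iterating the <File> children visits every file.
var all = filesElements.Elements("File").Select(f => f.Value).ToList();

Console.WriteLine($"{firstOnly.Count} vs {all.Count}");
```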

I have always preferred using readers for reading homegrown XML config files. If you're only doing this once it's probably overkill, but readers are faster and cheaper.
public static class PowerBuilderConfigParser
{
public static IList<PowerBuilderConfig> ReadConfigFile(String path)
{
IList<PowerBuilderConfig> configs = new List<PowerBuilderConfig>();
using (FileStream stream = new FileStream(path, FileMode.Open))
{
XmlReader reader = XmlReader.Create(stream);
reader.ReadToDescendant("PowerBuilderRunTime");
do
{
PowerBuilderConfig config = new PowerBuilderConfig();
ReadVersionNumber(config, reader);
ReadFiles(config, reader);
configs.Add(config);
reader.ReadToNextSibling("PowerBuilderRunTime");
} while (reader.ReadToNextSibling("PowerBuilderRunTime"));
}
return configs;
}
private static void ReadVersionNumber(PowerBuilderConfig config, XmlReader reader)
{
reader.ReadToDescendant("Version");
string version = reader.ReadString();
Int32 versionNumber;
if (Int32.TryParse(version, out versionNumber))
{
config.Version = versionNumber;
}
}
private static void ReadFiles(PowerBuilderConfig config, XmlReader reader)
{
reader.ReadToNextSibling("Files");
reader.ReadToDescendant("File");
do
{
string file = reader.ReadString();
if (!string.IsNullOrEmpty(file))
{
config.AddConfigFile(file);
}
} while (reader.ReadToNextSibling("File"));
}
}
public class PowerBuilderConfig
{
private Int32 _version;
private readonly IList<String> _files;
public PowerBuilderConfig()
{
_files = new List<string>();
}
public Int32 Version
{
get { return _version; }
set { _version = value; }
}
public ReadOnlyCollection<String> Files
{
get { return new ReadOnlyCollection<String>(_files); }
}
public void AddConfigFile(String fileName)
{
_files.Add(fileName);
}
}

Another way is to use an XmlSerializer.
[Serializable]
[XmlRoot]
public class PowerBuilderRunTime
{
[XmlElement]
public string Version {get;set;}
[XmlArrayItem("File")]
public string[] Files {get;set;}
public static PowerBuilderRunTime[] Load(string fileName)
{
PowerBuilderRunTime[] runtimes;
using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
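var reader = new XmlTextReader(fs);
// The root element is <PowerBuilderRunTimes>, not the default <ArrayOfPowerBuilderRunTime>, so name it explicitly.
// Serializers built with an XmlRootAttribute are not cached by the framework; reuse one if Load is called often.
var serializer = new XmlSerializer(typeof(PowerBuilderRunTime[]), new XmlRootAttribute("PowerBuilderRunTimes"));
runtimes = (PowerBuilderRunTime[])serializer.Deserialize(reader);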
}
return runtimes;
}
}
You can get all the runtimes strongly typed, and use each PowerBuilderRunTime's Files property to loop through all the string file names.
var runtimes = PowerBuilderRunTime.Load(string.Format("{0}\\{1}", configPath, Resource.PBRuntimes));

You could replace all of this with a simple XPath query.
// configPath and Resource.PBRuntimes come from the question's code
string path = System.IO.Path.Combine(configPath, Resource.PBRuntimes);
System.Xml.XPath.XPathDocument xpd = new System.Xml.XPath.XPathDocument(path);
System.Xml.XPath.XPathNavigator xpn = xpd.CreateNavigator();
System.Xml.XPath.XPathExpression exp = xpn.Compile(@"/PowerBuilderRunTimes/PowerBuilderRunTime/Files//File");
System.Xml.XPath.XPathNodeIterator iterator = xpn.Select(exp);
while (iterator.MoveNext())
{
System.Xml.XPath.XPathNavigator nav2 = iterator.Current.Clone();
// access the file name with nav2.Value
}
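The version check from the question can also move into the XPath expression as a predicate. A minimal sketch, assuming pbVersion is numeric (its type is not shown in the question) and loading the sample XML from a string instead of a file; .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

const string xml = @"<PowerBuilderRunTimes>
  <PowerBuilderRunTime>
    <Version>12</Version>
    <Files>
      <File>EasySoap110.dll</File>
      <File>exPat110.dll</File>
      <File>pbacc110.dll</File>
    </Files>
  </PowerBuilderRunTime>
</PowerBuilderRunTimes>";

int pbVersion = 12; // assumption: the version is numeric

var document = new XPathDocument(new StringReader(xml));
XPathNavigator navigator = document.CreateNavigator();

// [Version='12'] keeps only the runtime whose <Version> text matches.
string query = $"/PowerBuilderRunTimes/PowerBuilderRunTime[Version='{pbVersion}']/Files/File";
XPathNodeIterator iterator = navigator.Select(query);

var fileList = new List<string>();
while (iterator.MoveNext())
    fileList.Add(iterator.Current!.Value);

Console.WriteLine(string.Join(", ", fileList));
```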
