How to remove xml duplicates in c#

Please could someone help me? I have researched other posts (such as efficiently removing duplicate xml elements in c#) on how to remove duplicates in XML using c# and altered them to solve my problem all to no avail. I'm not very experienced in XML and all I wish to do is remove the duplicates from the following XML.
I've inherited this code and can't change the structure.
Many thanks to anyone that can help.
<Request>
<Type>Delete</Type>
<Client>
<ClientId></ClientId>
<Assignment>
<AssignmentId></AssignmentId>
<Assessments>
<AssessmentId>664449ba-21b9-e511-999d-d8fc934939fe</AssessmentId>
<AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
</Assessments>
</Assignment>
</Client>
</Request>

If you can change the application building the XML (it sounds like you can't), my preferred method would be to use a HashSet<string> to build up the Assessments collection. If it's coming off a SQL query, use DISTINCT or GROUP BY.
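As a rough sketch of the HashSet<string> idea, assuming the application collects the IDs as strings before writing them out. The AssessmentsBuilder class and its Build method are hypothetical names for illustration, not part of the original code, and the case-insensitive comparison is an assumption based on the IDs being GUIDs:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class AssessmentsBuilder
{
    // Builds an <Assessments> element, skipping any ID that was already added.
    // Hypothetical helper: the real application may build its XML differently.
    public static XElement Build(IEnumerable<string> assessmentIds)
    {
        // Assumption: IDs are GUID strings, so casing should not matter.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var assessments = new XElement("Assessments");

        foreach (string id in assessmentIds)
        {
            // HashSet<T>.Add returns false when the value is already present.
            if (seen.Add(id))
            {
                assessments.Add(new XElement("AssessmentId", id));
            }
        }

        return assessments;
    }
}
```

Because the set is checked while the element is built, the duplicates never reach the XML in the first place and the original order of first occurrences is kept.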
If you're working with the XML itself and really have no way to change how it is produced, LINQ to XML with a custom IEqualityComparer should work:
string xml = @"<Request>
<Type>Delete</Type>
<Client>
<ClientId></ClientId>
<Assignment>
<AssignmentId></AssignmentId>
<Assessments>
<AssessmentId>664449ba-21b9-e511-999d-d8fc934939fe</AssessmentId>
<AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
<AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
</Assessments>
</Assignment>
</Client>
</Request>";
XDocument xd = XDocument.Parse(xml);
var assessments = xd.Root.Element("Client")
.Element("Assignment")
.Element("Assessments");
// get the distinct ones
var distinctEls = assessments.Elements()
.Distinct(new XElComparer())
.ToList(); // ensure we actually get the list, not just the enumerator or elements we're about to remove
// remove all children
assessments.Elements().Remove();
// add back our distinct list
assessments.Add(distinctEls);
Console.WriteLine(xd);
Console.ReadKey();
and the XElComparer:
public class XElComparer : IEqualityComparer<XElement>
{
public bool Equals(XElement x, XElement y)
{
return x.Value.Equals(y.Value);
}
public int GetHashCode(XElement obj)
{
if (obj == null) return 0;
return obj.Value.GetHashCode();
}
}
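A shorter variant of the same idea, sketched under the assumption that two AssessmentId elements count as duplicates when their text values are equal. It groups the elements by value and removes everything after the first element of each group, so no comparer class is needed. The DuplicateRemover class name is made up for this example:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DuplicateRemover
{
    // Parses the XML and removes every AssessmentId whose value already
    // appeared earlier under the same <Assessments> parent.
    public static XDocument RemoveDuplicateAssessments(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // ToList() so the tree is not modified while Descendants() is still being enumerated.
        foreach (XElement assessments in doc.Descendants("Assessments").ToList())
        {
            assessments.Elements("AssessmentId")
                       .GroupBy(e => e.Value)        // one group per distinct value
                       .SelectMany(g => g.Skip(1))   // everything after the first occurrence
                       .Remove();                    // Remove() snapshots the sequence before deleting
        }

        return doc;
    }
}
```

This keeps the first occurrence of each value in place, so the surviving elements stay in their original document order.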

I prefer to work with C# objects.
So you can deserialize this XML into objects with XmlSerializer. You can also generate the C# classes from the XML in Visual Studio: Edit -> Paste Special -> Paste XML As Classes.
Your code will look like this:
Request request;
var fileName = "File1.xml";
//Parsing
var sr = new XmlSerializer(typeof(Request));
using (var fs = new FileStream(fileName, FileMode.Open))
{
request = (Request)sr.Deserialize(fs);
}
//Selecting the distinct values with plain C# (LINQ)
var distinctAssessments = request.Client.Assignment.Assessments.Distinct();
request.Client.Assignment.Assessments = distinctAssessments.ToArray();
//Saving your document
var xmlDocument = new XmlDocument();
using (var stream = new MemoryStream())
{
sr.Serialize(stream, request);
stream.Position = 0;
xmlDocument.Load(stream);
xmlDocument.Save(fileName);
stream.Close();
}
You could also use XSLT, but that will look a bit more complex: https://msdn.microsoft.com/en-us/library/bb399419(v=vs.110).aspx

You can do this with a simple (or not so simple I guess) XPath query.
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml); // xml in string form
var nodes = doc.SelectNodes("//AssessmentId[not(. = preceding-sibling::AssessmentId)]");
That will get you a list of the unique AssessmentId nodes (the first occurrence of each value), which you can then use to remove all the existing nodes and add those back. Alternatively, remove the 'not' from the XPath query to get a list of the duplicates instead, and remove those nodes from their parent node.
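The second variant (select the duplicates and remove them) could be sketched like this. The XPathDuplicateRemover class name is invented for the example, and the node list is copied before anything is removed, on the assumption that modifying the document while enumerating a SelectNodes result is unsafe:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XPathDuplicateRemover
{
    // Returns the XML with every AssessmentId removed whose value
    // matches an earlier sibling AssessmentId.
    public static string RemoveDuplicates(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Same query as above without the not(): selects the duplicates only.
        XmlNodeList duplicates = doc.SelectNodes(
            "//AssessmentId[. = preceding-sibling::AssessmentId]");

        // Copy to a list first so the selection is complete before the tree changes.
        List<XmlNode> toRemove = duplicates.Cast<XmlNode>().ToList();

        foreach (XmlNode duplicate in toRemove)
        {
            duplicate.ParentNode.RemoveChild(duplicate);
        }

        return doc.OuterXml;
    }
}
```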

Related

How to use LINQ to XML when XML parent and child nodes have the same name

I am trying to extract some SQL data to XML from a Microsoft Dynamics environment. I am currently using LINQ to XML in C# to read and write my XML files. One piece of data I need is from a view called SECURITYSUBROLE. The structure of this view shows that it has a column that is also named SECURITYSUBROLE. My normal method of extraction has given me this XML.
<SECURITYSUBROLE>
<SECURITYROLE>886301</SECURITYROLE>
<SECURITYSUBROLE>886317</SECURITYSUBROLE>
<VALIDFROM>1900-01-01T00:00:00-06:00</VALIDFROM>
<VALIDFROMTZID>0</VALIDFROMTZID>
<VALIDTO>1900-01-01T00:00:00-06:00</VALIDTO>
<VALIDTOTZID>0</VALIDTOTZID>
<RECVERSION>1</RECVERSION>
<RECID>886317</RECID>
</SECURITYSUBROLE>
When I try to import this data later on, I am getting errors because the parent XML node has the same name as a child node. Here is a snippet of the import method:
XmlReaderSettings settings = new XmlReaderSettings();
settings.CheckCharacters = false;
XmlReader reader = XmlReader.Create(path, settings);
reader.MoveToContent();
int count = 1;
List<XElement> xmlSubset = new List<XElement>();
while (reader.ReadToFollowing(xmlTag))
{
if (count % 1000 == 0)
{
xmlSubset.Add(XElement.Load(reader.ReadSubtree()));
XDocument xmlTemp = new XDocument(new XElement(xmlTag));
foreach (XElement elem in xmlSubset)
{
xmlTemp.Root.Add(elem);
}
xmlSubset = new List<XElement>();
ImportTableByName(connectionString, tableName, xmlTemp);
count = 1;
}
else
{
xmlSubset.Add(XElement.Load(reader.ReadSubtree()));
count++;
}
}
}
It's currently failing on the XmlReader.ReadToFollowing, where it doesn't know where to go next because of the name confusion. So my question has two parts:
1) Is there some better way to be extracting this data other than to XML?
2) Is there a way through LINQ To XML that I can somehow differentiate between the parent and child nodes named exactly the same?

To get the inner SECURITYSUBROLE elements (in your case), you can check whether each element has children:
XElement root = XElement.Load(path);
var subroles = root.Descendants("SECURITYSUBROLE") // all subroles
.Where(x => !x.HasElements); // only subroles without children
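Another way to tell the two apart, as a sketch: use Elements() instead of Descendants(), since Elements() only looks at direct children. This assumes the record elements are direct children of the root; the SubroleReader class is a hypothetical name for the example:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SubroleReader
{
    // Returns the value of the SECURITYSUBROLE column for each
    // SECURITYSUBROLE record found directly under the root.
    public static List<string> ReadSubroleIds(XElement root)
    {
        return root.Elements("SECURITYSUBROLE")                        // outer elements: the records
                   .Select(r => (string)r.Element("SECURITYSUBROLE")) // inner element: the column
                   .Where(value => value != null)                      // skip records without the column
                   .ToList();
    }
}
```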

I'm going to suggest a different approach:
1) VS2013 (possibly earlier versions too) has a function to create a class from an XML source. So get one of your XML files and copy the content to your clipboard. Then in a new class file Edit --> Paste Special --> Paste XML as Classes
2) Look into XmlSerialization which will allow you to convert an XML file into an in memory object with a strongly typed class.
XmlSerializer s = new XmlSerializer(typeof(YourNewClassType));
TextReader r = new StreamReader(XmlFileLocation);
var dataFromYourXmlAsAStronglyTypedClass = (YourNewClassType) s.Deserialize(r);
r.Close();

How do I parse this xml?

I need to parse the following XML in C# using System.Xml. I need a list of strings containing the content of the User tags.
<Configuration>
....
<DebugUsersMail>
<User>bob@example.com</User>
<User>lenny@example.com</User>
</DebugUsersMail>
...
</Configuration>

If you can use Linq something like this is nice and simple
XDocument xmlDoc = XDocument.Load("C:\\your_xml_file.xml");
List<string> users = xmlDoc.Descendants("User").Select(xElem => (string)xElem).ToList();
You'll need to include a reference to System.Xml.Linq in your using statements to use the XDocument object.
This does however assume that there are no other User elements in the xml file that you don't want included in the list.
If you want to be more specific you could do this
List<string> users = xmlDoc.Descendants("DebugUsersMail")
.Descendants("User").Select(xElem => (string)xElem).ToList();

I found a solution:
List<string> returnList = new List<string>();
XmlNodeList node = xmlDocument.GetElementsByTagName("DebugUsersMail");
XmlNodeList childNodes = node[0].ChildNodes;
for(int i = 0; i < childNodes.Count; i++)
{
returnList.Add(childNodes[i].InnerText);
}
return returnList;

There are tons of ways to do it in C#. You can use:
XmlDocument and then XPath and XQuery
XDocument and Linq
XmlTextReader/SAX
Regular expressions
Deserialise the XML to objects
The route to take depends a lot on what the rest of the XML looks like.
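For the first route (XmlDocument plus XPath), a minimal sketch might look like the following. It assumes the document has the shape shown in the question, with Configuration as the root element; the UserListReader class name is invented for the example:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class UserListReader
{
    // Returns the text of every User element under Configuration/DebugUsersMail.
    public static List<string> GetUsers(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var users = new List<string>();
        // The absolute path avoids picking up User elements elsewhere in the file.
        foreach (XmlNode node in doc.SelectNodes("/Configuration/DebugUsersMail/User"))
        {
            users.Add(node.InnerText);
        }

        return users;
    }
}
```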

How to read XML attributes in C#?

I have several XML files that I wish to read attributes from. My main objective is to apply syntax highlighting to a rich text box.
For example in one of my XML docs I have: <Keyword name="using">[..] All the files have the same element: Keyword.
So, how can I get the value of the name attribute and put the values in a collection of strings for each XML file?
I am using Visual C# 2008.

The other answers will do the job, but the syntax highlighting and the several XML files you mention make me think you need something faster. Why not use a lean and mean XmlReader?
private string[] getNames(string fileName)
{
List<string> names = new List<string>();
// the using block closes the reader (and the file) when we are done
using (XmlReader xmlReader = XmlReader.Create(fileName))
{
while (xmlReader.Read())
{
//keep reading until we see your element
if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name.Equals("Keyword"))
{
// get attribute from the Xml element here
string name = xmlReader.GetAttribute("name");
// --> now **add to collection** - or whatever
names.Add(name);
}
}
}
return names.ToArray();
}
Another good option would be the XPathNavigator class - which is faster than XmlDoc and you can use XPath.
Also, I would suggest going with this approach only if you have tried the straightforward options and are not happy with their performance.
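The XPathNavigator option could be sketched as follows. This is only an illustration: it reads from a TextReader so it can be tried on a string, the KeywordNavigatorReader class name is made up, and no performance claim is made for this particular code:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class KeywordNavigatorReader
{
    // Returns the name attribute of every Keyword element, in document order.
    public static List<string> GetNames(TextReader input)
    {
        // XPathDocument is a read-only store intended for XPath queries.
        var document = new XPathDocument(input);
        XPathNavigator navigator = document.CreateNavigator();

        var names = new List<string>();
        XPathNodeIterator iterator = navigator.Select("//Keyword/@name");
        while (iterator.MoveNext())
        {
            // Current is positioned on the attribute node; Value is its text.
            names.Add(iterator.Current.Value);
        }

        return names;
    }
}
```

For a file on disk, a StreamReader opened on the file can be passed as the input.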

You could use XPath to get all the elements, then a LINQ query to get the values of all the name attributes you find:
// XPathSelectElements is an extension method; it needs "using System.Xml.XPath;"
XDocument doc = yourDocument;
var nodes = from element in doc.XPathSelectElements("//Keyword")
let att = element.Attribute("name")
where att != null
select att.Value;
string[] names = nodes.ToArray();
The //Keyword XPath expression means "all elements in the document named Keyword".
Edit: Just saw that you only want elements named Keyword. Updated the code sample.

Like others, I would suggest using LINQ to XML - but I don't think there's much need to use XPath here. Here's a simple method to return all the keyword names within a file:
static IEnumerable<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value);
}
Nice and declarative :)
Note that if you're going to want to use the result more than once, you should call ToList() or ToArray() on it, otherwise the query is re-evaluated over the loaded document each time you enumerate it. Of course you could change the method to return List<string> or string[] by adding the relevant call to the end of the chain of method calls, e.g.
static List<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value)
.ToList();
}
Also note that this just gives you the names - I would have expected you to want the other details of the elements, in which case you'd probably want something slightly different. If it turns out you need more, please let us know.

You could use LINQ to XML.
Example:
var xmlFile = XDocument.Load(someFile);
var query = from item in xmlFile.Descendants("childobject")
where !String.IsNullOrEmpty((string)item.Attribute("using"))
select new
{
AttributeValue = item.Attribute("using").Value
};

You'll likely want to use XPath. //Keyword/@name should get you all of the keyword names.
Here's a good introduction: .Net and XML XPath Queries
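A minimal sketch of running that query with XmlDocument, assuming the XML is available as a string. The KeywordXPathReader class is a hypothetical name used only for this example:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class KeywordXPathReader
{
    // Returns the value of the name attribute of every Keyword element.
    public static List<string> GetKeywordNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var names = new List<string>();
        // The query selects attribute nodes, so each node is an XmlAttribute.
        foreach (XmlNode attribute in doc.SelectNodes("//Keyword/@name"))
        {
            names.Add(attribute.Value);
        }

        return names;
    }
}
```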

<Countries>
<Country name ="ANDORRA">
<state>Andorra (general)</state>
<state>Andorra</state>
</Country>
<Country name ="United Arab Emirates">
<state>Abu Z¸aby</state>
<state>Umm al Qaywayn</state>
</Country>
</Countries>
public void datass()
{
string file = HttpContext.Current.Server.MapPath("~/App_Data/CS.xml");
XmlDocument doc = new XmlDocument();
if (!System.IO.File.Exists(file))
{
// Nothing to read: doc.DocumentElement would be null below
return;
}
//Load the XML File
doc.Load(file);
//Get the root element
XmlElement root = doc.DocumentElement;
XmlNodeList subroot = root.SelectNodes("Country");
for (int i = 0; i < subroot.Count; i++)
{
XmlNode elem = subroot.Item(i);
string attrVal = elem.Attributes["name"].Value;
Response.Write(attrVal);
XmlNodeList sub = elem.SelectNodes("state");
for (int j = 0; j < sub.Count; j++)
{
XmlNode elem1 = sub.Item(j);
Response.Write(elem1.InnerText);
}
}
}

How to Parse XML file in c# (youtube api result)?

I'm trying to parse XML returned from the YouTube API. The API calls work correctly and create an XmlDocument. I can get an XmlNodeList of the "entry" tags, but I'm not sure how to get the elements inside each entry.
XmlDocument xmlDoc = youtubeService.GetSearchResults(search.Term, "published", 1, 50);
XmlNodeList listNodes = xmlDoc.GetElementsByTagName("entry");
foreach (XmlNode node in listNodes)
{
//not sure how to get elements in here
}
The XML document schema is shown here: http://code.google.com/apis/youtube/2.0/developers_guide_protocol_understanding_video_feeds.html
I know that node.Attributes is the wrong call, but am not sure what the correct one is?
By the way, if there is a better way (faster, less memory) to do this by serializing it or using linq, I'd be happy to use that instead.
Thanks for any help!

Here are some examples of reading the XmlDocument. I don't know which is fastest or which needs the least memory, but I would prefer LINQ to XML because of its clarity.
XmlDocument xmlDoc = youtubeService.GetSearchResults(search.Term, "published", 1, 50);
XmlNodeList listNodes = xmlDoc.GetElementsByTagName("entry");
foreach (XmlNode node in listNodes)
{
// get child nodes
foreach (XmlNode childNode in node.ChildNodes)
{
}
// get specific child nodes
XPathNavigator navigator = node.CreateNavigator();
XPathNodeIterator iterator = navigator.Select(/* xpath selector according to the elements/attributes you need */);
while (iterator.MoveNext())
{
// f.e. iterator.Current.GetAttribute(), iterator.Current.Name and iterator.Current.Value available here
}
}
and the linq to xml one:
XmlDocument xmlDoc = youtubeService.GetSearchResults(search.Term, "published", 1, 50);
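XDocument xDoc = XDocument.Parse(xmlDoc.OuterXml);
// Assumption: the feed is Atom, so its elements are in the Atom namespace
// and have to be looked up with a namespace-qualified name
XNamespace atom = "http://www.w3.org/2005/Atom";
var entries = from entry in xDoc.Descendants(atom + "entry")
select new
{
Id = entry.Element(atom + "id").Value,
Categories = entry.Elements(atom + "category").Select(c => c.Value)
};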
foreach (var entry in entries)
{
// entry.Id and entry.Categories available here
}

I realise this has been answered and LINQ to XML is what I'd go with but another option would be XPathNavigator. Something like
XPathNavigator xmlNav = xmlDoc.CreateNavigator();
XPathNodeIterator xmlItr = xmlNav.Select("/XPath/expression/here");
while (xmlItr.MoveNext()) ...
The code is off the top of my head, so it may be wrong, and there may be a better way with XPathNavigator, but it should give you the general idea.

You could use XSD.exe to generate a class based on the schema provided. Once generated, you could then parse the XML response into the strongly typed class.
string xmlResponse = GetMyYouTubeStuff();
MyYouTubeClass response = null;
XmlHelper<MyYouTubeClass> xmlHelper = new XmlHelper<MyYouTubeClass>();
response = xmlHelper.Deserialize(xmlResponse);
And the class for deserializing it...
public class XmlHelper<T>
{
public T Deserialize(string xml)
{
XmlSerializer xs = new XmlSerializer(typeof(T));
Byte[] byteArray = new UTF8Encoding().GetBytes(xml);
MemoryStream memoryStream = new MemoryStream(byteArray);
XmlTextReader xmlTextReader = new XmlTextReader(memoryStream);
T retObj = (T)xs.Deserialize(xmlTextReader);
return retObj;
}
}
There's also another way here.

C# how can I get all elements name from a xml file

I'd like to get all the element names from an XML file. For example, the XML file is:
<BookStore>
<BookStoreInfo>
<Address />
<Tel />
<Fax />
</BookStoreInfo>
<Book>
<BookName />
<ISBN />
<PublishDate />
</Book>
<Book>
....
</Book>
</BookStore>
I would like to get only the element names "BookName", "ISBN" and "PublishDate", not "BookStoreInfo" or the names of its child nodes.
I tried several ways, but none of them worked. How can I do it?

Well, with XDocument and LINQ-to-XML:
foreach(var name in doc.Root.DescendantNodes().OfType<XElement>()
.Select(x => x.Name).Distinct())
{
Console.WriteLine(name);
}
There are lots of similar routes, though.
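To restrict the result to the children of Book, as the question asks, one of those routes could be sketched like this. It assumes Book elements are direct children of the root, as in the sample; the BookElementNames class is a hypothetical name for the example:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookElementNames
{
    // Returns the distinct names of the elements directly inside any Book element.
    public static List<string> Get(XDocument doc)
    {
        return doc.Root
                  .Elements("Book")               // only Book, so BookStoreInfo is skipped
                  .Elements()                     // direct children of each Book
                  .Select(e => e.Name.LocalName)  // element name without namespace
                  .Distinct()
                  .ToList();
    }
}
```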

Using XPath
XmlDocument xdoc = new XmlDocument();
xdoc.Load(something);
XmlNodeList list = xdoc.SelectNodes("//BookStore");
gives you a list with all nodes in the document named BookStore

I agree with Adam, the ideal condition is to have a schema that defines the content of the XML document. However, sometimes this is not possible. Here is a simple method for iterating over all of the nodes of an XML document and using a dictionary to store the unique local names. I like to keep track of the depth of each local name, so I use a list of int to store the depths. Note that the XmlReader is "easy on the memory" since it does not load the entire document as the XmlDocument does. In some instances it makes little difference because the size of the XML data is small. In the following example, an 18.5MB file is read with an XmlReader. Using an XmlDocument to load this data would have been less efficient than using an XmlReader to read and sample its contents.
string documentPath = @"C:\Docs\cim_schema_2.18.1-Final-XMLAll\all_classes.xml";
Dictionary<string, List<int>> nodeTable = new Dictionary<string, List<int>>();
using (XmlReader reader = XmlReader.Create(documentPath))
{
while (!reader.EOF)
{
if (reader.NodeType == XmlNodeType.Element)
{
if (!nodeTable.ContainsKey(reader.LocalName))
{
nodeTable.Add(reader.LocalName, new List<int>(new int[] { reader.Depth }));
}
else if (!nodeTable[reader.LocalName].Contains(reader.Depth))
{
nodeTable[reader.LocalName].Add(reader.Depth);
}
}
reader.Read();
}
}
Console.WriteLine("The node table has {0} items.",nodeTable.Count);
foreach (KeyValuePair<string, List<int>> kv in nodeTable)
{
Console.WriteLine("{0} [{1}]",kv.Key, kv.Value.Count);
for (int i = 0; i < kv.Value.Count; i++)
{
if (i < kv.Value.Count-1)
{
Console.Write("{0}, ", kv.Value[i]);
}
else
{
Console.WriteLine(kv.Value[i]);
}
}
}

The purist's way of doing this (and, to be fair, the right way) would be to have a schema contract definition and read it in that way. That being said, you could do something like this...
List<string> nodeNames = new List<string>();
foreach(System.Xml.XmlNode node in doc.SelectNodes("BookStore/Book"))
{
foreach(System.Xml.XmlNode child in node.ChildNodes)
{
if(!nodeNames.Contains(child.Name)) nodeNames.Add(child.Name);
}
}
This is, admittedly, a rudimentary method for obtaining the list of distinct node names for the Book node's children, but you didn't specify much else about your environment. If you have .NET 3.5, for example, you could use LINQ to XML to make this a little prettier. Either way, this should get the job done regardless of your environment.

If you're using C# 3.0, you can do the following:
var data = XElement.Load("c:/test.xml"); // change this to reflect location of your xml file
var allElementNames =
(from e in data.Descendants()
select e.Name).Distinct();

You can try doing it using XPATH.
XmlDocument doc = new XmlDocument();
doc.LoadXml("xml string");
XmlNodeList list = doc.SelectNodes("//BookStore/Book");

If BookStore is your root element, then you can try the following code:
XmlDocument doc = new XmlDocument();
doc.Load(configPath);
XmlNodeList list = doc.DocumentElement.GetElementsByTagName("Book");
if (list.Count != 0)
{
for (int i = 0; i < list[0].ChildNodes.Count; i++)
{
XmlNode child = list[0].ChildNodes[i];
}
}
