XML to List - Unable to add all nodes to list - c#

I have two XML documents:
XmlDocument languagesXML = new XmlDocument();
languagesXML.LoadXml(
@"<languages>
<language>
<name>English</name>
<country>8</country>
<country>9</country>
<country>3</country>
<country>12</country>
</language>
<language>
<name>French</name>
<country>1</country>
<country>3</country>
<country>7</country>
<country>13</country>
</language>
</languages>");
XmlDocument productsXML = new XmlDocument();
productsXML.LoadXml(@"<products>
<product>
<name>Screws</name>
<country>3</country>
<country>12</country>
<country>29</country>
</product>
<product>
<name>Hammers</name>
<country>1</country>
<country>13</country>
</product>
</products>");
I am trying to add the relevant information, such as the name and countries of each language and product, to a list, because I want to compare the two and group the languages that correspond to a certain product. For example, taking the above into account, my goal is to have an output similar to this:
Screws -> English, French
Hammers -> French
English and French correspond to Screws as they all share a common country value. Same with Hammers. (The above XML is just a snapshot of the entire XML).
I have tried using How to read a XML file and write into List<>? and XML to String List. While this piece of code:
var languages = new List<string>();
XmlNode xmlNode;
foreach(var node in languagesXML.LastChild.FirstChild.ChildNodes)
{
xmlNode = node as XmlNode;
languages.Add(xmlNode.InnerXml);
}
languages.ForEach(Console.WriteLine);
works, it will only add "English", "8", "9", "3", and "12" to the list. The rest of the document seems to be ignored. Is there a better way of doing what I'm trying to achieve? Would I even be able to compare and attain an output like what I need even if I got everything adding to a list? Would Muenchian grouping be something I should be looking at?

This is a job for LINQ to XML. For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
namespace ConsoleApp18
{
static class EnumerableUtils
{
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> col)
{
return new HashSet<T>(col);
}
}
class Program
{
static void Main(string[] args)
{
XDocument languagesXML = XDocument.Parse(
@"<languages>
<language>
<name>English</name>
<country>8</country>
<country>9</country>
<country>3</country>
<country>12</country>
</language>
<language>
<name>French</name>
<country>1</country>
<country>3</country>
<country>7</country>
<country>13</country>
</language>
</languages>");
var languages = languagesXML.Root
.Elements("language")
.Select(e =>
new
{
Name = (string)e.Element("name"),
Countries = e.Elements("country").Select(c => (int)c).ToHashSet()
})
.ToList();
XDocument productsXML = XDocument.Parse(@"<products>
<product>
<name>Screws</name>
<country>3</country>
<country>12</country>
<country>29</country>
</product>
<product>
<name>Hammers</name>
<country>1</country>
<country>13</country>
</product>
</products>");
var products = productsXML.Root
.Elements("product")
.Select(e =>
new
{
Name = (string)e.Element("name"),
Countries = e.Elements("country").Select(c => (int)c).ToHashSet()
})
.ToList();
var q = from p in products
from l in languages
where p.Countries.Overlaps(l.Countries)
let pl = new { p, l, }
group pl by p.Name into byProductName
select new
{
ProductName = byProductName.Key,
Languages = byProductName.Select(e => e.l.Name).ToList()
};
foreach (var p in q.ToList())
{
Console.WriteLine($"Product: {p.ProductName} is available in languages: {String.Join(",", p.Languages.ToArray())}");
}
}
}
}
This outputs:
Product: Screws is available in languages: English,French
Product: Hammers is available in languages: French
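As for why the original loop stopped after "English", "8", "9", "3" and "12": languagesXML.LastChild is the <languages> root and .FirstChild is only the first <language>, so the loop never reaches the second one. If you would rather stay with XmlDocument, a minimal sketch along these lines visits every <language>. The LanguageReader class and ReadAll method are hypothetical names used only for illustration, and the XML is shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class LanguageReader
{
    // Hypothetical helper: returns one "name: c1,c2,..." string per <language>.
    public static List<string> ReadAll(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        // SelectNodes returns every <language>, not just the first child of the root.
        foreach (XmlNode language in doc.SelectNodes("/languages/language"))
        {
            string name = language.SelectSingleNode("name").InnerText;
            var countries = new List<string>();
            foreach (XmlNode country in language.SelectNodes("country"))
            {
                countries.Add(country.InnerText);
            }
            result.Add(name + ": " + string.Join(",", countries));
        }
        return result;
    }

    static void Main()
    {
        string xml = @"<languages>
<language><name>English</name><country>8</country><country>3</country></language>
<language><name>French</name><country>1</country><country>3</country></language>
</languages>";
        ReadAll(xml).ForEach(Console.WriteLine);
    }
}
```

The grouping itself is still easier with the LINQ to XML query above; this only shows how to reach all the nodes.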

Related

How to handle missing field on XML elements

I'm trying to convert a XML file to a list. The XML file contains different products, and each product has different values, e.g.:
<product>
<id></id>
<name>Commentarii de Bello Gallico et Civili</name>
<price>449</price>
<type>Book</type>
<author>Gaius Julius Caesar</author>
<genre>Historia</genre>
<format>Inbunden</format>
</product>
<product>
<id></id>
<name>Katana Zero</name>
<price>199</price>
<type>Game</type>
<platform>PC, Switch</platform>
</product>
The problem is that some elements do not have all fields. Some books can look like this, for example:
<product>
<id></id>
<name>House of Leaves</name>
<price>49</price>
<type>Bok</type>
<author>Mark Z. Danielewski</author>
<genre>Romance</genre>
</product>
When I try adding these elements to the list, it works until I get an element that does not have all fields. When that happens, I get "System.NullReferenceException: 'Object reference not set to an instance of an object'."
List<Product> products= new List<Product>();
XElement xelement = XElement.Load(path);
IEnumerable<XElement> pr = xelement.Elements();
foreach (var p in pr)
{
switch (p.Element("type").Value)
{
case "Book":
products.Add(new Book(1, int.Parse(p.Element("price").Value),
0,
p.Element("name").Value,
p.Element("author").Value,
p.Element("genre").Value,
p.Element("format").Value));
break;
}
}
What I would like is to get a null value or "Not specified" when that happens, but I don't know how to do that in a good way. All I can think of is a try/catch for each variable, but that seems unnecessarily complicated.
How can I handle these cases in a good way?
Use the null-conditional operator (?.), which yields null instead of throwing when the element is missing:
p.Element("name")?.Value,
p.Element("author")?.Value,
p.Element("genre")?.Value,
p.Element("format")?.Value);
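To get "Not specified" rather than null, the null-coalescing operator (??) can be combined with either ?.Value or the explicit string conversion on XElement. The sketch below is one way to wrap that; ProductFields and ValueOrDefault are hypothetical names, not part of the asker's code.

```csharp
using System;
using System.Xml.Linq;

static class ProductFields
{
    // Hypothetical helper: the child's text, or a fallback when the element is absent.
    public static string ValueOrDefault(XElement parent, string name, string fallback = "Not specified")
    {
        // The explicit (string) conversion returns null for a missing element instead of throwing.
        return (string)parent.Element(name) ?? fallback;
    }

    static void Main()
    {
        var product = XElement.Parse(
            "<product><name>House of Leaves</name><author>Mark Z. Danielewski</author></product>");
        Console.WriteLine(ValueOrDefault(product, "author")); // element present
        Console.WriteLine(ValueOrDefault(product, "format")); // element missing, fallback used
    }
}
```

Note that an element that is present but empty, such as <id></id>, yields an empty string rather than the fallback.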
I usually use a nested dictionary:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication186
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XDocument doc = XDocument.Load(FILENAME);
Dictionary<string, Dictionary<string, string>> dict = doc.Descendants("product")
.GroupBy(x => (string)x.Element("name"), y => y.Elements()
.GroupBy(a => a.Name.LocalName, b => (string)b)
.ToDictionary(a => a.Key, b => b.FirstOrDefault()))
.ToDictionary(x => x.Key, y => y.FirstOrDefault());
}
}
}

Select Next Sibling if Element in previous sibling is a certain value XML

I have the following XML file
<ProdExtract>
<Product>
<Barcode>
<Eancode>0000000000000</Eancode>
</Barcode>
<Barcode>
<Eancode>0000000000000</Eancode>
</Barcode>
<Barcode>
<Eancode>5391524344444</Eancode>
</Barcode>
</Product>
<Product>
<Barcode>
<Eancode>5391524322222</Eancode>
</Barcode>
</Product>
</ProdExtract>
The desired output is the Eancode of both products.
For the first product, skip all the zero barcodes and output the next valid Eancode (5391524344444) for that product. For the second product, output its valid default Eancode (5391524322222).
By default this is how I am extracting the Eancode for a product that only has one Barcode node
p.q_barcode = node.SelectSingleNode("Barcode/Eancode").FirstChild.InnerText;
Issue is dealing with the product that has multiple invalid zero value Eancodes before extracting the next valid one.
With current code
XmlDocument xDoc = new XmlDocument();
xDoc.Load(data);
XmlElement root = xDoc.DocumentElement;
XmlNodeList nodes = root.SelectNodes("Product");
foreach (XmlNode node in nodes)
{
var noneZeroEancode = xml.Descendants("Eancode").Where(x => x.Value != "0000000000000");
string outputString = noneZeroEancode.First().Value;
p.q_barcode = outputString;
//other properties....
p.q_description = node.SelectSingleNode("ProductID/LongDescription").InnerText;
//insert record into table
//go read next "Product" node (if any)
}
Output result: on reading the second node, it is assigning value of first Eancode to the second product. So all products that follow have the Eancode value of 5391524344444 instead of their own unique value.
Additional Info (Just for clarity):
This functionality is within a loop that reads each product node, maps the values to table columns and imports the record/product, then moves on to read the next node. This part needs no answer as I have that solved.
Try something like this:
using System;
using System.Linq;
using System.Xml.Linq;
namespace xml {
class Program {
static void Main(string[] args) {
var data =
@"<ProdExtract>
<Product>
<Barcode>
<Eancode>0000000000000</Eancode>
</Barcode>
</Product>
<Product>
<Barcode>
<Eancode>0000000000000</Eancode>
</Barcode>
</Product>
<Product>
<Barcode>
<Eancode>5391524344444</Eancode>
</Barcode>
</Product>
</ProdExtract>";
var xml = XDocument.Parse(data);
var eancode = xml.Descendants("Eancode").Where(x => x.Value != "0000000000000");
var product = eancode.Select(x => x.Parent.Parent);
foreach (var p in product) {
Console.WriteLine(p);
}
}
}
}
The first step is to select elements that don't have the 00...0 Eancode value. Next look at each of the returned nodes (XElements) and get the grandparent.
Possible issue:
If your XML isn't structured as expected (for example, an 'Eancode' exists but isn't inside a 'Product/Barcode'), you won't get the expected results, and taking the parent of the parent could even throw an exception.
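The symptom in the question (every product getting 5391524344444) most likely comes from querying the whole document inside the loop instead of the current product. A minimal sketch that runs the query relative to each <Product>, using the XML from the question, could look like this. BarcodePicker and FirstValidCodes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class BarcodePicker
{
    const string ZeroCode = "0000000000000";

    // Hypothetical helper: first non-zero Eancode of each <Product>, or null if it has none.
    public static List<string> FirstValidCodes(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Product")
            // The inner query starts at the current product, not at the document.
            .Select(product => product
                .Elements("Barcode")
                .Elements("Eancode")
                .Select(e => e.Value.Trim())
                .FirstOrDefault(code => code != ZeroCode))
            .ToList();
    }

    static void Main()
    {
        string xml = @"<ProdExtract>
<Product>
<Barcode><Eancode>0000000000000</Eancode></Barcode>
<Barcode><Eancode>0000000000000</Eancode></Barcode>
<Barcode><Eancode>5391524344444</Eancode></Barcode>
</Product>
<Product>
<Barcode><Eancode>5391524322222</Eancode></Barcode>
</Product>
</ProdExtract>";
        foreach (var code in FirstValidCodes(xml))
        {
            Console.WriteLine(code);
        }
    }
}
```

If you prefer to keep the XmlDocument loop, an XPath predicate on the current node should achieve the same thing, for example node.SelectSingleNode("Barcode/Eancode[. != '0000000000000']"), with a null check for products that have no valid code.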

List is empty after parsing XML with LinQ

I have an xml file similar to the following:
<doc>
<file>
<header>
<source>
RNG
</source>
</header>
<body>
<item name="items.names.id1">
<property>propertyvalue1</property>
</item>
<!-- etc -->
<item name="items.names.id100">
<property>propertyvalue100</property>
</item>
<!-- etc -->
<item name="otheritems.names.id100">
<property>propertyvalue100</property>
</item>
</body>
</file>
</doc>
And the following class:
private class Item
{
public string Id;
public string Property;
}
The file has, for example, 100 item entries (labeled 1 to 100 in the name attribute). How can I use LINQ to XML to get hold of these nodes and place them in a list of Item?
Using Selman22's example, I'm doing the following:
var myList = xDoc.Descendants("item")
.Where(x => x.Attributes("name").ToString().StartsWith("items.names.id"))
.Select(item => new Item
{
Id = (string)item.Attribute("name"),
Property = (string)item.Element("property")
}).ToList();
However, the list is empty. What am I missing here?
Using LINQ to XML:
XDocument xDoc = XDocument.Load(filepath);
var myList = xDoc.Descendants("item").Select(item => new Item {
Id = (string)item.Attribute("name"),
Property = (string)item.Element("property")
}).ToList();
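The list in the question is most likely empty because x.Attributes("name") returns a sequence of attributes, and calling ToString() on that sequence gives its type name, which never starts with "items.names.id". If you want to keep the prefix filter, a sketch that reads the single attribute value instead might look like this. ItemLoader and its Load method are hypothetical names; Item mirrors the class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Item
{
    public string Id;
    public string Property;
}

static class ItemLoader
{
    // Hypothetical helper: items whose name attribute starts with the given prefix.
    public static List<Item> Load(string xml, string prefix)
    {
        return XDocument.Parse(xml)
            .Descendants("item")
            // Attribute (singular) plus the string conversion gives the value, or null if absent.
            .Where(x => ((string)x.Attribute("name") ?? "").StartsWith(prefix, StringComparison.Ordinal))
            .Select(x => new Item
            {
                Id = (string)x.Attribute("name"),
                Property = (string)x.Element("property")
            })
            .ToList();
    }

    static void Main()
    {
        string xml = @"<body>
<item name=""items.names.id1""><property>propertyvalue1</property></item>
<item name=""otheritems.names.id100""><property>propertyvalue100</property></item>
</body>";
        foreach (var item in Load(xml, "items.names.id"))
        {
            Console.WriteLine(item.Id + " = " + item.Property);
        }
    }
}
```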
You can use LINQ to XML to query the XML directly, or deserialize it and use LINQ to Objects. If you choose to deserialize, I suggest starting from the schema and generating the classes that represent your data model with xsd.exe. If you don't have a schema for your XML, xsd.exe can even infer one from an example XML file, but you will probably need to fine-tune the result.
Try this:
XElement root = XElement.Load("your file name");
var items = (from item in root.Descendants("item")
select item).ToList();
Now iterate over the list and store each item.
Below is a way of getting the information from XML using XDocument.
string input = "<Your xml>";
XDocument doc = XDocument.Parse(input);
var data = doc.Descendants("item");
List<Item> itemsList = new List<Item>();
foreach (var item in data)
{
// The id lives in the name attribute; the value lives in the <property> child.
string itemname = item.Attribute("name").Value;
string property = item.Element("property").Value;
itemsList.Add(new Item { Id = itemname, Property = property });
}
I'm guessing you want the code given how your question is phrased.. also I'm assuming the real XML is very simplistic as well.
var items = from item in doc.Descendants("item")
select new Item()
{
Id = item.Attributes("name").First().Value,
Property = item.Elements().First().Value,
};
Just ensure that your xml is loaded into doc. You can load the xml in two ways:
// By a string with xml
var doc = XDocument.Parse(aStringWithXml);
// or by loading from uri (file)
var doc = XDocument.Load(aStringWhichIsAFile);

Filter XDocument more efficiently

I would like to filter XML elements from an XML document as efficiently as possible.
Take for instance this XML file with contacts:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="asistentes.xslt"?>
<contactlist evento="Cena Navidad 2010" empresa="company">
<contact type="1" id="1">
<name>Name1</name>
<email>xxxx@zzzz.es</email>
<confirmado>SI</confirmado>
</contact>
<contact type="1" id="2">
<name>Name2</name>
<email>xxxxxxxxx@zzzze.es</email>
<confirmado>Sin confirmar</confirmado>
</contact>
</contactlist>
My current code to filter from this XML document:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
string xml = @" the xml above";
XDocument doc = XDocument.Parse(xml);
foreach (XElement element in doc.Descendants("contact")) {
Console.WriteLine(element);
var id = element.Attribute("id").Value;
var valor = element.Descendants("confirmado").ToList()[0].Value;
var email = element.Descendants("email").ToList()[0].Value;
var name = element.Descendants("name").ToList()[0].Value;
if (valor.ToString() == "SI") { }
}
}
}
What would be the best way to optimize this code to filter on <confirmado> element content?
var doc = XDocument.Parse(xml);
var query = from contact in doc.Root.Elements("contact")
let confirmado = (string)contact.Element("confirmado")
where confirmado == "SI"
select new
{
Id = (int)contact.Attribute("id"),
Name = (string)contact.Element("name"),
Email = (string)contact.Element("email"),
Valor = confirmado
};
foreach (var contact in query)
{
...
}
Points of interest:
doc.Root.Elements("contact") selects only the <contact> elements in the document root, instead of searching the whole document for <contact> elements.
The XElement.Element method returns the first child element with the given name. No need to convert the child elements to a list and take the first element.
The XElement and XAttribute classes provide a wide selection of convenient conversion operators.
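To illustrate the last point: the explicit conversions also come in nullable forms, so a missing attribute or element gives null instead of a NullReferenceException. A small sketch, where ContactFields, IdOf and IsConfirmed are hypothetical helper names:

```csharp
using System;
using System.Xml.Linq;

static class ContactFields
{
    // Hypothetical helper: the id attribute as an int, or null when it is absent.
    public static int? IdOf(XElement contact)
    {
        return (int?)contact.Attribute("id");
    }

    // Hypothetical helper: true only when <confirmado> exists and equals "SI".
    public static bool IsConfirmed(XElement contact)
    {
        return (string)contact.Element("confirmado") == "SI";
    }

    static void Main()
    {
        var withId = XElement.Parse("<contact id=\"2\"><confirmado>SI</confirmado></contact>");
        var bare = XElement.Parse("<contact><name>Name3</name></contact>");

        Console.WriteLine(IdOf(withId));        // the parsed id
        Console.WriteLine(IdOf(bare).HasValue); // no id attribute, so no value
        Console.WriteLine(IsConfirmed(bare));   // no <confirmado>, so not confirmed
    }
}
```

The non-nullable (int) conversion used in the query above throws when the attribute is missing, which is reasonable if id is mandatory in your data.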
You could use LINQ:
foreach (XElement element in doc.Descendants("contact").Where(c => c.Element("confirmado").Value == "SI"))

"where" query using linq xml

I've been taxing my brain trying to figure out how to perform a LINQ to XML query.
I'd like the query to return a list of all the "product" items where the category/name = "First Category" in the following XML:
<catalog>
<category>
<name>First Category</name>
<order>0</order>
<product>
<name>First Product</name>
<order>0</order>
</product>
<product>
<name>3 Product</name>
<order>2</order>
</product>
<product>
<name>2 Product</name>
<order>1</order>
</product>
</category>
</catalog>
Like so:
XDocument doc = XDocument.Parse(xml);
var qry = from cat in doc.Root.Elements("category")
where (string)cat.Element("name") == "First Category"
from prod in cat.Elements("product")
select prod;
or perhaps with an anonymous type too:
XDocument doc = XDocument.Parse(xml);
var qry = from cat in doc.Root.Elements("category")
where (string)cat.Element("name") == "First Category"
from prod in cat.Elements("product")
select new
{
Name = (string)prod.Element("name"),
Order = (int)prod.Element("order")
};
foreach (var prod in qry)
{
Console.WriteLine("{0}: {1}", prod.Order, prod.Name);
}
Here's an example:
string xml = @"your XML";
XDocument doc = XDocument.Parse(xml);
var products = from category in doc.Element("catalog").Elements("category")
where category.Element("name").Value == "First Category"
from product in category.Elements("product")
select new
{
Name = product.Element("name").Value,
Order = product.Element("order").Value
};
foreach (var item in products)
{
Console.WriteLine("Name: {0} Order: {1}", item.Name, item.Order);
}
You want to use the Single extension method here. Try the following:
var category = doc.Root.Elements("category").Single(
c => c.Element("name").Value == "First Category");
var products = category.Elements("product");
Note that this assumes you have exactly one category named "First Category"; Single throws if there are none or more than one. If you might have more, I recommend using Marc's solution; otherwise, this should be the more appropriate/efficient solution. It will also throw an exception if any category node lacks a name child node. Otherwise, it should do exactly what you want.
