C# LINQ to XML query help

Let's assume an XML file named data.xml with the following content:
<root>
<record>
<id>1</id>
<name>test 1</name>
<resume>this is the resume</resume>
<specs>these are the specs</specs>
</record>
<record>
<id>2</id>
<name>test 2</name>
<resume>this is the resume 2</resume>
</record>
<record>
<id>3</id>
<name>test 3</name>
<specs>these are the specs 3</specs>
</record>
</root>
I need to find all records where any of these fields (id, name, resume or specs) contains a given value. I've written this code:
XDocument DOC = XDocument.Load("data.xml");
IEnumerable<ProductRecord> results = from obj in DOC.Descendants("record")
where
obj.Element("id").Value.Contains(valueToSearch) ||
obj.Element("name").Value.Contains(valueToSearch) ||
obj.Element("resume").Value.Contains(valueToSearch) ||
obj.Element("specs").Value.Contains(valueToSearch)
select new ProductRecord {
ID = obj.Element("id").Value,
Name = obj.Element("name").Value,
Resume = obj.Element("resume").Value,
Specs = obj.Element("specs").Value
};
This code throws a NullReferenceException because not all records have all fields.
How can I test whether the current record has a given element before I apply a condition to it? For example, Record[@ID=3] has no resume.
Thanks in advance.

You can write an extension method like the one below:
public static class XMLExtension
{
public static string GetValue(this XElement input)
{
if (input != null)
return input.Value;
return null;
}
public static bool XMLContains(this string input, string value)
{
if (string.IsNullOrEmpty(input))
return false;
return input.Contains(value);
}
}
and use it as below:
IEnumerable<ProductRecord> results = from obj in DOC.Descendants("record")
where
obj.Element("id").GetValue().XMLContains(valueToSearch) || ...

You are getting a NullReferenceException because you are trying to access the value of some nodes that don't exist for each record like specs. You need to check whether obj.Element("specs") != null before calling .Value on it.
As an alternative you could use XPath:
var doc = XDocument.Load("test.xml");
var records = doc.XPathSelectElements("//record[contains(id, '2') or contains(name, 'test') or contains(resume, 'res') or contains(specs, 'spe')]");
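The null check described in this answer can also be expressed with the explicit string conversion that XElement defines, which yields null for a missing element instead of throwing. The following is a minimal sketch under assumptions: the ProductRecord class is a hypothetical reconstruction of the type used in the question, and the Search and ChildContains helpers are names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical shape of the ProductRecord class from the question.
public class ProductRecord
{
    public string ID { get; set; }
    public string Name { get; set; }
    public string Resume { get; set; }
    public string Specs { get; set; }
}

public static class Program
{
    private static readonly string[] Fields = { "id", "name", "resume", "specs" };

    // True when the named child element exists and its text contains the value.
    private static bool ChildContains(XElement record, string name, string value)
    {
        // Casting a null XElement to string yields null instead of throwing.
        string text = (string)record.Element(name);
        return text != null && text.Contains(value);
    }

    public static List<ProductRecord> Search(XDocument doc, string valueToSearch)
    {
        return doc.Descendants("record")
            .Where(r => Fields.Any(f => ChildContains(r, f, valueToSearch)))
            .Select(r => new ProductRecord
            {
                // Missing elements simply become null properties.
                ID = (string)r.Element("id"),
                Name = (string)r.Element("name"),
                Resume = (string)r.Element("resume"),
                Specs = (string)r.Element("specs")
            })
            .ToList();
    }

    public static void Main()
    {
        var doc = XDocument.Parse(@"<root>
<record><id>1</id><name>test 1</name><resume>this is the resume</resume><specs>these are the specs</specs></record>
<record><id>2</id><name>test 2</name><resume>this is the resume 2</resume></record>
<record><id>3</id><name>test 3</name><specs>these are the specs 3</specs></record>
</root>");
        foreach (ProductRecord p in Search(doc, "resume"))
            Console.WriteLine(p.ID + ": " + p.Name + " (specs: " + (p.Specs ?? "none") + ")");
    }
}
```

Listing the field names in one array keeps the where clause short when more fields are added later.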

First I'm amazed that it's not crashing because you're not using the Namespace. Maybe c#4.0 has bypassed this?
Anyway try
obj.Elements("id").Any() ? obj.Element("id").Value : null
That is:
select new ProductRecord {
ID = obj.Elements("id").Any() ? obj.Element("id").Value : null,
Name = obj.Elements("name").Any() ? obj.Element("name").Value : null,
Resume = obj.Elements("resume").Any() ? obj.Element("resume").Value : null,
Specs = obj.Elements("specs").Any() ? obj.Element("specs").Value : null
};

Related

Reading XML and creating a frequency count of elements in C#

I have an XML file in this format (but only much bigger) :
<customer>
<name>John</name>
<age>24</age>
<gender>M</gender>
</customer>
<customer>
<name>Keith</name>
<age></age> <!--blank value-->
<gender>M</gender>
</customer>
<customer>
<name>Jenny</name>
<age>21</age>
<gender>F</gender>
</customer>
<customer>
<name>John</name>
<age></age> <!--blank value-->
<gender></gender> <!--blank value-->
</customer>
I want to generate a DataTable which will be in this format :
Element Name   Value    Frequency
name           filled   4
name           blank    0
age            filled   2
age            blank    2
gender         filled   3
gender         blank    1
Currently I complete this task in two parts: first I create a DataTable with the structure above and set all the frequencies to 0 by default, and then I read the XML using XmlReader and increase the frequency count every time XmlReader finds a child element.
My problem is that the second function, which adds the actual counts, takes too long for very big XML files with many customers that have many attributes. How can I improve the efficiency of this function? My code:
static void AddCount(DataTable dt)
{
int count;
using (XmlReader reader = XmlReader.Create(@"C:\Usr\sample.xml"))
{
while (reader.Read())
{
if (reader.IsStartElement())
{
string eleName = reader.Name;
DataRow[] foundElements = dt.Select("ElementName = '" + eleName + "'");
if (!reader.IsEmptyElement)
{
count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
foundElements.ElementAt(0).SetField("Frequency", count + 1);
}
else
{
count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
foundElements.ElementAt(0).SetField("Frequency", count + 1);
}
}
}
}
}
I am also ready to change the XmlReader class for any other more efficient class for this task. Any advice is welcome.
It turned out that querying the DataTable with the Select operation was very expensive, and that was what made my function so slow.
Instead, I used a Dictionary<string, ValueFrequencyModel>, updated the counts in it while reading, and converted the Dictionary<string, ValueFrequencyModel> into a DataTable once reading was complete.
This saved a lot of time and solved the problem.
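A minimal sketch of the dictionary approach described above, not the original poster's actual code. The ValueFrequency class is a hypothetical stand-in for ValueFrequencyModel, whose definition is not shown, and the sample wraps the customers in an assumed <customers> root element so that the input is a well-formed document.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Xml;

// Hypothetical stand-in for the ValueFrequencyModel mentioned above.
public class ValueFrequency
{
    public int Filled;
    public int Blank;
}

public static class FrequencyCounter
{
    private static void Add(Dictionary<string, ValueFrequency> counts, string name, bool filled)
    {
        ValueFrequency entry;
        if (!counts.TryGetValue(name, out entry))
        {
            entry = new ValueFrequency();
            counts[name] = entry;
        }
        if (filled) entry.Filled++; else entry.Blank++;
    }

    // Counts filled and blank leaf elements in a single pass over the reader.
    public static Dictionary<string, ValueFrequency> Count(TextReader xml, string recordName)
    {
        var counts = new Dictionary<string, ValueFrequency>();
        string current = null;   // name of the candidate leaf element currently open
        bool hasText = false;    // whether that element contained non-blank text
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == recordName) { current = null; }
                        else if (reader.IsEmptyElement) { Add(counts, reader.Name, false); current = null; }
                        else { current = reader.Name; hasText = false; }
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        if (current != null && reader.Value.Trim().Length > 0) hasText = true;
                        break;
                    case XmlNodeType.EndElement:
                        // Only an element that closes without child elements is counted.
                        if (current != null && reader.Name == current) { Add(counts, current, hasText); current = null; }
                        break;
                }
            }
        }
        return counts;
    }

    // Converts the dictionary into the three-column layout from the question.
    public static DataTable ToTable(Dictionary<string, ValueFrequency> counts)
    {
        var dt = new DataTable();
        dt.Columns.Add("ElementName", typeof(string));
        dt.Columns.Add("Value", typeof(string));
        dt.Columns.Add("Frequency", typeof(int));
        foreach (var pair in counts)
        {
            dt.Rows.Add(pair.Key, "filled", pair.Value.Filled);
            dt.Rows.Add(pair.Key, "blank", pair.Value.Blank);
        }
        return dt;
    }

    public static void Main()
    {
        string sample = @"<customers>
<customer><name>John</name><age>24</age><gender>M</gender></customer>
<customer><name>Keith</name><age></age><gender>M</gender></customer>
<customer><name>Jenny</name><age>21</age><gender>F</gender></customer>
<customer><name>John</name><age></age><gender></gender></customer>
</customers>";
        DataTable table = ToTable(Count(new StringReader(sample), "customer"));
        foreach (DataRow row in table.Rows)
            Console.WriteLine(row["ElementName"] + " " + row["Value"] + " " + row["Frequency"]);
    }
}
```

Each element costs one dictionary lookup instead of a DataTable.Select call, and the DataTable is built only once at the end.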
You can use the following code:
// XDocument is not IDisposable, so no using block is needed
XDocument xdoc = XDocument.Load(@"C:\Users\aks\Desktop\sample.xml");
var customers = xdoc.Descendants("customer");
var totalNodes = customers.Count();
var filledNames = customers.Descendants("name").Where(x => x.Value != string.Empty).Count();
var filledAges = customers.Descendants("age").Where(x => x.Value != string.Empty).Count();
var filledGenders = customers.Descendants("gender").Where(x => x.Value != string.Empty).Count();
var unfilledNames = totalNodes - filledNames;
var unfilledAges = totalNodes - filledAges;
var unfilledGenders = totalNodes - filledGenders;
Try this logic. For now I have handled only one element here, i.e. name:
XDocument xl = XDocument.Load(@"C:\Usr\sample.xml");
// Element names are case-sensitive and must match the XML ("customer", "name")
var customers = xl.Descendants("customer");
var customerCount = customers.Count();
var filledCustomers = customers.Where(x => x.Element("name").Value != string.Empty).Count();
var nonfilledCustomers = customerCount - filledCustomers;

Is my usage of LINQ to XML correct when I'm trying to find multiple values in the same XML file?

I have to extract values belonging to certain elements in an XML file and this is what I ended up with.
XDocument doc = XDocument.Load("request.xml");
var year = (string)doc.Descendants("year").FirstOrDefault();
var id = (string)doc.Descendants("id").FirstOrDefault();
I'm guessing that for each statement I'm iterating through the entire file looking for the first occurrence of the element called year/id. Is this the correct way to do this? It seems like there has to be a way where one would avoid unnecessary iterations. I know what I'm looking for and I know that the elements are going to be there even if the values may be null.
I'm thinking along the lines of a select statement with both "year" and "id" as conditions.
To clarify, I'm looking for certain elements and their respective values. There'll most likely be multiple occurrences of the same element, but FirstOrDefault() is fine for that.
Further clarification:
As requested by the legend Jon Skeet, I'll try to clarify further. The XML document contains fields such as <year>2015</year> and <id>123032</id> and I need the values. I know which elements I'm looking for, and that they're going to be there. In the sample XML below, I would like to get 2015, The Emperor, something and 30.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<make>Apple</make>
<year>2015</year>
<customer>
<name>The Emperor</name>
<level2>
<information>something</information>
</level2>
<age>30</age>
</customer>
</documents>
Code that doesn't traverse the whole XML twice would look like this:
XDocument doc = XDocument.Load("request.xml");
string year = null;
string id = null;
bool yearFound = false, idFound = false;
foreach (XElement ele in doc.Descendants())
{
if (!yearFound && ele.Name == "year")
{
year = (string)ele;
yearFound = true;
}
else if (!idFound && ele.Name == "id")
{
id = (string)ele;
idFound = true;
}
if (yearFound && idFound)
{
break;
}
}
As you can see you are trading lines of code for speed :-) I do feel the code is still quite readable.
If you really need to optimize down to the last line of code, you could put the names of the elements in two XName variables (otherwise every comparison implicitly converts the string literal to an XName):
// before the foreach
XName yearName = "year";
XName idName = "id";
and then
if (!yearFound && ele.Name == yearName)
...
if (!idFound && ele.Name == idName)
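The same single-pass idea can be generalized to any number of element names. The sketch below is an illustration under assumptions: FirstValueFinder and Find are hypothetical names, and the sample document is a closed version of the XML from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class FirstValueFinder
{
    // Returns the value of the first occurrence of each requested element name,
    // walking the descendants only once and stopping when all names are found.
    public static Dictionary<XName, string> Find(XContainer root, params XName[] names)
    {
        var wanted = new HashSet<XName>(names);
        var found = new Dictionary<XName, string>();
        foreach (XElement ele in root.Descendants())
        {
            if (wanted.Contains(ele.Name) && !found.ContainsKey(ele.Name))
            {
                found[ele.Name] = (string)ele;
                if (found.Count == wanted.Count) break;   // nothing left to look for
            }
        }
        return found;
    }

    public static void Main()
    {
        var doc = XDocument.Parse(@"<documents>
<make>Apple</make>
<year>2015</year>
<customer>
<name>The Emperor</name>
<level2>
<information>something</information>
</level2>
<age>30</age>
</customer>
</documents>");
        // The string arguments are converted to XName once, at the call site.
        var values = Find(doc, "year", "name", "information", "age");
        foreach (var pair in values)
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

Names that never occur are simply absent from the returned dictionary, so callers can use TryGetValue to tell a missing element from an empty one.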

Extracting XML inner node elements in C#

I have an XML document that looks like this:
<root>
<key>
<id>v1</id>
<val>v2</val>
<iv>v3</iv>
</key>
</root>
How do I extract the v2 values and v3 values of a key node using its v1 value in C#?
Use Linq.
var myXml = XDocument.Parse(@"<root>
<key>
<id>v1</id>
<val>v2</val>
<iv>v3</iv>
</key>
</root>").Root.Elements("key")
.FirstOrDefault(x => x.Element("id").Value == value); // value is the id to look up, e.g. "v1"
if (myXml != null)
{
var myObject = new
{
id = myXml.Element("id").Value,
val = myXml.Element("val").Value,
iv = myXml.Element("iv").Value
};
}
Of course, you need to check for missing elements, etc, if required.
Use xpath:
/root/key[id='v1']/val
/root/key[id='v1']/iv
so something like
// Use InnerText: Value is null for element nodes in XmlDocument
myXmlDoc.SelectSingleNode("/root/key[id='v1']/val").InnerText
myXmlDoc.SelectSingleNode("/root/key[id='v1']/iv").InnerText
I like using LINQ to XML for processing XML:
var xml = XElement.Parse(@"<root>
<key>
<id>v1</id>
<val>v2</val>
<iv>v3</iv>
</key>
</root>");
var key = xml.Elements("key").First(x => x.Element("id").Value == "v1");
Console.WriteLine("val: " + key.Element("val").Value);
Console.WriteLine(" iv: " + key.Element("iv").Value);
I have ignored all error checking for brevity.
For example First() would throw an exception if the element is not found. You might want to use FirstOrDefault() and check for null if you are expecting that or handle edge cases a bit more gracefully.
Same goes for Element() calls. They might return null so calling .Value could result in a System.NullReferenceException. To avoid clutter I usually use extension methods to do these checks:
static class XElementUtilities
{
public static string GetValue(this XElement xml, string name)
{
var element = xml.Element(name);
return element == null ? null : element.Value;
}
public static bool ValueEqual(this XElement xml, string name, string value)
{
var element = xml.Element(name);
return element != null && value != null && element.Value == value;
}
}

Comparing and matching XML nodes in C#

I am trying to parse this XML document and match the guid with the link nodes. I have a GUI built in C# that lets the user enter a guid, and I am trying to output the link node that corresponds to it.
For example, a user enters ID 8385522 and the program would output:
http://once.www.example.com
The XML is as follows:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<title>
</title>
<link>
</link>
<description>
</description>
<language>
</language>
<lastBuildDate>
</lastBuildDate>
<item>
<title>Parsing Example</title>
<link>http://once.www.example.com</link>
<pubDate>Sun, 16 Sep 2012 02:44:02 </pubDate>
<guid>8385522</guid>
</item>
<item>
<title>Parsing Example 2</title>
<link>http://once.once.www.example2.com</link>
<pubDate>Sat, 29 Sep 2012 18:29:13 </pubDate>
<guid>8439191</guid>
</item>
</channel>
</rss>
I don't have any code written for the text box that the ID is being entered in. All I have in that field is:
void TextBox1TextChanged(object sender, EventArgs e)
{
}
Do I need to put the function inside the text box field? Any help is appreciated.
Edit: Here is what I have so far:
private void button2_Click_1(object sender, EventArgs e)
{
Clipboard.Clear();
if (Directory.Exists(@"c:\text"))
{
XmlDocument xDoc = new XmlDocument();
xDoc.Load(@"c:\text\text.xml");
XmlDocument lDoc = new XmlDocument();
lDoc.Load(@"c:\text\text.xml");
XmlNodeList ctextbox = xDoc.GetElementsByTagName("guid");
XmlNodeList link = lDoc.GetElementsByTagName("link");
I'm not sure exactly where the parsing function needs to be.
var links = from item in xdoc.Descendants("item")
where (int)item.Element("guid") == yourGuid
select (string)item.Element("link");
Query comprehension syntax has no keyword for selecting a single value, so you need to call links.SingleOrDefault() to get your link.
Or you can do search with fluent API:
XDocument xdoc = XDocument.Load(@"c:\text\text.xml");
int guid = 8385522; // parse your guid from textbox
string link = xdoc.Descendants("item")
.Where(item => (int)item.Element("guid") == guid)
.Select(item => (string)item.Element("link"))
.SingleOrDefault();
If the guid you search for might not be in the file (which looks like your case, because the guid comes from a text box), then SingleOrDefault() returns null rather than throwing:
XDocument xdoc = XDocument.Load(@"c:\text\text.xml");
int guid = 8385525; // parse your guid from textbox
var links = from item in xdoc.Descendants("item")
where (int)item.Element("guid") == guid
select (string)item.Element("link");
string link = links.SingleOrDefault();
string link = GetLink(@"c:\text\text.xml", "8385522");
--
string GetLink(string xmlFile,string guid)
{
var xDoc = XDocument.Load(xmlFile);
var item = xDoc.Descendants("item")
.FirstOrDefault(x => (string)x.Element("guid") == guid);
if (item == null) return null;
return (string)item.Element("link");
}
I'm using this XML library, but you can use the XPath support provided with .NET by importing System.Xml.XPath, I believe. (I can't check at the moment if that is 100% accurate.)
XElement root = XElement.Load(file);
XElement guid = root.XPathElement("//guid[.={0}]", id);
XElement link = null;
if(null != guid)
link = guid.Parent.Element("link");

Parsing xml. return one object not collection

I have an XML file:
<XML>
<Result>Ok</Result>
<Error></Error>
<Remark></Remark>
<Data>
<Movies>
<Movie ID='2'>
<Name><![CDATA[TestName]]></Name>
<Duration Duration='170'>2h 50min</Duration>
<Properties>
<Property Name='11'><![CDATA[1111110]]></Property>
</Properties>
<Rental from_date='' to_date=''>
<SessionCount></SessionCount>
<PU_NUMBER></PU_NUMBER>
</Rental>
</Movie>
</Movies>
</Data>
</XML>
Code for parsing the XML file:
var results = from element in XDocument.Parse(queryResponse).Descendants("Movie")
select new BaseEvent
{
OID = (int)element.Attribute("ID"),
Subject = (string)element.Element("Name"),
Duration = (int)element.Element("Duration").Attribute("Duration")
};
The problem is that the query returns an IEnumerable<BaseEvent>, but I want a single BaseEvent. How can I do this?
Just use First(), Last(), Single(), FirstOrDefault() etc.
Personally I'd do that initially, rather than doing it all in a query:
var element = XDocument.Parse(queryResponse)
.Descendants("Movie")
.FirstOrDefault();
if (element == null)
{
// Handle the case of no movies
}
else
{
var baseEvent = new BaseEvent
{
OID = (int) element.Attribute("ID"),
Subject = (string) element.Element("Name"),
Duration = (int) element.Element("Duration")
.Attribute("Duration")
};
// Use baseEvent
}
Wrap the query in parentheses and add .First() to get the first element:
var baseEvent = (from element in XDocument.Parse(queryResponse).Descendants("Movie")
select new BaseEvent
{
OID = (int)element.Attribute("ID"),
Subject = (string)element.Element("Name"),
Duration = (int)element.Element("Duration").Attribute("Duration")
}).First();
You can alternatively use FirstOrDefault() (in case there are no such nodes), Last() or Single().
