Analysis of XML structured data - c#

My work uses software to fill out records that are expressed as XML documents. I have the task of trawling through these XML files to pull statistics out of them. The files themselves adhere to no schema and if a form field doesn't get filled out then the XML corresponding to that field is not created.
What's my best approach?
Example XML:
<Form>
<Field>
<id>Field ID</id>
<value>Entered Value</value>
</Field>
</Form>
I have been attempting to write software that I can use to query the files but have not been able to come up with anything even remotely useful.
Any thoughts appreciated.
EDIT: In terms of C#, what I would like (Though I'm sure it isn't possible) is a Dictionary that has a string as the key and the corresponding value could EITHER be a string or another Dictionary.
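The structure described in the edit can be approximated with a `Dictionary<string, object>`, where each value is either a `string` or another dictionary. The following is only a minimal sketch under that assumption: the class and method names (`XmlToDictionary`, `ToValue`) are made up for illustration, and repeated sibling names are collected into a `List<object>` so that multiple `Field` elements are not overwritten.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XmlToDictionary
{
    // Leaf elements become strings; elements with children become
    // Dictionary<string, object>. Siblings sharing a name become List<object>.
    public static object ToValue(XElement element)
    {
        if (!element.HasElements)
            return element.Value;

        var map = new Dictionary<string, object>();
        foreach (var group in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            List<object> values = group.Select(e => ToValue(e)).ToList();
            map[group.Key] = values.Count == 1 ? values[0] : (object)values;
        }
        return map;
    }
}
```

Calling `XmlToDictionary.ToValue(XDocument.Load("file.xml").Root)` on the example form would give a dictionary whose `"Field"` entry is itself a dictionary with `"id"` and `"value"` keys. Callers have to cast or type-check each value, which is the price of the mixed value type.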

Is it something like this?
XML:
<?xml version="1.0" encoding="utf-8" ?>
<Form>
<Field>
<id>People1</id>
<value>C Sharp</value>
</Field>
<Field>
<id>People2</id>
<value>C Sharp</value>
</Field>
<Field>
<id>People3</id>
<value>C</value>
</Field>
</Form>
Source:
// Requires: using System; using System.Linq; using System.Xml.Linq;
static void Main(string[] args)
{
var doc = XDocument.Load("test.xml");
var result = from p in doc.Descendants("Form").Descendants("Field")
select new { ID = p.Element("id").Value, VALUE = p.Element("value").Value };
foreach (var x in result)
Console.WriteLine(x);
var gr = from p in result
group p by p.VALUE into g
select new { Language=g.Key , Count=g.Count() };
foreach (var x in gr)
Console.WriteLine(string.Format("Language:{0} Count:{1}" , x.Language , x.Count));
Console.Read();
}
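Since the question says that unfilled fields produce no XML at all, `p.Element("value").Value` can throw a `NullReferenceException` when a child element is missing. A hedged variation is sketched below: it uses the explicit `(string)` conversion on `XElement`, which returns null for a missing element, and skips incomplete fields. The names `FieldStats` and `CountValues` are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for gathering statistics across several documents.
public static class FieldStats
{
    // Counts how often each <value> text occurs across all documents.
    // Fields lacking an <id> or <value> child are skipped instead of throwing.
    public static Dictionary<string, int> CountValues(IEnumerable<XDocument> docs)
    {
        return docs
            .SelectMany(d => d.Descendants("Field"))
            .Select(f => new
            {
                Id = (string)f.Element("id"),       // null when <id> is absent
                Value = (string)f.Element("value")  // null when <value> is absent
            })
            .Where(f => f.Id != null && f.Value != null)
            .GroupBy(f => f.Value)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

A folder of files could be fed in with something like `Directory.EnumerateFiles(folder, "*.xml").Select(XDocument.Load)`, where `folder` is whatever directory holds the records.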

If the files are not too big, I would suggest Perl and the XML::Simple module. It maps the XML to a Perl structure of nested hashes and arrays, which you can then simply loop through as normal. Something like:
use XML::Simple;
my $xml = XML::Simple::XMLin( 'file.xml', force_array => [ 'Form', 'Field' ] );
my %fld_counts;
foreach my $form ( @{ $xml->{Form} } )
{
    # Any start record processing...
    foreach my $fld ( @{ $form->{Field} } )
    {
        my $id  = $fld->{id};
        my $val = $fld->{value};
        # Do something with id/value... like...
        $fld_counts{$id}++;
    }
}
So just adjust that structure based on the stats you want to gather.

For parsing XML I prefer using plain XmlReader. Granted, it's more verbose, but it's super efficient and transparent, at least for me. For example:
using(var xr = XmlReader.Create(stream)) // stream: your input stream here
while(xr.Read())
if(xr.NodeType == XmlNodeType.Element)
switch(xr.Name) {
case "brands":
// do something here with this element,
// like possibly reading the whole subtree...
using(var xrr = xr.ReadSubtree())
while(xrr.Read()) {
// working here...
}
break;
case "products":
// that is another element
break;
case "some-other-element":
// and so on
break;
} // switch
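Applied to the Form/Field layout from the question, the XmlReader approach might look like the sketch below. It is only an illustration: the class and method names are invented, and it assumes each `id` element contains plain text. The loop calls `Read()` only when it did not just consume an element, because `ReadElementContentAsString` already advances the reader past the end tag.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper that streams through the XML without building a tree.
public static class FieldIdCounter
{
    // Counts how many times each <id> text occurs in the input.
    public static Dictionary<string, int> CountIds(TextReader input)
    {
        var counts = new Dictionary<string, int>();
        using (var reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "id")
                {
                    // Consumes the element and leaves the reader on the node after </id>.
                    string id = reader.ReadElementContentAsString();
                    counts.TryGetValue(id, out int current);
                    counts[id] = current + 1;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return counts;
    }
}
```

For a file on disk, `CountIds(File.OpenText(path))` would do, with `path` standing in for the actual file name.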

Related

Reading XML file then sorting by name alphabetically

I am trying to read in data from an XML file, which I can do with the code below.
How would I then sort the data by name alphabetically to be used on an electronic menu app?
Could someone show me how to do this?
The code block below does work, but I have no idea how to sort the data alphabetically by node type.
For instance, I would like to sort by "name" alphabetically.
Other sorting methods could include sorting by price or category, etc.
Here is the XML data structure:
<?xml version="1.0" encoding="UTF-8" ?>
<base>
<menu>
<item>
<category></category>
<name></name>
<price></price>
<featured></featured>
<code></code>
<timestart></timestart>
<timeend></timeend>
</item>
</menu>
</base>
Here is the code:
public void XML_Get_MenuData()
{
try
{
XmlDocument myDoc2 = new XmlDocument();
myDoc2.Load("\\user\\today1.xml");
XmlNodeList itemNodes = myDoc2.GetElementsByTagName("item");
ushort i = 0;
Menu_Items = 0;
foreach (XmlNode sINode in itemNodes)
{
Category[i] = sINode["category"].InnerText;
Console.PrintLine("category: {0}", Category[i]);
ItemName[i] = sINode["name"].InnerText;
Console.PrintLine("name: {0}", ItemName[i]);
Price[i] = sINode["price"].InnerText;
Console.PrintLine("price: {0}", Price[i]);
Featured[i] = sINode["featured"].InnerText;
Console.PrintLine("featured: {0}", Featured[i]);
if (Featured[i] == "yes")
{
uFeatured[i] = 1;
}
else
{
uFeatured[i] = 0;
}
Code[i] = sINode["code"].InnerText;
Console.PrintLine("code: {0}", Code[i]);
TimeStart[i] = sINode["timestart"].InnerText;
Console.PrintLine("timestart: {0}", TimeStart[i]);
TimeEnd[i] = sINode["timeend"].InnerText;
Console.PrintLine("timeend: {0}", TimeEnd[i]);
i++;
}
Menu_Items = i;
Console.PrintLine("Menu Items: {0}", Menu_Items);
}
catch
{
Console.PrintLine("missed Menu Items: {0}");
}
}
You should look into the lovely LINQ to XML, which allows you to do stuff like this:
var Doc = XDocument.Load("PathToYourXml");
var OrderedItems = Doc.Descendants("item").OrderBy(x => x.Element("name").Value);
OrderedItems contains your items ordered alphabetically by name, in the shape of an IOrderedEnumerable<XElement>.
If you wanted to make the order descending, you'd just use OrderByDescending:
var OrderedItems = Doc.Descendants("item").OrderByDescending(x => x.Element("name").Value);
As a side note, right now you're iterating over your items, printing them, and apparently storing the data in parallel arrays. Mapping them to a class instead would help you avoid magic strings like "item" and "name" scattered through the code, which is error-prone. I'd encourage you to create a model for your items and look into how to parse the XML into a list of your objects, using something like an XmlSerializer, and maybe take a look at questions like this, where you'll find some alternatives.
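As a rough sketch of the model idea, the items could be projected into a small class with LINQ to XML and sorted from there. Everything named here (`MenuItem`, `MenuLoader`, `LoadSortedByName`) is hypothetical, and the price handling assumes plain numbers such as "4.50"; empty or unparsable prices fall back to 0.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model class; property names are illustrative.
public class MenuItem
{
    public string Category { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
    public bool Featured { get; set; }
}

public static class MenuLoader
{
    // Maps every <item> element to a MenuItem and sorts the result by name,
    // ignoring case.
    public static List<MenuItem> LoadSortedByName(XDocument doc)
    {
        return doc.Descendants("item")
            .Select(x => ToMenuItem(x))
            .OrderBy(m => m.Name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    private static MenuItem ToMenuItem(XElement x)
    {
        // Assumption: invariant-culture decimal text; TryParse leaves 0 on failure.
        decimal price;
        decimal.TryParse((string)x.Element("price"), NumberStyles.Number,
                         CultureInfo.InvariantCulture, out price);
        return new MenuItem
        {
            Category = (string)x.Element("category"),
            Name = (string)x.Element("name"),
            Price = price,
            Featured = (string)x.Element("featured") == "yes"
        };
    }
}
```

Once the data is in a list, other orderings are one call away, for example `items.OrderBy(m => m.Price)` or `items.OrderBy(m => m.Category).ThenBy(m => m.Name)`.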

I want to fetch an array of records from XML using C# LINQ

This is my code to fetch XML values from a file. It works for a single element such as type, but when I try to fetch an array of records from the XML it fails.
public void readXmlFiles()
{
var xml = XDocument.Load(@"C:\Applications\Files\Xmldemo.xml");
var format = from r in xml.Descendants("insureance")
select new
{
type = r.Element("type").Value,
// values=r.Element("value").Value
};
foreach (var r in format)
{
Console.WriteLine(r.values);
}
}
This is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<insureance>
<type>Risk</type>
<classofbusiness type="array">
<value>0</value>
<value>1</value>
<value>2</value>
</classofbusiness>
</insureance>
Now I want to fetch all the values of classofbusiness. Thanks in advance.
Your current attempt selects a single value element, which doesn't exist as a child of insureance. I would guess you get a null reference exception as a result.
You need to follow the structure of the document
var format =
from ins in xml.Descendants("insureance") // xml is the XDocument loaded in the question
select new
{
Type = (string) ins.Element("type"),
ClassOfBusiness = ins.Elements("classofbusiness")
.Elements("value")
.Select(x => (int) x)
.ToArray()
};
var format = xml.Elements("insureance").Elements("classofbusiness").Descendants();

Is my usage of LINQ to XML correct when I'm trying to find multiple values in the same XML file?

I have to extract values belonging to certain elements in an XML file and this is what I ended up with.
XDocument doc = XDocument.Load("request.xml");
var year = (string)doc.Descendants("year").FirstOrDefault();
var id = (string)doc.Descendants("id").FirstOrDefault();
I'm guessing that for each statement I'm iterating through the entire file looking for the first occurrence of the element called year/id. Is this the correct way to do this? It seems like there has to be a way where one would avoid unnecessary iterations. I know what I'm looking for and I know that the elements are going to be there even if the values may be null.
I'm thinking along the lines of a select statement with both "year" and "id" as conditions.
For clarity, I'm looking for certain elements and their respective values. There'll most likely be multiple occurrences of the same element, but FirstOrDefault() is fine for that.
Further clarification:
As requested by the legend Jon Skeet, I'll try to clarify further. The XML document contains fields such as <year>2015</year> and <id>123032</id> and I need the values. I know which elements I'm looking for, and that they're going to be there. In the sample XML below, I would like to get 2015, The Emperor, something and 30.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<make>Apple</make>
<year>2015</year>
<customer>
<name>The Emperor</name>
<level2>
<information>something</information>
</level2>
<age>30</age>
</customer>
</documents>
Code that doesn't walk the whole document twice would look like this:
XDocument doc = XDocument.Load("request.xml");
string year = null;
string id = null;
bool yearFound = false, idFound = false;
foreach (XElement ele in doc.Descendants())
{
if (!yearFound && ele.Name == "year")
{
year = (string)ele;
yearFound = true;
}
else if (!idFound && ele.Name == "id")
{
id = (string)ele;
idFound = true;
}
if (yearFound && idFound)
{
break;
}
}
As you can see you are trading lines of code for speed :-) I do feel the code is still quite readable.
If you really need to optimize down to the last detail, you could store the element names in two variables (otherwise every comparison implicitly converts the string to an XName):
// before the foreach
XName yearName = "year";
XName idName = "id";
and then
if (!yearFound && ele.Name == yearName)
...
if (!idFound && ele.Name == idName)

Xml Linq Find Element After another Element

I have been searching here for a long time, but I can't get it working based on other people's code. I need to find the element closest to "user" (which is "robot") and write its value, depending on the user's input. I am programming a chat bot. This is my XML file:
<Answers>
<user>This is a message</user><robot>Here is an answer</robot>
<user>This is another message</user><robot>Here is another answer</robot>
</Answers>
In my C# code I am trying something like this:
private static void Main(string[] args)
{
XDocument doc = XDocument.Load("C:\\bot.xml");
var userPms = doc.Descendants("user");
var robotPm = doc.Descendants("robot");
string userInput = Console.ReadLine();
foreach (var pm in userPms.Where(pm => pm.Value == userInput))
{
Console.WriteLine // robotPm.FindNextTo(pm)
}
}
Simply put, I want to compare "user" input in console and in xml and if they are equal write robot's answer from xml which is responsible to specified user input.
Thank you for help
Simply use NextNode
Console.WriteLine(((XElement)pm.NextNode).Value);
But don't forget: this only works as long as every robot element directly follows its user element. XML parsers do preserve element order, but here the pairing is merely implied by position. A better approach would be to make it explicit:
<item>
<q>question1</q>
<a>answer1</a>
</item>
<item>
<q>question2</q>
<a>answer2</a>
</item>
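With the item/q/a layout, the lookup no longer depends on sibling position. A minimal sketch, assuming the items are wrapped in a single root element (the class and method names are hypothetical):

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical lookup helper for the <item><q/><a/></item> layout.
public static class BotAnswers
{
    // Returns the <a> text of the first <item> whose <q> equals the input,
    // or null when no question matches.
    public static string FindAnswer(XDocument doc, string userInput)
    {
        return doc.Descendants("item")
            .Where(item => (string)item.Element("q") == userInput)
            .Select(item => (string)item.Element("a"))
            .FirstOrDefault();
    }
}
```

In the console loop from the question this would be used as `Console.WriteLine(BotAnswers.FindAnswer(doc, userInput) ?? "No answer found");`.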

An algorithm using LINQ or C# to sanitize specific HTML from a string

Background Info:
I have a large body of text that I regularly encapsulate in a single string from an XML document (using LINQ). This string contains lots of HTML that I need to preserve for output purposes, but the emails and discrete HTML links that occasionally occur in this string need to be removed. An example of the offending text looks like this:
--John Smith from Romanesque Architecture</p>
What I need to be able to do is:
Find the following string: <a href
Delete that string and all characters following it through the string >
Also, always delete this string </a>
Is there a way with LINQ that I can do this easily or am I going to have to create an algorithm using .NET string manipulation to achieve this?
You could probably do this with LINQ, but it sounds like a regular old REGEX would be much, much better.
It sounds like this question, and particularly this answer demonstrate what you're trying to do.
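As a rough illustration of the regex route, the three steps from the question (find `<a href`, delete through the next `>`, and delete every `</a>`) can be folded into one pattern. This is a sketch rather than a full HTML sanitizer: the names are invented, and it assumes no `>` character appears inside an attribute value.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; strips anchor tags but keeps the link text.
public static class AnchorStripper
{
    // First alternative: an opening <a> tag with optional attributes.
    // Second alternative: the closing </a> tag.
    private static readonly Regex AnchorTag =
        new Regex(@"<a(\s[^>]*)?>|</a\s*>", RegexOptions.IgnoreCase);

    public static string Strip(string html)
    {
        return AnchorTag.Replace(html, string.Empty);
    }
}
```

All other markup, such as the trailing `</p>` in the example, is left untouched because the pattern requires either whitespace or `>` directly after `<a`.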
If you want to do this exactly via LinqToXml, try something like this recursive function:
static void ReplaceNodesWithContent(XElement element, string targetElementname)
{
if (element.Name == targetElementname)
{
element.ReplaceWith(element.Value);
return;
}
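// Snapshot the children first (needs using System.Linq): ReplaceWith modifies
// the tree, and changing it while enumerating Elements() can skip siblings.
foreach (var child in element.Elements().ToList())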
{
ReplaceNodesWithContent(child, targetElementname);
}
}
Usage example:
static void Main(string[] args)
{
string xml = @"<root>
<items>
<item>
<a>inner</a>
</item>
<item>
<subitem>
<a>another one</a>
</subitem>
</item>
</items>
</root>";
XElement x = XElement.Parse(xml);
ReplaceNodesWithContent(x, "a");
string res = x.ToString();
// res == @"<root>
// <items>
// <item>inner</item>
// <item>
// <subitem>another one</subitem>
// </item>
// </items>
// </root>"
}
