XmlReader skipping adjoined elements - c#

Whilst trying to minimise the memory footprint of an XML parsing program, specifically avoiding the loading hundreds of megabytes using XElement.Load(), I came across articles suggesting using the older XmlReader e.g. here.
I need to internally reconstruct each major element as an XElement to avoid major refactoring. However, I discovered that if my source elements are directly adjoining, this approach skips every 2nd element.
I've torn down the problem to this unit-test (MSTest2 with FluentAssertions):
[DataTestMethod]
[DataRow("<data><entry>1</entry><entry>2</entry><entry>3</entry><entry>4</entry></data>")]
[DataRow("<data><entry>1</entry> <entry>2</entry> <entry>3</entry> <entry>4</entry></data>")]
public void XmlReaderCount(string input)
{
var sr = new StringReader(input);
var xml = XmlReader.Create(sr);
xml.MoveToContent();
var data = new List<string>();
while (xml.Read())
{
if (xml.LocalName == "entry" && xml.NodeType == XmlNodeType.Element)
{
var element = (XElement)System.Xml.Linq.XNode.ReadFrom(xml);
data.Add(element.Value);
}
}
data.Should()
.HaveCount(4);
}
The first (data-driven) test fails with:
Expected collection to contain 4 item(s), but found 2.
As it puts 1 and 3 into the data collection. It does loop 4 times, but every other element has an xml.NodeType of Text, not Element. The second test (with spaces between the </entry> and <entry> passes by processing all 4.
In my real world example, I can't easily change the source. I do already have a solution, inspired by another StackOverflow question so I can do the following, but it seems strange - is something wrong?
[DataTestMethod]
[DataRow("<data><entry>1</entry><entry>2</entry><entry>3</entry><entry>4</entry></data>")]
[DataRow("<data><entry>1</entry> <entry>2</entry> <entry>3</entry> <entry>4</entry></data>")]
public void XmlReaderCountSubtree(string input)
{
var data = new List<string>();
var sr = new StringReader(input);
var xml = XmlReader.Create(sr);
xml.MoveToContent();
while (xml.Read())
{
if (xml.LocalName == "entry" && xml.NodeType == XmlNodeType.Element)
{
using (var subtree = xml.ReadSubtree())
{
subtree.MoveToContent();
var content = subtree.ReadOuterXml();
var element = XElement.Parse(content);
data.Add(element.Value);
}
}
}
data.Should()
.HaveCount(4);
}

When you call ReadFrom(xml) , the state of xml is changed. Its cursor is moved forward to the next element. Your code then moves on to while (xml.Read()) and so ignores that new element completely.
With the second data set, the ignored (and uninspected) elements are the whitespace nodes so you get away with it. But basically, you reading algorithm is wrong.
A fix for your first approach, not pretty but it works:
xml.Read();
while (! xml.EOF)
{
if (xml.LocalName == "entry" && xml.NodeType == XmlNodeType.Element)
{
//using (var subtree = xml.ReadSubtree())
{
var element = (XElement)XNode.ReadFrom(xml);
data.Add(element.Value);
}
}
else
{
xml.Read();
}
}

Related

Insert child element in the right order when working with OpenXML

I'm modifying .docx documents with DocumentFormat.OpenXml library. I know element ordering is important, otherwise the document will not pass schema validation and might result a document that can't be opened in Word.
Now I need to add a DocumentProtection element to DocumentSettingsPart. And I need to insert this child element in the right place inside of a parent.
The schema looks like this:
There are quite a lot of possible ordering of child elements. At the moment I'm adding this element like this:
var documentProtection = new DocumentProtection()
{
// do the configuration
};
DocumentSettingsPart settings = doc.MainDocumentPart.DocumentSettingsPart;
var rootElement = settings.RootElement;
var prevElement =
rootElement.GetFirstChild<DoNotTrackFormatting>() ??
rootElement.GetFirstChild<DoNotTrackMoves>() ??
rootElement.GetFirstChild<TrackRevisions>() ??
rootElement.GetFirstChild<RevisionView>() ??
rootElement.GetFirstChild<DocumentType>() ??
rootElement.GetFirstChild<StylePaneSortMethods>() ??
// SNIP
rootElement.GetFirstChild<Zoom>() ??
rootElement.GetFirstChild<View>() ??
(OpenXmlLeafElement)rootElement.GetFirstChild<WriteProtection>();
rootElement.InsertAfter(documentProtection, prevElement);
I.e. I'm trying to find if any possible element that should go before mine already exists in the document. And then insert DocumentProtection after that element. And given amount of elements this list gets pretty boring.
Is there a better way to add DocumentProtection so it is schema compliant and does not involve enumeration of all possible elements?
There isn't a nice way to achieve what you want. You'll have to tinker with the collection and you're responsible for keeping the order correct.
Using ILSpy on the Settings class you'll find that the implementors used a helper method SetElement<T> on the base class that takes a position and an instance to insert.
Unfortunately that helper method is marked internal so we can't leverage it if you try to subclass Settings. Instead I re-implemented the needed functionality so you'll have a subclass of Settings that does offer a property for DocumentProtection but uses the re-implemented solution to find the correct location to insert the node:
SettingsExt
public class SettingsExt: Settings
{
// contruct based on XML
public SettingsExt(string outerXml)
:base(outerXml)
{
// empty
}
public DocumentProtection DocumentProtection
{
// get is easy
get
{
return this.GetFirstChild<DocumentProtection>();
}
// reimplemented SetElement based on
// reversed engineered Settings class
set
{
// eleTagNames is a static string[] declared later
// it holds all the names of the elements in the right order
int sequenceNumber = eleTagNames
.Select((s, i) => new { s= s, idx = i })
.Where(s => s.s == "documentProtection")
.Select((s) => s.idx)
.First();
OpenXmlElement openXmlElement = this.FirstChild;
OpenXmlElement refChild = null;
while (openXmlElement != null)
{
// a bit naive
int currentSequence = eleTagNames
.Select((s, i) => new { s = s, idx = i })
.Where(s => s.s == openXmlElement.LocalName)
.Select((s) => s.idx)
.First(); ;
if (currentSequence == sequenceNumber)
{
if (openXmlElement is DocumentProtection)
{
refChild = openXmlElement.PreviousSibling();
this.RemoveChild<OpenXmlElement>(openXmlElement);
break;
}
refChild = openXmlElement;
}
else
{
if (currentSequence > sequenceNumber)
{
break;
}
refChild = openXmlElement;
}
openXmlElement = openXmlElement.NextSibling();
}
if (value != null)
{
this.InsertAfter(value, refChild);
}
}
}
// order of elements in the sequence!
static readonly string[] eleTagNames = new string[]
{
"writeProtection",
"view",
"zoom",
"removePersonalInformation",
"removeDateAndTime",
"doNotDisplayPageBoundaries",
"displayBackgroundShape",
"printPostScriptOverText",
"printFractionalCharacterWidth",
"printFormsData",
"embedTrueTypeFonts",
"embedSystemFonts",
"saveSubsetFonts",
"saveFormsData",
"mirrorMargins",
"alignBordersAndEdges",
"bordersDoNotSurroundHeader",
"bordersDoNotSurroundFooter",
"gutterAtTop",
"hideSpellingErrors",
"hideGrammaticalErrors",
"activeWritingStyle",
"proofState",
"formsDesign",
"attachedTemplate",
"linkStyles",
"stylePaneFormatFilter",
"stylePaneSortMethod",
"documentType",
"mailMerge",
"revisionView",
"trackRevisions",
"doNotTrackMoves",
"doNotTrackFormatting",
"documentProtection",
"autoFormatOverride",
"styleLockTheme",
"styleLockQFSet",
"defaultTabStop",
"autoHyphenation",
"consecutiveHyphenLimit",
"hyphenationZone",
"doNotHyphenateCaps",
"showEnvelope",
"summaryLength",
"clickAndTypeStyle",
"defaultTableStyle",
"evenAndOddHeaders",
"bookFoldRevPrinting",
"bookFoldPrinting",
"bookFoldPrintingSheets",
"drawingGridHorizontalSpacing",
"drawingGridVerticalSpacing",
"displayHorizontalDrawingGridEvery",
"displayVerticalDrawingGridEvery",
"doNotUseMarginsForDrawingGridOrigin",
"drawingGridHorizontalOrigin",
"drawingGridVerticalOrigin",
"doNotShadeFormData",
"noPunctuationKerning",
"characterSpacingControl",
"printTwoOnOne",
"strictFirstAndLastChars",
"noLineBreaksAfter",
"noLineBreaksBefore",
"savePreviewPicture",
"doNotValidateAgainstSchema",
"saveInvalidXml",
"ignoreMixedContent",
"alwaysShowPlaceholderText",
"doNotDemarcateInvalidXml",
"saveXmlDataOnly",
"useXSLTWhenSaving",
"saveThroughXslt",
"showXMLTags",
"alwaysMergeEmptyNamespace",
"updateFields",
"hdrShapeDefaults",
"footnotePr",
"endnotePr",
"compat",
"docVars",
"rsids",
"mathPr",
"uiCompat97To2003",
"attachedSchema",
"themeFontLang",
"clrSchemeMapping",
"doNotIncludeSubdocsInStats",
"doNotAutoCompressPictures",
"forceUpgrade",
"captions",
"readModeInkLockDown",
"smartTagType",
"schemaLibrary",
"shapeDefaults",
"doNotEmbedSmartTags",
"decimalSymbol",
"listSeparator",
"docId",
"discardImageEditingData",
"defaultImageDpi",
"conflictMode"};
}
A typical usage scenario with this class is as follows:
using (var doc = WordprocessingDocument.Open(#"c:\tmp\test.docx", true))
{
var documentProtection = new DocumentProtection()
{
Formatting = DocumentFormat.OpenXml.OnOffValue.FromBoolean(true)
};
DocumentSettingsPart settings = doc.MainDocumentPart.DocumentSettingsPart;
// instantiate our ExtendedSettings class based on the
// original Settings
var extset = new SettingsExt(settings.Settings.OuterXml);
// new or existing?
if (extset.DocumentProtection == null)
{
extset.DocumentProtection = documentProtection;
}
else
{
// replace existing values
}
// this is key to make sure our own DOMTree is saved!
// don't forget this
settings.Settings = extset;
}

File.ReadLines taking long time to process textfile

I have a text file contains following similar lines for example 500k lines.
ADD GTRX:TRXID=0, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-0", FREQ=81, TRXNO=0, CELLID=639, IDTYPE=BYID, ISMAINBCCH=YES, ISTMPTRX=NO, GTRXGROUPID=2556;
ADD GTRX:TRXID=1, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-1", FREQ=24, TRXNO=1, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
ADD GTRX:TRXID=5, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-2", FREQ=28, TRXNO=2, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
ADD GTRX:TRXID=6, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-3", FREQ=67, TRXNO=3, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
My intention is first to get value for FREQ where ISMAINBCCH=YES that I did easily, but if ISMAINBCCH=NO then concatenate FREQ values for that I have done by using File.ReadLines but it is taking a long time. Is there any better way to do this? If I take FREQ value for ISMAINBCCH=YES then concatenate the values ISMAINBCCH=NO are coming in a range of 10 lines above and below, but I don't know how to implement it. Probably I should get current line where ISMAINBCCH=YES for FREQ. Following is the code what I have done so far
using (StreamReader sr = File.OpenText(filename))
{
while ((s = sr.ReadLine()) != null)
{
if (s.Contains("ADD GTRX:"))
{
try
{
var gtrx = new Gtrx
{
CellId = int.Parse(PullValue(s, "CELLID")),
Freq = int.Parse(PullValue(s, "FREQ")),
//TrxNo = int.Parse(PullValue(s, "TRXNO")),
IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
DEFINED_TCH_FRQ = null,
TrxName = PullValue(s, "TRXNAME"),
};
var result = String.Join(",",
from ss in File.ReadLines(filename)
where ss.Contains("ADD GTRX:")
where int.Parse(PullValue(ss, "CELLID")) == gtrx.CellId
where PullValue(ss, "ISMAINBCCH").ToUpper() != "YES"
select int.Parse(PullValue(ss, "FREQ")));
}
}
}
gtrx.DEFINED_TCH_FRQ = result;
}
from ss in File.ReadLines(filename)
This reads the entire file, produces an array, which you are then using in a loop (itself from reading the same file) so that array gets thrown away and then created again. You're reading the same file number_of_lines + 1 times when it hasn't changed in the meantime.
An obvious boost would therefore be to just call File.ReadLines(filename) once, store the array and then use that array both for the loop instead of while ((s = sr.ReadLine()) != null) and in the loop instead of that repeated call to ReadLines().
But there's a flaw in your logic in even looking at ReadLines() repeatedly; you're already scanning through the file so you're going to come across all the lines relevant to the same CELLID later anyway:
var gtrxDict = new Dictionary<int, Gtrx>();
using (StreamReader sr = File.OpenText(filename))
{
while ((s = sr.ReadLine()) != null)
{
if (s.Contains("ADD GTRX:"))
{
int cellID = int.Parse(PullValue(s, "CELLID"));
Gtrx gtrx;
if(gtrxDict.TryGetValue(cellID, out gtrx)) // Found previous one
gtrx.DEFINED_TCH_FRQ += "," + int.Parse(PullValue(ss, "FREQ"));
else // First one for this ID, so create a new object
gtrxDict[cellID] = new Gtrx
{
CellId = cellID,
Freq = int.Parse(PullValue(s, "FREQ")),
IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
DEFINED_TCH_FRQ = int.Parse(PullValue(ss, "FREQ")).ToString(),
TrxName = PullValue(s, "TRXNAME"),
};
}
}
}
This way we don't need to keep more than one line from the file in memory at all, never mind doing so repeatedly. After this has run gtrxDict will contain a Gtrx object for each distinct CELLID in the file, with DEFINED_TCH_FRQ as a comma-separated list of the values from each matching line.
The following code snippet can be used to read the entire text file:
using System.IO;
/// Read Text Document specified by full path
private string ReadTextDocument(string TextFilePath)
{
string _text = String.Empty;
try
{
// open file if exists
if (File.Exists(TextFilePath))
{
using (StreamReader reader = new StreamReader(TextFilePath))
{
_text = reader.ReadToEnd();
reader.Close();
}
}
else
{
throw new FileNotFoundException();
}
return _text;
}
catch { throw; }
}
Get in-memory string, then apply Split() function to create string[] and process array elements in the same way as lines in original text file. In case of processing the very large file this method provides the option of reading it by chunks of data, processing them and then disposing upon completion (re: https://msdn.microsoft.com/en-us/library/system.io.streamreader%28v=vs.110%29.aspx).
As mentioned in comments by #Michael Liu, there is another option of using File.ReadAllText() which provides even more compact solution and can be used instead of reader.ReadToEnd(). Other useful methods of File class are detailed in : https://msdn.microsoft.com/en-us/library/system.io.file%28v=vs.110%29.aspx
And, finally, FileStream class can be used for both file read/write operations with various levels of granularity (re: https://msdn.microsoft.com/en-us/library/system.io.filestream%28v=vs.110%29.aspx).
SUMMARY
In response to the interesting comments thread, here is a brief summary.
The biggest bottleneck pertinent to the procedure described in PO question is Disk IO operations. Here are some numbers: average seek time in good quality HDD is about 5 msec plus the actual read time (per line). It well could be that the entire in-memory file data processing take less time than just a single HDD IO read (sometimes significantly; btw, SSD works better but still not a match to DDR3 RAM). The RAM memory size of modern PC is rather significant (typically 4...8 GB RAM is more than enough to handle most of text files). Thus, the core idea of my solution is to minimize the Disk IO read operations and do entire file data processing in-memory. Implementation can be different, apparently.
Hope this may help. Best regards,
I think that this more-or-less gets you what you want.
First read in all the data:
var data =
(
from s in File.ReadLines(filename)
where s != null
where s.Contains("ADD GTRX:")
select new Gtrx
{
CellId = int.Parse(PullValue(s, "CELLID")),
Freq = int.Parse(PullValue(s, "FREQ")),
//TrxNo = int.Parse(PullValue(s, "TRXNO")),
IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
DEFINED_TCH_FRQ = null,
TrxName = PullValue(s, "TRXNAME"),
}
).ToArray();
Based on the loaded data create a lookup to return the frequencies based on each cell id:
var lookup =
data
.Where(d => !d.IsMainBcch)
.ToLookup(d => d.CellId, d => d.Freq);
Now update the DEFINED_TCH_FRQ based on the lookup:
foreach (var d in data)
{
d.DEFINED_TCH_FRQ = String.Join(",", lookup[d.CellId]);
}

C# Reading certain element / attribute from xml file

Im trying to read a certain attributes from following xml file (as console program)
http://api.openweathermap.org/data/2.5/forecast?q=lahti,fin&mode=xml
As you see inside 'forecast' element there are multiple 'time' elements. What I want is to pick certain 'time' element and then pick given thing inside of it (lets say 'symbol') and print all/any attributes it has.
I want to be able to control which 'time' element I pick and which attributes I want to print.
This far all I have managed to do is to print every 'time' element and their attributes and also I managed to print every attribute inside of given 'time' element. But I just can't figure how to control it.
With the following code, I can print everything inside the first 'time' element. Item(0) is the index of element and the for loop makes sure that I don't get empty lines. As you can see from xml file, some 'time' elements has different amount of attributes inside of them so I guess I need to call them By name insted of index.
static void xmlReader()
{
int i;
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(parseLink());
foreach (XmlNode xmlNode in xmlDoc.DocumentElement.GetElementsByTagName("time").Item(0))
for (i = 0; i < xmlNode.Attributes.Count; i++)
{
Console.WriteLine(xmlNode.Attributes[i].Value);
}
Console.ReadKey();
}
Use Linq2Xml, it's much easier and convenient.
public static void Main()
{
var forecast = XDocument.Load(#"http://api.openweathermap.org/data/2.5/forecast?q=lahti,fin&mode=xml")
.Root
.Element("forecast");
foreach (var time in forecast.Elements("time")
.Where(e => e.Element("clouds")
.Attribute("value")
.Value == "overcast clouds"))
{
Console.WriteLine(time.Element("symbol").Attribute("name").Value);
}
}
Using XDocument is a bit easier here...
private static void XmlOutput()
{
var filePathAndName = #"http://api.openweathermap.org/data/2.5/forecast?q=lahti,fin&mode=xml";
var xmlDoc = XDocument.Load(filePathAndName);
// Get list of Nodes matching the "time" name or any other filters you wish.
var nodes = xmlDoc.Descendants().Where(nd => nd.Name.LocalName == "time");
// Filter the node list to only those where the date is as specified (and any other filters you wish).
// This could be done in the initial linq query - just making it clearer here.
nodes = nodes.Where(nd => nd.Attributes().Any(cnd => cnd.Name.LocalName == "from" && cnd.Value.Contains("2015-03-07")));
foreach (XElement element in nodes)
{
// For each descendant node where named "symbol"...
foreach(var node in element.Descendants().Where(nd => nd.Name.LocalName == "symbol"))
{
// Write out these particular attributes ("number" and "name")
string output = "";
output += node.Attributes().FirstOrDefault(nd => nd.Name.LocalName == "number").Value;
output += ", " + node.Attributes().FirstOrDefault(nd => nd.Name.LocalName == "name").Value;
Console.WriteLine(output);
}
}
Console.ReadKey();
}
Similar #aush's response, but with some formatting
var doc = XDocument.Load(#"http://api.openweathermap.org/data/2.5/forecast?q=lahti,fin&mode=xml");
var forecastEl = doc.Root.Element(XName.Get("forecast"));
var timeNodes = from e in forecastEl.Elements("time")
where e.Element("symbol")
.Attribute(XName.Get("name"))
.Value == "light snow"
select e;
var colFormat = "{0,-20} {1,-20} {2,-30}";
Console.WriteLine(colFormat, "TimeFrom", "TimeTo", "SymbolName");
foreach(var time in timeNodes)
Console.WriteLine(colFormat
, time.Attribute("from").Value
, time.Attribute("to").Value
, time.Element("symbol").Attribute("name").Value);
Results:
TimeFrom TimeTo SymbolName
2015-03-07T12:00:00 2015-03-07T15:00:00 light snow
2015-03-07T15:00:00 2015-03-07T18:00:00 light snow
You can use XmlReader (tutorial) as flowing,
Here I get from attribute from time element and name attribute sybol element.. And they are for same Time element. Also added extracting example for value of cloud
using (XmlReader reader = XmlReader.Create(#"http://api.openweathermap.org/data/2.5/forecast?q=lahti,fin&mode=xml"))
{
// Walk through all Elements
while (reader.Read())
{
// If we meet time element ; go inside and walk
if (reader.Name == "time")
{
Console.WriteLine("A new TIME ");
// Extract from attribute
String from = reader["from"];
if (from != null)
{
Console.WriteLine("\t From : " + from);
}
// Now walk through all elements inside same Time element
// Here I use do-while ; what we check is End element of time : </time> .. when we walk till we meet </time> we are inside children of same Time
// That mean we start from <time> and walk till we meet </time>
do
{
reader.Read();
if (reader.Name == "symbol")
{
// You can use this approach for any Attribute in symbol Element
String name = reader["name"];
if (name != null)
{
Console.WriteLine("\t Symbol name :" + name);
}
}
if (reader.Name == "clouds")
{
String clouds = reader["value"];
if (clouds != null)
{
Console.WriteLine("\t\t Clouds value : " + clouds);
}
}
} while (reader.NodeType != XmlNodeType.EndElement && reader.Name != "time");
}
}
}
Simply try this on your Console program..
Using my xml library here, you can search for an actual DateTime value like this:
XElement root = XElement.Load(#"http://api.openweathermap.org/data/2.5/forecast?q=lahti,fin&mode=xml");
var search = DateTime.Parse("2015-03-07T21:00:00");
XElement time = root.XPathElement("//time[#from >= {0} and #to > {1}]", search, search);
var value = time.ToString();

wrong value in result query so Null Reference Exception in Linq query

I have two xml files which I am comparing with each other. The Linq query result1 is throwing Null Reference Exception after executing correctly for one rule. And when I debugged I found the section is displaying wrong values. I am unable to figure out the cause.
Rules.xml file:
<rule id="1" numberofsections="2">
<section id="1" attributeid="1686" ruleoperator="==" condition="and">
<name>Processor type</name>
<value>Core i3</value>
</section>
<section id="2" attributeid="1438" ruleoperator="<" condition="and" >
<name>Weight</name>
<value>3.8 LBS</value>
</section>
<type>ultrabook</type>
</rule>
And the code snippet:
XDocument rulesXml = XDocument.Load("/RulesEnginescope/RulesEnginescope/rulesSubType.xml");
XDocument productXml = XDocument.Load("c:/RuleEngine/RuleEngine/product.xml");
var getSelectedLeafCategoryRules = from rules2 in rulesXml.Descendants("QueryTransformation").Descendants("leafcategory")
where ((long)System.Convert.ToDouble(rules2.FirstAttribute.Value) == 4590)
select rules2;
var rules = getSelectedLeafCategoryRules.Descendants("rule");
var productAttribute = productXml.Descendants("AttrList").Descendants("Attr");
foreach (var x in rules)
{
var section = x.Elements("section");
/*Wrong value in section.count()*/
Console.WriteLine(section.Count());
var result1 = from p in section
from pa in productAttribute
where (p.Attribute("attributeid").Value == pa.Attribute("id").Value
&& p.Element("name").Value == pa.Element("Name").Value)
select new
{
ruleAttribute = new
{
ruleId = p.Attribute("attributeid").Value,
ruleOperator = p.Attribute("ruleoperator").Value,
name = p.Element("name").Value,
value = p.Element("value").Value,
condition = p.Attribute("condition").Value
},
prodAttribute = new
{
productId = pa.Attribute("id").Value,
name = pa.Element("Name").Value,
value = pa.Element("ValueList").Element("Value").Value
/*Error*/ }
};
if (result1.Count() != 0 && result1.Count() == System.Convert.ToInt64(x.Attribute("numberofsections").Value))
{
//checking each section
foreach (var r in result1)
{
...
}
}
The idiomatic way to get the value of elements and attributes in LINQ-to-XML is to cast the element or attribute to the type you want, rather than accessing the Value attribute.
prodAttribute = new
{
productId = (string)pa.Attribute("id"),
name = (string)pa.Element("Name"),
// ...
}
Using this pattern avoids null ref exceptions caused when calls to Attribute() and Element() don't find a matching node. It also reduces verbosity:
((long)System.Convert.ToDouble(rules2.FirstAttribute.Value)
// should be
(long)rules2.FirstAttribute
You'll still need to add null checks when you're accessing children of children. This can get verbose; one way to keep it succinct is to use IEnumerable-oriented methods so that you're operating on a (possibly empty) collection, rather than a (possibly null) instance.
pa.Element("ValueList").Element("Value").Value
// could be
(string)pa.Elements("ValueList").Elements("Value").FirstOrDefault ()
Finally, note that capitalization matters in LINQ-to-XML. In your code you seem to be switching capitalization patterns ("id" vs. "Name") often; it's likely that your source XML is more consistent.

XML in C# - hasAttribute, getAttribute is not there? Why?

I'm working on a project that requires me to build a game's rooms, items and NPC in a separate database. I've chosen XML, but something prevents me from properly parsing the XML in my C# code. What am I doing wrong?
My errors are these:
System.xml.xmlnode does not contain a definition for HasAttribute
(this goes for GetAttribute as well) and no extension method accepting 'HasAttribute' accepting a first argument of type System.Xml.XmlNode ?
This also goes for GetParentNode, and my very last line
string isMoveableStr = xmlRoom.GetAttribute("isMoveable");
somehow goes:
the name xmlRoom does not exist in the current context
Here's the method:
public void loadFromFile()
{
XmlDocument xmlDoc = new XmlDocument(); // create an xml document object in memory.
xmlDoc.Load("gamedata.xml"); // load the XML document from the specified file into the object in memory.
// Get rooms, NPCs, and items.
XmlNodeList xmlRooms = xmlDoc.GetElementsByTagName("room");
XmlNodeList xmlNPCs = xmlDoc.GetElementsByTagName("npc");
XmlNodeList xmlItems = xmlDoc.GetElementsByTagName("item");
foreach(XmlNode xmlRoom in xmlRooms) { // defaults for room:
string roomID = "";
string roomDescription = "this a standard room, nothing special about it.";
if( !xmlRoom.HasAttribute("ID") ) //http://msdn.microsoft.com/en-us/library/acwfyhc7.aspx
{
Console.WriteLine("A room was in the xml file without an ID attribute. Correct this to use the room");
continue; //skips remaining code in loop
} else {
roomID = xmlRoom.GetAttribute("id"); //http://msdn.microsoft.com/en-us/library/acwfyhc7.aspx
}
if( xmlRoom.hasAttribute("description") )
{
roomDescription = xmlRoom.GetAttribute("description");
}
Room myRoom = new Room(roomDescription, roomID); //creates a room
rooms.Add(myRoom); //adds to list with all rooms in game ;)
} foreach(XmlNode xmlNPC in xmlNPCs)
{ bool isMoveable = false;
if( !xmlNPC.hasAttribute("id") )
{
Console.WriteLine("A NPC was in the xml file, without an id attribute, correct this to spawn the npc");
continue; //skips remaining code in loop
}
XmlNode inRoom = xmlNPC.getParentNode();
string roomID = inRoom.GetAttribute("id");
if( xmlNPC.hasAttribute("isMoveable") )
{
string isMoveableStr = xmlRoom.GetAttribute("isMoveable");
if( isMoveableStr == "true" )
isMoveable = true;
}
}
}
System.Xml.XmlElement has the function you are looking for. You are getting XMLNode's. You will need to cast the nodes to XmlElement to get that function.
xmlElement = (System.Xml.XmlElement)xmlRoom;
This is not specifically germane to your question, but a response to #ChaosPandion's suggestion and your question in the comments, here is your code example using Linq to XML:
var xdoc = XDocument.Load("gamedata.xml");
var xRooms = xdoc.Descendants("room");
List<Room> rooms;
//If an element doesn't have a given attribute, the Attribute method will return null for that attribute
//Here we first check if any rooms are missing the ID attribute
if (xRooms.Any( xRoom => (string)xRoom.Attribute("ID") == null )) {
Console.WriteLine("A room was in the xml file without an ID attribute...");
} else {
rooms = (
from xRoom in xRooms
select new Room(
xRoom.Attribute("description") ?? "this a standard room, nothing special about it.",
(int)xRoom.Attribute("ID")
)
).ToList();
}
var xNPCs = xdoc.Descendants("npc");
if (xNPCs.Any( xNPC => (string)xNPC.Attribute("id") == null )) {
Console.WriteLine("A NPC was in the xml file, without an id attribute, correct this to spawn the npc");
} else {
var npcs = (
from xNPC in xNPCs
let inRoom = xNPC.Parent
select new {
xNPC,
inRoom,
isMoveable = (string)xNPC.Attribute("isMoveable") != null &&
(string)inRoom.Attribute("isMoveable") == true
}
).ToList();
}
Then you can use a simple foreach on the npcs collection:
foreach (var npc in npcs) {
Console.WriteLine(inRoom.Attribute("ID"));
Console.WriteLine(npc.IsMoveable);
}
OTOH since this code makes use of the Descendants method, which returns an collection of XElement (the type corresponding to an XML element) and not of XNode (the type corresponding to an XML node), the whole issue of a node object not having attributes is neatly sidestepped.
XmlNode does not have methods HasAttribute or GetAttribute. If you look at the MSDN entry for XmlNode, you can see the methods it has available.
http://msdn.microsoft.com/en-us/library/system.xml.xmlnode.aspx
If you use XmlNode.Attributes["ATTRIBUTE_NAME"] or in your case xmlRoom.Attributes["ID"], you should be able to find the attribute you're looking for. That is, if you would like to continue using XmlNodes.
The following link has an example of how to retrieve attributes by name from an XmlNode:
http://msdn.microsoft.com/en-us/library/1b823yx9.aspx

Categories