Iteration through XML? - c#

I have a 6 GB XML file and I'm using XmlReader to loop through it. The file is huge, but there's nothing I can do about that. I'd like to use LINQ to XML, but the file is too large for XDocument: loading it throws an OutOfMemoryException.
I'm using XmlReader to loop through the whole file and extract what I need. I'm including a sample XML file.
Essentially, this is what I do:
1. Find a CONTAINER element. If found, retrieve its "id" attribute.
2. If "id" begins with LOCAL_, this is a container I want to read.
3. Loop the reader until I find a FAMILY element with the value CELL_FDD.
4. When found, keep calling myReader.Read() until I find the IMPORTANT_VALUE element.
5. Once found, read the value of IMPORTANT_VALUE.
6. I'm done with this container, so I continue looping until I find the next CONTAINER (that's where the break comes in).
This is the simplified version of how I've been reading the file and finding the relevant values.
while (myReader.Read())
{
    if ((myReader.Name == "CONTAINER"))
    {
        if (myReader.HasAttributes)
        {
            string Attribute = myReader.GetAttribute("id");
            if (Attribute.IndexOf("LOCAL_") >= 0)
            {
                while (myReader.Read())
                {
                    if (myReader.Name == "FAMILY")
                    {
                        myReader.Read();//read value
                        string Family = myReader.Value;
                        if (Family == "CELL_FDD")
                        {
                            while (myReader.Read())
                            {
                                if ((myReader.Name == "IMPORTANT_VALUE"))
                                {
                                    myReader.Read();
                                    string Counter = myReader.Value;
                                    Console.WriteLine(Attribute + " (found: " + Counter + ")");
                                    break;
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
And this is the XML:
<es:esFD xmlns:es="File.xsd">
<vs:vsFD xmlns:vs="OTHER_FILE.xsd">
<CONTAINER id="LOCAL_CONTAINER1">
<ATTRIBUTES>
<FAMILY>CELL_FDD</FAMILY>
<CELL_FDD>
<VAL1>1.1.2.3</VAL1>
<VAL2>JSMITH</VAL2>
<VAL3>320</VAL3>
<IMPORTANT_VALUE>VERY</IMPORTANT_VALUE>
<VAL4>320</VAL4>
</CELL_FDD>
<FAMILY>BLAH</FAMILY>
<BLAH>
<VAL1>1.4.43.3</VAL1>
<VAL2>NA</VAL2>
<VAL3>349</VAL3>
<IMPORTANT_VALUE>NA</IMPORTANT_VALUE>
<VAL4>43</VAL4>
<VAL5>00</VAL5>
<VAL6>12</VAL6>
</BLAH>
</ATTRIBUTES>
</CONTAINER>
<CONTAINER id="FOREIGN_ELEMENT1">
<ATTRIBUTES>
<FAMILY>CELL_FDD</FAMILY>
<CELL_FDD>
<VAL1>1.1.2.3</VAL1>
<VAL2>JSMITH</VAL2>
<VAL3>320</VAL3>
<IMPORTANT_VALUE>VERY</IMPORTANT_VALUE>
<VAL4>320</VAL4>
</CELL_FDD>
<FAMILY>BLAH</FAMILY>
<BLAH>
<VAL1>1.4.43.3</VAL1>
<VAL2>NA</VAL2>
<VAL3>349</VAL3>
<IMPORTANT_VALUE>NA</IMPORTANT_VALUE>
<VAL4>43</VAL4>
<VAL5>00</VAL5>
<VAL6>12</VAL6>
</BLAH>
</ATTRIBUTES>
</CONTAINER>
</vs:vsFD>
</es:esFD>
How can I break from the most inner loop so that I can reach the top-most loop?

Using separate methods should make it easier to control your loops:
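while (myReader.Read())
{
    // Only start tags: </CONTAINER> has the same Name but carries no attributes.
    if (myReader.NodeType == XmlNodeType.Element && myReader.Name == "CONTAINER")
    {
        ProcessContainerElement(myReader);
    }
}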
In the ProcessContainerElement method, you can return when you determine that you need to start looking for the next CONTAINER element.
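private void ProcessContainerElement(XmlReader myReader)
{
    string attribute = myReader.GetAttribute("id");
    if (attribute == null || !attribute.StartsWith("LOCAL_"))
        return; // not a LOCAL_ container: go back and look for the next one
    bool inCellFdd = false;
    while (myReader.Read())
    {
        if (myReader.NodeType == XmlNodeType.EndElement && myReader.Name == "CONTAINER")
            return; // reached </CONTAINER> without finding the value
        if (myReader.NodeType != XmlNodeType.Element)
            continue;
        if (myReader.Name == "FAMILY")
        {
            myReader.Read(); // move onto the FAMILY text
            inCellFdd = myReader.Value == "CELL_FDD";
        }
        else if (inCellFdd && myReader.Name == "IMPORTANT_VALUE")
        {
            myReader.Read();
            string counter = myReader.Value;
            Console.WriteLine(attribute + " (found: " + counter + ")");
            return; // leaves every loop in this method at once
        }
    }
}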
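The return-based version works because leaving a method ends all of its loops at once. A related option is XmlReader.ReadSubtree(): the reader it returns reports end-of-file at the matching </CONTAINER>, so a plain break is enough and the inner loop can never run into the next container. This is a sketch that assumes the layout of the sample XML (a FAMILY element followed by the element it names) and that only the first CELL_FDD IMPORTANT_VALUE per container matters; SubtreeScanner and Scan are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SubtreeScanner
{
    // Hypothetical helper: one "id (found: value)" entry per LOCAL_ container.
    public static List<string> Scan(XmlReader myReader)
    {
        var found = new List<string>();
        // ReadToFollowing only stops on start tags, never on </CONTAINER>.
        while (myReader.ReadToFollowing("CONTAINER"))
        {
            string id = myReader.GetAttribute("id");
            if (id == null || !id.StartsWith("LOCAL_", StringComparison.Ordinal))
                continue; // the next ReadToFollowing reads past this container

            // The subtree reader sees only this CONTAINER; disposing it leaves
            // myReader on the matching </CONTAINER>.
            using (XmlReader container = myReader.ReadSubtree())
            {
                string family = null;
                while (container.Read())
                {
                    if (container.NodeType != XmlNodeType.Element)
                        continue;
                    if (container.Name == "FAMILY")
                    {
                        container.Read(); // move onto the FAMILY text
                        family = container.Value;
                    }
                    else if (container.Name == "IMPORTANT_VALUE" && family == "CELL_FDD")
                    {
                        container.Read(); // move onto the IMPORTANT_VALUE text
                        found.Add(id + " (found: " + container.Value + ")");
                        break; // a plain break suffices: this is the innermost loop
                    }
                }
            }
        }
        return found;
    }
}
```

For the sample file, Scan would be called once with the XmlReader over the whole document and returns one line per LOCAL_ container.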

You can read with XmlReader and load each CONTAINER node into its own XmlDocument.
Something like this (not tested):
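// ReadToFollowing finds the first CONTAINER at any depth (here under vs:vsFD).
if (!reader.ReadToFollowing("CONTAINER"))
    throw new Exception("[Cannot find \"CONTAINER\"]");
bool more = true;
while (more)
{
    XmlDocument doc = new XmlDocument();
    // ReadOuterXml returns this CONTAINER and moves the reader past </CONTAINER>.
    doc.LoadXml(reader.ReadOuterXml());
    XmlNode container = doc.DocumentElement;
    // do your work with container
    // If ReadOuterXml already left the reader on the next CONTAINER (no whitespace
    // in between), use it instead of skipping past it with ReadToNextSibling.
    more = (reader.NodeType == XmlNodeType.Element && reader.Name == "CONTAINER")
           || reader.ReadToNextSibling("CONTAINER");
}
reader.Close();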

Following svick's comment, I ended up combining XmlReader with LINQ to XML: once I reached the right element and checked that its id attribute was correct, I loaded that element with XElement.Load.
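A sketch of that XmlReader plus LINQ to XML combination, assuming the sample layout above: the reader streams the file, and XNode.ReadFrom materialises one CONTAINER at a time as an XElement, so memory use is bounded by the largest container rather than by the 6 GB file. ContainerScanner and FindImportantValues are hypothetical names. XNode.ReadFrom is used here instead of XElement.Load because, as far as I can tell, XElement.Load(XmlReader) expects the reader to reach the end of its input after the element, so on a reader in the middle of a file it is usually paired with ReadSubtree().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ContainerScanner
{
    // Streams the file and materialises one CONTAINER at a time as an XElement.
    public static IEnumerable<(string Id, string Value)> FindImportantValues(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "CONTAINER")
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node after </CONTAINER>, so Read() must not be called here.
                var container = (XElement)XNode.ReadFrom(reader);
                string id = (string)container.Attribute("id");
                if (id == null || !id.StartsWith("LOCAL_", StringComparison.Ordinal))
                    continue;

                // In the sample, <FAMILY> names the element that follows it.
                var family = container.Descendants("FAMILY")
                                      .FirstOrDefault(f => f.Value == "CELL_FDD");
                var important = family?.ElementsAfterSelf().FirstOrDefault()
                                      ?.Element("IMPORTANT_VALUE");
                if (important != null)
                    yield return (id, important.Value);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Because FindImportantValues is a lazy iterator, the XmlReader has to stay open until the foreach over its results has finished, for example by enumerating inside a using (var reader = XmlReader.Create(...)) block.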


StackOverflow in SelectSingleNode

I have a function that creates or updates fields in the app.exe.config file:
public static void UpdateConfig(string FieldName, string FieldValue, ConfigSelector SectionName = ConfigSelector.AppSettings)
{
    switch (SectionName)
    {
        case ConfigSelector.Execption:
        {
            // MessageBox.Show("gg");
            var xmlDoc = new XmlDocument();
            xmlDoc.Load(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);
            if (xmlDoc.SelectSingleNode("configuration/Execption") != null)
            {
                if (xmlDoc.SelectSingleNode("configuration/Execption/List") != null)
                {
                    // create new node <add key="Region" value="Canterbury" />
                    var nodeRegion = xmlDoc.CreateElement("add");
                    nodeRegion.SetAttribute("key", FieldName);
                    nodeRegion.SetAttribute("value", FieldValue);
                    xmlDoc.SelectSingleNode("configuration/Execption/List").AppendChild(nodeRegion);
                }
                else
                {
                    var List = xmlDoc.CreateElement("List");
                    xmlDoc.SelectSingleNode("configuration/Execption").AppendChild(List);
                    UpdateConfig(FieldName, FieldValue, SectionName);
                }
            }
            else
            {
                var List = xmlDoc.CreateElement("Execption");
                xmlDoc.SelectSingleNode("configuration").AppendChild(List);
                UpdateConfig(FieldName, FieldValue, SectionName);
            }
            xmlDoc.Save(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);
            ConfigurationManager.RefreshSection("Execption/List");
            break;
        }
    }
}
The function works like this: first it checks whether the XPath configuration/Execption exists; if not, it creates that element and calls itself again. The second time, it checks whether configuration/Execption/List exists; if not, it creates it and calls itself again. The third time, it adds the required field (FieldName and FieldValue).
But I'm getting a System.StackOverflowException on this line:
if (xmlDoc.SelectSingleNode("configuration/Execption") != null)
Did I miss something?
You are calling UpdateConfig recursively with exactly the same arguments that were already passed to it:
UpdateConfig(FieldName, FieldValue, SectionName);
Since the recursive call happens before the xmlDoc.Save(), it always works on the same content.
Saving before doing the recursive call should fix the issue.
You don't save the document after adding the new element, so when the next call loads the file, the new element isn't there and xmlDoc.SelectSingleNode("configuration/Execption") != null is still false. The code therefore creates the element again, recurses infinitely, and you get a StackOverflowException.
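Just save the document after you change it (the else branch that creates List needs the same treatment). Also return right after the recursive call; otherwise, once it returns, the xmlDoc.Save(...) at the end of the outer call overwrites the file with that call's older copy, which lacks the elements added by the inner calls:
else
{
    var List = xmlDoc.CreateElement("Execption");
    xmlDoc.SelectSingleNode("configuration").AppendChild(List);
    xmlDoc.Save(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);
    UpdateConfig(FieldName, FieldValue, SectionName);
    return; // the recursive call has already saved the complete file
}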
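Alternatively, the recursion can be dropped altogether. The sketch below is a non-recursive variant (ConfigWriter, AddToExecptionList and GetOrCreateChild are hypothetical names) that creates any missing configuration/Execption/List elements in memory and appends the <add> element in a single pass, so the file only has to be loaded and saved once. It assumes the document already has a <configuration> root, as .config files do.

```csharp
using System.Xml;

public static class ConfigWriter
{
    // Ensures configuration/Execption/List exists in memory, then appends
    // <add key="..." value="..."/>. The caller saves the document once afterwards.
    public static void AddToExecptionList(XmlDocument xmlDoc, string fieldName, string fieldValue)
    {
        XmlNode configuration = xmlDoc.SelectSingleNode("configuration");
        XmlNode execption = GetOrCreateChild(xmlDoc, configuration, "Execption");
        XmlNode list = GetOrCreateChild(xmlDoc, execption, "List");

        XmlElement add = xmlDoc.CreateElement("add");
        add.SetAttribute("key", fieldName);
        add.SetAttribute("value", fieldValue);
        list.AppendChild(add);
    }

    // Returns the named child element, creating and appending it if it is missing.
    static XmlNode GetOrCreateChild(XmlDocument doc, XmlNode parent, string name)
    {
        XmlNode child = parent.SelectSingleNode(name);
        if (child == null)
        {
            child = doc.CreateElement(name);
            parent.AppendChild(child);
        }
        return child;
    }
}
```

The caller would load the config file into an XmlDocument, call AddToExecptionList, and then call xmlDoc.Save(...) and ConfigurationManager.RefreshSection(...) once, exactly as UpdateConfig does at the end today.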

Determine if XmlTextReader.Read() reads an end tag

Using XmlTextReader.Read(), how do I determine if what the reader read is an end/closing tag?
Note that I'm not asking about XmlTextReader.IsEmptyElement. Given the following XML:
<thistag what="nothing">
<inside color="cyan"/>
</thistag>
can I determine whether the thistag tag I just read is an opening tag or an end/closing tag?
My solution so far involves checking for the presence of the what attribute:
if (reader.GetAttribute("what") == null)
{
// it's an end tag!
}else{
// it's a start tag!
}
But I understand that this approach isn't elegant, and it would fail for any tag whose attributes are optional.
I haven't tried this, but I hope it helps:
using (XmlTextReader reader = new XmlTextReader(filename))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element) // opening tag (or a self-closing tag)
        {
            //your code
        }
        else if (reader.NodeType == XmlNodeType.EndElement) // closing tag
        {
            //your code
        }
    }
}
XmlNodeType.Element corresponds to opening nodes. XmlNodeType.EndElement is for closing nodes.
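To see the difference on the XML from the question, here is a small sketch (TagKinds and Describe are hypothetical names) that labels every tag it meets. It uses XmlReader.Create, which Microsoft recommends over constructing XmlTextReader directly; NodeType behaves the same way on both. A self-closing tag such as <inside/> is reported once, as an Element with IsEmptyElement == true, and no EndElement follows it, so the two checks complement each other.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TagKinds
{
    // Returns "start:name", "empty:name" or "end:name" for every tag, in document order.
    public static List<string> Describe(string xml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    // <inside/> arrives only as an Element; no EndElement follows it.
                    result.Add((reader.IsEmptyElement ? "empty:" : "start:") + reader.Name);
                else if (reader.NodeType == XmlNodeType.EndElement)
                    result.Add("end:" + reader.Name);
                // Whitespace, text, comments etc. are ignored.
            }
        }
        return result;
    }
}
```

For the thistag sample this yields start:thistag, empty:inside, end:thistag, without looking at any attribute.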

How to get id and state attributes in LINQ to XML

I'm trying to learn C# (and Linq-to-Xml) for handling of XML files, yet having some troubles.
I can get the elements and values, but my output is missing the information I need for logic decisions (id and state attributes in the transaction and target elements).
I think it has to do with the Descendants, but not sure how to grab them.
A little nudge in the right direction?
My XML File
<?xml version="1.0"?>
<xliff version="1.2">
<file source-language="en-US" datatype="plaintext" category="framework">
<body>
<transaction approved="no" id="1">
<source xml:lang="en-US">Product Family</source>
<target state="translated" xml:lang="en-US">Product Family</target>
</transaction>
</body>
</file>
</xliff>
C# Code
private void btnOpen_Click(object sender, EventArgs e)
{
// Show the dialog, using defaults, and get result
OpenFileDialog ofdResult = new OpenFileDialog();
if (ofdResult.ShowDialog() == DialogResult.OK)
{
try
{
if (ofdResult.OpenFile() != null)
{
XDocument xmlFile = XDocument.Load(ofdResult.FileName);
// print elements recursively
PrintElement(xmlFile.Root);
}
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk." + ex.Message);
}
}
}
// display an element (and its children, if any) in the TextBox
private void PrintElement( XElement element )
{
// get element name without namespace
string name = element.Name.LocalName;
// display the element's name within its tag
tbxOutput.AppendText( '<' + name + ">\n" );
// check for child elements and print value if none contained
if ( element.HasElements )
{
// print all child elements at the next indentation level
foreach ( var child in element.Elements() )
// Display all attributes
PrintElement(child);
} // end if
else
{
// display the text inside this element
tbxOutput.AppendText( element.Value.Trim() + '\n' );
} // end else
// display end tag
tbxOutput.AppendText( "</" + name + ">\n" );
} // end method PrintElement
My Output...
<xliff>
<file>
<body>
<trans-unit>
<source>
Product Family
</source>
<target>
Product Family
</target>
</trans-unit>
</body>
</file>
</xliff>
If you need the attributes, read them: XElement has a boolean HasAttributes property, and you can loop over all attributes with the Attributes() method the same way you loop over Elements(). See the MSDN documentation for XElement.Attributes.
You can write a method like this (it needs using System.Text;):
string GetAttributesString(XElement element)
{
    if (element == null || !element.HasAttributes)
        return string.Empty;
    // LocalName drops the namespace, matching PrintElement (xml:lang prints as lang)
    string format = " {0}=\"{1}\"";
    StringBuilder result = new StringBuilder();
    foreach (var attribute in element.Attributes())
    {
        result.AppendFormat(format, attribute.Name.LocalName, attribute.Value);
    }
    return result.ToString();
}
And use it in your PrintElement method
tbxOutput.AppendText("<" + name + GetAttributesString(element) + ">\n");
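If you only need specific attributes for your logic rather than printing all of them, you can query them directly. Here is a sketch against the XML from the question (XliffReader and ReadTransactions are hypothetical names); casting an XAttribute to string yields null when the attribute is missing, instead of throwing as .Value would.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XliffReader
{
    // One (id, approved, state) entry per <transaction>; missing attributes become null.
    public static List<(string Id, string Approved, string State)> ReadTransactions(XDocument doc)
    {
        return doc.Descendants("transaction")
                  .Select(t => ((string)t.Attribute("id"),
                                (string)t.Attribute("approved"),
                                // state lives on the child <target>, not on <transaction>
                                (string)t.Element("target")?.Attribute("state")))
                  .ToList();
    }
}
```

With the XDocument already loaded in btnOpen_Click, XliffReader.ReadTransactions(xmlFile) gives the id and state values needed for the logic decisions.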

C# .NET XMLWriter/Reader problem

I've been having problems writing XML and reading it back in. A handwritten XML file is read in fine, but after I write the XML with my own code, reading it behaves oddly.
The output of the WriteXML: http://www.craigmouser.com/random/test.xml
It works if I insert a newline after the <specials> tag, i.e. if I change <specials><special> to:
<specials>
<special>
If I step through the reading code, it goes to the start node of specials, and then the next iteration reads it as an EndElement with name Shots. I have no idea where to go from here. Thanks in advance.
Code: Writing
public void SaveXMLFile(string filename, Bar b, Boolean saveOldData)
{
XmlWriter xml;
if(filename.Contains(".xml"))
{
xml = XmlWriter.Create(filename);
}
else
{
xml = XmlWriter.Create(filename + ".xml");
}
xml.WriteStartElement("AggievilleBar");
xml.WriteElementString("name", b.Name);
xml.WriteStartElement("picture");
xml.WriteAttributeString("version", b.PictureVersion.ToString());
xml.WriteEndElement();
xml.WriteElementString("location", b.Location.Replace(Environment.NewLine, "\n"));
xml.WriteElementString("news", b.News.Replace(Environment.NewLine, "\n"));
xml.WriteElementString("description", b.Description.Replace(Environment.NewLine, "\n"));
xml.WriteStartElement("specials");
xml.WriteString("\n"); //This line fixes the problem... ?!?!
foreach (Special s in b.Specials)
{
if (s.DayOfWeek > 0 || (s.DayOfWeek == -1
&& ((s.Date.CompareTo(DateTime.Today) < 0 && saveOldData )
|| s.Date.CompareTo(DateTime.Today) >= 0)))
{
xml.WriteStartElement("special");
xml.WriteAttributeString("dayofweek", s.DayOfWeek.ToString());
if (s.DayOfWeek == -1)
xml.WriteAttributeString("date", s.Date.ToString("yyyy-MM-dd"));
xml.WriteAttributeString("price", s.Price.ToString());
xml.WriteString(s.Name);
xml.WriteEndElement();
}
}
xml.WriteEndElement();
xml.WriteEndElement();
xml.Close();
}
Code: Reading
public Bar LoadXMLFile(string filename)
{
List<Special> specials = new List<Special>();
XmlReader xml;
try
{
xml = XmlReader.Create(filename);
}
catch (Exception)
{
MessageBox.Show("Unable to open file. If you get this error upon opening the program, we failed to pull down your current data. You will most likely be unable to save, but you are free to try. If this problem persists please contact us at pulsarproductionssupport#gmail.com",
"Error Opening File", MessageBoxButtons.OK, MessageBoxIcon.Error);
return null;
}
Bar current = new Bar();
Special s = new Special();
while (xml.Read())
{
if (xml.IsStartElement())
{
switch (xml.Name)
{
case "AggievilleBar":
current = new Bar();
break;
case "name":
if (xml.Read())
current.Name = xml.Value.Trim();
break;
case "picture":
if (xml.HasAttributes)
{
try
{
current.PictureVersion = Int32.Parse(xml.GetAttribute("version"));
}
catch (Exception)
{
MessageBox.Show("Error reading in the Picture Version Number.","Error",MessageBoxButtons.OK,MessageBoxIcon.Error);
}
}
break;
case "location":
if (xml.Read())
current.Location = xml.Value.Trim();
break;
case "news":
if (xml.Read())
current.News = xml.Value.Trim();
break;
case "description":
if (xml.Read())
current.Description = xml.Value.Trim();
break;
case "specials":
if (xml.Read())
specials = new List<Special>();
break;
case "special":
s = new Special();
if (xml.HasAttributes)
{
try
{
s.DayOfWeek = Int32.Parse(xml.GetAttribute(0));
if (s.DayOfWeek == -1)
{
s.Date = DateTime.Parse(xml.GetAttribute(1));
s.Price = Int32.Parse(xml.GetAttribute(2));
}
else
s.Price = Int32.Parse(xml.GetAttribute(1));
}
catch (Exception)
{
MessageBox.Show("Error reading in a special.", "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
}
if (xml.Read())
s.Name = xml.Value.Trim();
break;
}
}
else
{
switch (xml.Name)
{
case "AggievilleBar":
xml.Close();
break;
case "special":
specials.Add(s);
break;
case "specials":
current.Specials = specials;
break;
}
}
}
return current;
}
Without seeing your code it's hard to give a straight answer to that question. However, I'd suggest using Linq-to-XML instead of XmlReader/XmlWriter: it's much easier to work with when you don't have to read each node one at a time and work out which node you're on, which sounds like the problem you're having.
For example, code like:
using (var reader = XmlReader.Create(...))
{
    while (reader.Read())
    {
        if (reader.Name == "book" && reader.IsStartElement())
        {
            // endless, confusing nesting!!!
        }
    }
}
Becomes:
// (string) cast: a title without a name attribute gives null instead of throwing
var elem = doc.Descendants("book").Descendants("title")
    .FirstOrDefault(c => (string)c.Attribute("name") == "C# Basics");
For an introduction to LINQ-to-XML, check out http://www.c-sharpcorner.com/UploadFile/shakthee/2868/, or just search for "Linq-to-XML". Plenty of examples out there.
Edit: I tried your code and was able to reproduce your problem. It seems that without a newline before the special tag, the first special element is read with IsStartElement() == false. I wasn't sure why; I even skimmed the XML specification and didn't see any requirement about newlines before elements.
I rewrote your code in Linq-to-XML and it worked fine without any newlines:
var xdoc = XDocument.Load(filename);
var barElement = xdoc.Element("AggievilleBar");
var specialElements = barElement.Descendants("special").ToList();
var specials = new List<Special>();
specialElements.ForEach(s =>
{
var dayOfWeek = Convert.ToInt32(s.Attribute("dayofweek").Value);
var price = Convert.ToInt32(s.Attribute("price").Value);
var date = s.Attribute("date");
specials.Add(new Special
{
Name = s.Value,
DayOfWeek = dayOfWeek,
Price = price,
Date = date != null ? DateTime.Parse(date.Value) : DateTime.MinValue
});
});
var bar = new Bar() {
Name = barElement.Element("name").Value,
PictureVersion = Convert.ToInt32(barElement.Elements("picture").Single()
.Attribute("version").Value),
Location = barElement.Element("location").Value,
Description = barElement.Element("description").Value,
News = barElement.Element("news").Value,
Specials = specials
};
return bar;
Would you consider using Linq-to-XML instead of XMLReader? I've had my share of trouble with XMLReader in the past and once I switched to Linq-to-XML haven't looked back!
EDIT: I know this question is rather old now, but I just came across an article that reminded me of it and might explain why this is happening: http://www.codeproject.com/KB/dotnet/pitfalls_xml_4_0.aspx
The author states:
In this light, a nasty difference between XmlReaders/Writers and XDocument is the way whitespace is treated. (See http://msdn.microsoft.com/en-us/library/bb387014.aspx.)
From msdn:
In most cases, if the method takes LoadOptions as an argument, you can optionally preserve insignificant white space as text nodes in the XML tree. However, if the method is loading the XML from an XmlReader, then the XmlReader determines whether white space will be preserved or not. Setting PreserveWhitespace will have no effect.
So perhaps, since you're loading with an XmlReader, the XmlReader is deciding whether or not to preserve white space. Most likely it IS preserving the white space, which is why the newline (or lack thereof) makes a difference. Whether XmlReader reports that whitespace is configurable (XmlReaderSettings.IgnoreWhitespace), but skipping it only hides the symptom if the loop itself reads one node too many. Very peculiar.
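There may be a more direct explanation in LoadXMLFile itself. In the "specials" case, if (xml.Read()) moves the reader onto whatever follows <specials>, and the while loop then calls Read() again. With a newline in the file, the first of those reads lands on the whitespace node; without it, it lands on the first <special> start tag, which the switch therefore never sees. The stripped-down reproduction below (SpecialsRepro and CountSpecialStartTagsSeen are hypothetical names) keeps only that read pattern and counts the <special> start tags the switch actually observes.

```csharp
using System.IO;
using System.Xml;

public static class SpecialsRepro
{
    // Keeps only the read pattern of LoadXMLFile: the "specials" case calls
    // Read() once more, and then the while loop calls Read() again.
    public static int CountSpecialStartTagsSeen(string document)
    {
        int seen = 0;
        using (XmlReader xml = XmlReader.Create(new StringReader(document)))
        {
            while (xml.Read())
            {
                if (xml.IsStartElement())
                {
                    switch (xml.Name)
                    {
                        case "specials":
                            xml.Read(); // same extra Read() as in LoadXMLFile
                            break;
                        case "special":
                            seen++;
                            break;
                    }
                }
            }
        }
        return seen;
    }
}
```

With two specials, the input without the newline reports 1 and the input with it reports 2. If this is the cause, dropping the Read() call from the "specials" case (keeping specials = new List<Special>();) should make the loader independent of the newline.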
I'd recommend you use the XmlDocument class and its Load and Save methods, and then work with the XML tree instead of messing around with XmlReader and XmlWriter. In my experience using XmlDocument has fewer weird formatting problems.

Can ConfigurationManager retain XML comments on Save()?

I've written a small utility that allows me to change a simple AppSetting for another application's App.config file, and then save the changes:
//save a backup copy first.
var cfg = ConfigurationManager.OpenExeConfiguration(pathToExeFile);
cfg.SaveAs(cfg.FilePath + "." + DateTime.Now.ToFileTime() + ".bak");
//reopen the original config again and update it.
cfg = ConfigurationManager.OpenExeConfiguration(pathToExeFile);
var setting = cfg.AppSettings.Settings[keyName];
setting.Value = newValue;
//save the changed configuration.
cfg.Save(ConfigurationSaveMode.Full);
This works well, except for one side effect. The newly saved .config file loses all the original XML comments, but only within the AppSettings area. Is it possible to retain the XML comments in the AppSettings area of the original configuration file?
Here's a pastebin of the full source if you'd like to quickly compile and run it.
I jumped into Reflector.Net and looked at the decompiled source for this class. The short answer is no, it will not retain the comments. The way Microsoft wrote the class is to generate an XML document from the properties on the configuration class. Since the comments don't show up in the configuration class, they don't make it back into the XML.
And what makes this worse is that Microsoft sealed all of these classes so you can't derive a new class and insert your own implementation. Your only option is to move the comments outside of the AppSettings section or use XmlDocument or XDocument classes to parse the config files instead.
Sorry. This is an edge case that Microsoft just didn't plan for.
Here is a sample function that you could use to save the comments. It allows you to edit one key/value pair at a time. I've also added some stuff to format the file nicely based on the way I commonly use the files (You could easily remove that if you want). I hope this might help someone else in the future.
public static bool setConfigValue(Configuration config, string key, string val, out string errorMsg) {
try {
errorMsg = null;
string filename = config.FilePath;
//Load the config file as an XDocument
XDocument document = XDocument.Load(filename, LoadOptions.PreserveWhitespace);
if(document.Root == null) {
errorMsg = "Document was null for XDocument load.";
return false;
}
XElement appSettings = document.Root.Element("appSettings");
if(appSettings == null) {
appSettings = new XElement("appSettings");
document.Root.Add(appSettings);
}
XElement appSetting = appSettings.Elements("add").FirstOrDefault(x => x.Attribute("key").Value == key);
if (appSetting == null) {
//Create the new appSetting
appSettings.Add(new XElement("add", new XAttribute("key", key), new XAttribute("value", val)));
}
else {
//Update the current appSetting
appSetting.Attribute("value").Value = val;
}
//Format the appSetting section
XNode lastElement = null;
foreach(var elm in appSettings.DescendantNodes()) {
if(elm.NodeType == System.Xml.XmlNodeType.Text) {
if(lastElement?.NodeType == System.Xml.XmlNodeType.Element && elm.NextNode?.NodeType == System.Xml.XmlNodeType.Comment) {
//Any time the last node was an element and the next is a comment add two new lines.
((XText)elm).Value = "\n\n\t\t";
}
else {
((XText)elm).Value = "\n\t\t";
}
}
lastElement = elm;
}
//Make sure the end tag for appSettings is on a new line.
var lastNode = appSettings.DescendantNodes().Last();
if (lastNode.NodeType == System.Xml.XmlNodeType.Text) {
((XText)lastNode).Value = "\n\t";
}
else {
appSettings.Add(new XText("\n\t"));
}
//Save the changes to the config file.
document.Save(filename, SaveOptions.DisableFormatting);
return true;
}
catch (Exception ex) {
errorMsg = "There was an exception while trying to update the config value for '" + key + "' with value '" + val + "' : " + ex.ToString();
return false;
}
}
If comments are critical, it might just be that your only option is to read & save the file manually (via XmlDocument or the new Linq-related API). If however those comments are not critical, I would either let them go or maybe consider embedding them as (albeit redundant) data elements.
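For the manual route, here is a minimal XmlDocument sketch, assuming the usual <configuration><appSettings> layout (AppSettingsEditor and SetValue are hypothetical names). Because it edits the existing nodes instead of regenerating the section from a Configuration object, comment nodes are left alone; loading with PreserveWhitespace = true also keeps the original indentation.

```csharp
using System.Xml;

public static class AppSettingsEditor
{
    // Updates the value of <add key="..."> under configuration/appSettings,
    // or appends a new <add> element if the key does not exist yet.
    public static void SetValue(XmlDocument doc, string key, string value)
    {
        XmlElement root = doc.DocumentElement;
        var appSettings = (XmlElement)root.SelectSingleNode("appSettings");
        if (appSettings == null)
        {
            appSettings = doc.CreateElement("appSettings");
            root.AppendChild(appSettings);
        }
        // Compare keys in code rather than building an XPath string from the key.
        foreach (XmlNode node in appSettings.SelectNodes("add"))
        {
            var add = (XmlElement)node;
            if (add.GetAttribute("key") == key)
            {
                add.SetAttribute("value", value);
                return;
            }
        }
        XmlElement newAdd = doc.CreateElement("add");
        newAdd.SetAttribute("key", key);
        newAdd.SetAttribute("value", value);
        appSettings.AppendChild(newAdd);
    }
}
```

Usage would be along the lines of: create var doc = new XmlDocument { PreserveWhitespace = true }, call doc.Load(cfg.FilePath), then AppSettingsEditor.SetValue(doc, keyName, newValue), and finally doc.Save(cfg.FilePath).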
