XML Parsing - Read a Simple XML File and Retrieve Values

XML Parsing - Read a Simple XML File and Retrieve Values - c#

I've written a Task Scheduling program for learning purposes. Currently I'm saving the scheduled tasks just as plain text and then parsing it using Regex. This looks messy (code wise) and is not very coherent.
I would like to load the scheduled tasks from an XML file instead, I've searched quite a bit to find some solutions but I couldn't get it to work how I wanted.
I wrote an XML file structured like this to store my data in:
<Tasks>
<Task>
<Name>Shutdown</Name>
<Location>C:/WINDOWS/system32/shutdown.exe</Location>
<Arguments>-s -f -t 30</Arguments>
<RunWhen>
<Time>8:00:00 a.m.</Time>
<Date>18/03/2011</Date>
<Days>
<Monday>false</Monday>
<Tuesday>false</Tuesday>
<Wednesday>false</Wednesday>
<Thursday>false</Thursday>
<Friday>false</Friday>
<Saturday>false</Saturday>
<Sunday>false</Sunday>
<Everyday>true</Everyday>
<RunOnce>false</RunOnce>
</Days>
</RunWhen>
<Enabled>true</Enabled>
</Task>
</Tasks>
The way I'd like to parse the data is like so:
Open Tasks.xml
Load the first Task tag.
In that task retrieve the values of the Name, Location and Arguments tags.
Then open the RunWhen tag and retrieve the values of the Time and Date tags.
After that open the Days tag and retrieve the value of each individual tag within.
Retrieve the value of Enabled.
Load the next task and repeat steps 3 -> 7 until all the Task tags in Tasks have been parsed.
I'm very sure you can do it this way I just can't work it out as there are so many different ways to do things in XML I got a bit overwhelmed. But what I've go so far is that I would most likely be using XPathDocument and XPathNodeIterator right?
If someone can show me an example or explain to me how this would be done I would be very happy.

Easy way to parse the xml is to use the LINQ to XML
for example you have the following xml file
<library>
<track id="1" genre="Rap" time="3:24">
<name>Who We Be RMX (feat. 2Pac)</name>
<artist>DMX</artist>
<album>The Dogz Mixtape: Who's Next?!</album>
</track>
<track id="2" genre="Rap" time="5:06">
<name>Angel (ft. Regina Bell)</name>
<artist>DMX</artist>
<album>...And Then There Was X</album>
</track>
<track id="3" genre="Break Beat" time="6:16">
<name>Dreaming Your Dreams</name>
<artist>Hybrid</artist>
<album>Wide Angle</album>
</track>
<track id="4" genre="Break Beat" time="9:38">
<name>Finished Symphony</name>
<artist>Hybrid</artist>
<album>Wide Angle</album>
</track>
<library>
For reading this file, you can use the following code:
public void Read(string fileName)
{
XDocument doc = XDocument.Load(fileName);
foreach (XElement el in doc.Root.Elements())
{
Console.WriteLine("{0} {1}", el.Name, el.Attribute("id").Value);
Console.WriteLine(" Attributes:");
foreach (XAttribute attr in el.Attributes())
Console.WriteLine(" {0}", attr);
Console.WriteLine(" Elements:");
foreach (XElement element in el.Elements())
Console.WriteLine(" {0}: {1}", element.Name, element.Value);
}
}

I usually use XmlDocument for this. The interface is pretty straight forward:
var doc = new XmlDocument();
doc.LoadXml(xmlString);
You can access nodes similar to a dictionary:
var tasks = doc["Tasks"];
and loop over all children of a node.

Try XmlSerialization
try this
[Serializable]
public class Task
{
public string Name{get; set;}
public string Location {get; set;}
public string Arguments {get; set;}
public DateTime RunWhen {get; set;}
}
public void WriteXMl(Task task)
{
XmlSerializer serializer;
serializer = new XmlSerializer(typeof(Task));
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream, Encoding.Unicode);
serializer.Serialize(writer, task);
int count = (int)stream.Length;
byte[] arr = new byte[count];
stream.Seek(0, SeekOrigin.Begin);
stream.Read(arr, 0, count);
using (BinaryWriter binWriter=new BinaryWriter(File.Open(#"C:\Temp\Task.xml", FileMode.Create)))
{
binWriter.Write(arr);
}
}
public Task GetTask()
{
StreamReader stream = new StreamReader(#"C:\Temp\Task.xml", Encoding.Unicode);
return (Task)serializer.Deserialize(stream);
}

Are you familiar with the DataSet class?
The DataSet can also load XML documents and you may find it easier to iterate.
http://msdn.microsoft.com/en-us/library/system.data.dataset.readxml.aspx
DataSet dt = new DataSet();
dt.ReadXml(#"c:\test.xml");

class Program
{
static void Main(string[] args)
{
//Load XML from local
string sourceFileName="";
string element=string.Empty;
var FolderPath=#"D:\Test\RenameFileWithXmlAttribute";
string[] files = Directory.GetFiles(FolderPath, "*.xml");
foreach (string xmlfile in files)
{
try
{
sourceFileName = xmlfile;
XElement xele = XElement.Load(sourceFileName);
string convertToString = xele.ToString();
XElement parseXML = XElement.Parse(convertToString);
element = parseXML.Descendants("Meta").Where(x => (string)x.Attribute("name") == "XMLTAG").Last().Value;
DirectoryInfo CurrentDate = Directory.CreateDirectory(DateTime.Now.ToString("yyyy-MM-dd"));
string saveWithThisName= Path.Combine(CurrentDate.FullName, element);
File.Copy(sourceFileName, saveWithThisName,true);
}
catch(Exception ex)
{
}
}
}
}

Related

Converting very large files from xml to csv

Currently I'm using the following code snippet to convert a .txt file with XML data to .CSV format. My question is this, currently this works perfectly with files that are around 100-200 mbs and the conversion time is very low (1-2 minutes max), However I now need this to work for much bigger files (1-2 GB's each file). Currently the program freezes the computer and the conversion takes about 30-40 minutes with this function. Not sure how I would proceed changing this function. Any help will be appreciated!
string all_lines = File.ReadAllText(p);
all_lines = "<Root>" + all_lines + "</Root>";
XmlDocument doc_all = new XmlDocument();
doc_all.LoadXml(all_lines);
StreamWriter write_all = new StreamWriter(FILENAME1);
XmlNodeList rows_all = doc_all.GetElementsByTagName("XML");
foreach (XmlNode rowtemp in rows_all)
{
List<string> children_all = new List<string>();
foreach (XmlNode childtemp in rowtemp.ChildNodes)
{
children_all.Add(Regex.Replace(childtemp.InnerText, "\\s+", " "));
}
write_all.WriteLine(string.Join(",", children_all.ToArray()));
}
write_all.Flush();
write_all.Close();
Sample Input::
<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>
last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>
<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>
last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>
Sample Output::
1,4,7,,5,hello,there,my,name,is,jack,,last,name,missing,above,3,6,7,,8,4
1,5,7,,3,hello,there,my,name,is,mary,jane,last,name,not,missing,above,3,6,7,,8,4

You need to take a streaming approach, as you're currently reading the entire 2Gb file into memory and then processing it. You should read a bit of XML, write a bit of CSV and keep doing that until you've processed it all.
A possible solution is below:
using (var writer = new StreamWriter(FILENAME1))
{
foreach (var element in StreamElements(r, "XML"))
{
var values = element.DescendantNodes()
.OfType<XText>()
.Select(e => Regex.Replace(e.Value, "\\s+", " "));
var line = string.Join(",", values);
writer.WriteLine(line);
}
}
Where StreamElements is inspired by Jon Skeet's streaming of XElements from an XmlReader in an answer to this question. I've made some changes to support your 'invalid' XML (as you have no root element):
private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
{
var settings = new XmlReaderSettings
{
ConformanceLevel = ConformanceLevel.Fragment
};
using (XmlReader reader = XmlReader.Create(fileName, settings))
{
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element)
{
if (reader.Name == elementName)
{
var el = XNode.ReadFrom(reader) as XElement;
if (el != null)
{
yield return el;
}
}
}
}
}
}

If you're prepared to consider a completely different way of doing it, download Saxon-EE 9.6, get an evaluation license, and run the following streaming XSLT 3.0 code:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="main">
<xsl:stream href="input.xml">
<xsl:for-each select="*/*">
<xsl:value-of select="*!normalize-space()" separator=","/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:stream>
</xsl:template>
</xsl:stylesheet>

It freezes because of File.ReadAllText(p);
Do not read the complete file into memory. (This will first start swapping, then halt your CPU because no more memory is available)
Use a chunking approach: Read line by line, convert line by line, write line by line.
Use some lower level XML Reader class, not XmlDocument

There are two variants. First is to hide program-freeze, use BackgroundWorker for it.
Second: read your text file string-by-string, use any Reader for it (Xml or any text\file).
You can combine these variants.

Xml gets corrupted each time I append a node

I have an Xml file as:
<?xml version="1.0"?>
<hashnotes>
<hashtags>
<hashtag>#birthday</hashtag>
<hashtag>#meeting</hashtag>
<hashtag>#anniversary</hashtag>
</hashtags>
<lastid>0</lastid>
<Settings>
<Font>Arial</Font>
<HashtagColor>red</HashtagColor>
<passwordset>0</passwordset>
<password></password>
</Settings>
</hashnotes>
I then call a function to add a node in the xml,
The function is :
public static void CreateNoteNodeInXDocument(XDocument argXmlDoc, string argNoteText)
{
string lastId=((Convert.ToInt32(argXmlDoc.Root.Element("lastid").Value)) +1).ToString();
string date = DateTime.Now.ToString("MM/dd/yyyy");
argXmlDoc.Element("hashnotes").Add(new XElement("Note", new XAttribute("ID", lastId), new XAttribute("Date",date),new XElement("Text", argNoteText)));
//argXmlDoc.Root.Note.Add new XElement("Text", argNoteText)
List<string> hashtagList = Utilities.GetHashtagsFromText(argNoteText);
XElement reqNoteElement = (from xml2 in argXmlDoc.Descendants("Note")
where xml2.Attribute("ID").Value == lastId
select xml2).FirstOrDefault();
if (reqNoteElement != null)
{
foreach (string hashTag in hashtagList)
{
reqNoteElement.Add(new XElement("hashtag", hashTag));
}
}
argXmlDoc.Root.Element("lastid").Value = lastId;
}
After this I save the xml.
Next time when I try to load the Xml, it fails with an exception:
System.Xml.XmlException: Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
Here is the code to load the XML:
private static XDocument hashNotesXDocument;
private static Stream hashNotesStream;
StorageFile hashNoteXml = await InstallationFolder.GetFileAsync("hashnotes.xml");
hashNotesStream = await hashNoteXml.OpenStreamForWriteAsync();
hashNotesXDocument = XDocument.Load(hashNotesStream);
and I save it using:
hashNotesXDocument.Save(hashNotesStream);

You don't show all of your code, but it looks like you open the XML file, read the XML from it into an XDocument, edit the XDocument in memory, then write back to the opened stream. Since the stream is still open it will be positioned at the end of the file and thus the new XML will be appended to the file.
Suggest eliminating hashNotesXDocument and hashNotesStream as static variables, and instead open and read the file, modify the XDocument, then open and write the file using the pattern shown here.
I'm working only on desktop code (using an older version of .Net) so I can't test this, but something like the following should work:
static async Task LoadUpdateAndSaveXml(Action<XDocument> editor)
{
XDocument doc;
var xmlFile = await InstallationFolder.GetFileAsync("hashnotes.xml");
using (var reader = new StreamReader(await xmlFile.OpenStreamForReadAsync()))
{
doc = XDocument.Load(reader);
}
if (doc != null)
{
editor(doc);
using (var writer = new StreamWriter(await xmlFile.OpenStreamForWriteAsync()))
{
// Truncate - https://stackoverflow.com/questions/13454584/writing-a-shorter-stream-to-a-storagefile
if (writer.CanSeek && writer.Length > 0)
writer.SetLength(0);
doc.Save(writer);
}
}
}
Also, be sure to create the file before using it.

Merging huge (2GB) XMLs in memory (without any memory exceptions)

I would like a C# code that optimally appends 2 XML strings. Both of them are of same schema. I tried StreamReader / StreamWriter; File.WriteAllText; FileStream
The problem I see is, it uses more than 98% of physical memory thus results in out of memory exception.
Is there a way to optimally merge without getting any memory exceptions? Time is not a concern for me.
If making it available in memory is going to be a problem, then what else could be better? Saving it on File system?
Further Details:
Here is my simple program: to provide better detail
static void Main(string[] args)
{
Program p = new Program();
XmlDocument x1 = new XmlDocument();
XmlDocument x2 = new XmlDocument();
x1.Load("C:\\XMLFiles\\1.xml");
x2.Load("C:\\XMLFiles\\2.xml");
List<string> files = new List<string>();
files.Add("C:\\XMLFiles\\1.xml");
files.Add("C:\\XMLFiles\\2.xml");
p.ConsolidateFiles(files, "C:\\XMLFiles\\Result.xml");
p.MergeFiles("C:\\XMLFiles\\Result.xml", x1.OuterXml, x2.OuterXml, "<Data>", "</Data>");
Console.ReadLine();
}
public void ConsolidateFiles(List<String> files, string outputFile)
{
var output = new StreamWriter(File.Open(outputFile, FileMode.Create));
output.WriteLine("<Data>");
foreach (var file in files)
{
var input = new StreamReader(File.Open(file, FileMode.Open));
string line;
while (!input.EndOfStream)
{
line = input.ReadLine();
if (!line.Contains("<Data>") &&
!line.Contains("</Data>"))
{
output.Write(line);
}
}
}
output.WriteLine("</Data>");
}
public void MergeFiles(string outputPath, string xmlState, string xmlFederal, string prefix, string suffix)
{
File.WriteAllText(outputPath, prefix);
File.AppendAllText(outputPath, xmlState);
File.AppendAllText(outputPath, xmlFederal);
File.AppendAllText(outputPath, suffix);
}
XML Sample:
<Data> </Data> is appended at the beginning & End
XML 1: <Sections> <Section></Section> </Sections>
XML 2: <Sections> <Section></Section> </Sections>
Merged: <Data> <Sections> <Section></Section> </Sections> <Sections> <Section></Section> </Sections> </Data>

Try this, a stream based approach which avoids loading all the xml into memory at once.
static void Main(string[] args)
{
List<string> files = new List<string>();
files.Add("C:\\XMLFiles\\1.xml");
files.Add("C:\\XMLFiles\\2.xml");
ConsolidateFiles(files, "C:\\XMLFiles\\Result.xml");
Console.ReadLine();
}
private static void ConsolidateFiles(List<String> files, string outputFile)
{
using (var output = new StreamWriter(outputFile))
{
output.WriteLine("<Data>");
foreach (var file in files)
{
using (var input = new StreamReader(file, FileMode.Open))
{
while (!input.EndOfStream)
{
string line = input.ReadLine();
if (!line.Contains("<Data>") &&
!line.Contains("</Data>"))
{
output.Write(line);
}
}
}
}
output.WriteLine("</Data>");
}
}
An even better approach is to use XmlReader (http://msdn.microsoft.com/en-us/library/system.xml.xmlreader(v=vs.90).aspx). This will give you a stream reader designed specifically for xml, rather than StreamReader which is for reading general text.

Take a look here
The answer given by Teoman Soygul seems to be what you're looking for.

This is untested, but I would do something along these lines using TextReader and TextWriter. You do not want to read all of the XML text into memory or store it in a string, and you do not want to use XElement/XDocument/etc. anywhere in the middle.
using (var writer = new XmlTextWriter("ResultFile.xml")
{
writer.WriteStartDocument();
writer.WriteStartElement("Data");
using (var reader = new XmlTextReader("XmlFile1.xml")
{
reader.Read();
while (reader.Read())
{
writer.WriteNode(reader, true);
}
}
using (var reader = new XmlTextReader("XmlFile2.xml")
{
reader.Read();
while (reader.Read())
{
writer.WriteNode(reader, true);
}
}
writer.WriteEndElement("Data");
}
Again no guarantees that this exact code will work as-is (or that it even compiles), but I think that is the idea you're looking for. Stream data from File1 first and write it directly out to the result file. Then, stream data from File2 and write it out. At no point should a full XML file be in memory.

If you run on 64bit, try this: go to your project properties -> build tab -> Platform target: change "Any CPU" to "x64".
This solved my problem for loading huge XML files in memory.

you have to go to file system, unless you have lots of RAM
one simple approach:
File.WriteAllText("output.xml", "<Data>");
File.AppendAllText("output.xml", File.ReadAllText("xml1.xml"));
File.AppendAllText("output.xml", File.ReadAllText("xml2.xml"));
File.AppendAllText("output.xml", "</Data>");
another:
var fNames = new[] { "xml1.xml", "xml2.xml" };
string line;
using (var writer = new StreamWriter("output.xml"))
{
writer.WriteLine("<Data>");
foreach (var fName in fNames)
{
using (var file = new System.IO.StreamReader(fName))
{
while ((line = file.ReadLine()) != null)
{
writer.WriteLine(line);
}
}
}
writer.WriteLine("</Data>");
}
All of this with the premise that there is not schema, or tags inside xml1.xml and xml2.xml
If that is the case, just code to omit them.

C# - From XML to Database

I got an XML file which can have several nodes, child nodes, "child child nodes", ... and I'd like to figure out how to get these data in order to store them into my own SQL Server database.
I've read some tutos on internet and also tried some things. At the current moment, I'm able to open and read the file but not to retrieve data. Here's what I'm doing for instance :
class Program
{
static void Main(string[] args)
{
Person p = new Person();
string filePath = #"C:\Users\Desktop\ConsoleApplication1\XmlPersonTest.xml";
XmlDocument xmlDoc = new XmlDocument();
if(File.Exists(filePath))
{
xmlDoc.Load(filePath);
XmlElement elm = xmlDoc.DocumentElement;
XmlNodeList list = elm.ChildNodes;
Console.WriteLine("The root element contains {0} nodes",
list.Count);
}
else
{
Console.WriteLine("The file {0} could not be located",
filePath);
}
Console.Read();
}
}
And here's a small example of what my XML file looks like :
<person>
<name>McMannus</name>
<firstname>Fionn</firstname>
<age>21</age>
<nationality>Belge</nationality>
<car>
<mark>Audi</mark>
<model>A1</model>
<year>2013</year>
<hp>70</hp>
</car>
<car>
<mark>VW</mark>
<model>Golf 7</model>
<year>2014</year>
<hp>99</hp>
</car>
<car>
<mark>BMW</mark>
<model>Série 1</model>
<year>2013</year>
<hp>80</hp>
</car>
</person>
Any advice or tuto to do that guys?

I have made a little method for navigating through xml nodes, using XElement (Linq.Xml):
public string Get(XElement root, string path)
{
if (root== null)
return null;
string[] p = path.Split(new string[] { "/" }, StringSplitOptions.RemoveEmptyEntries);
XElement at = root;
foreach (string n in p)
{
at = at.Element(n);
if (at == null)
return null;
}
return at.Value;
}
Using this, you can get the value of an XElement node via Get(root, "rootNode/nodeA/nodeAChild/etc")

Well, having gone through something similar the other day. You should try the following, initially build a model:
Open your XML Document.
Copy your entire XML Document.
Open Visual Studio.
Click in an area out of your initial class (1b diagram)
Go to Edit in Visual Studio
Paste Special - Paste as XML Classes
1b:
namespace APICore
{
public class APIParser()
{
// Parse logic would go here.
}
// You would click here.
}
When you do that you'll end up with a valid XML Model, which can be accessed through your parser, how you choose to access the XML Web or Local will be up to you. For simplicity I'm going to choose a file:
public class APIParser(string file)
{
// Person should be Xml Root Element Class.
XmlSerializer serialize = new XmlSerializer(typeof(Person));
using(FileStream stream = new FileStream(file, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
using(XmlReader reader XmlReader.Create(stream))
{
Person model = serialize.Deserialize(reader) as Person;
}
}
So now you've successfully got the data to iterate through, so you can work with your data. Here is an example of how you would:
// Iterates through each Person
foreach(var people in model.Person)
{
var information = people.Cars.SelectMany(obj => new { obj.Mark, obj.model, obj.year, obj.hp }).ToList();
}
You would do something like that, then write to the database. This won't fit your example perfectly but should point you in a strong direction.

What is the most efficient way to take XML from API and store it locally?

I am trying to find the fastest way to read XML from the merriam webster dictionary, and store it to a local file for later use. Below, I try to implement a module which does a few things:
Read 2000 words from a local directory
Look up each of the words in the merriam dictionary using the API
Store the definition(s) in a local XML for later use.
Im not sure if making an XML is the best way to store this data, but it seemed like the simplest thing to do. At first, I thought I would do it in different steps. (1. Look up word, store word and definitions into data structure. 2. Dump all data into XML.) However, this poses a problem, because it just too much stuff to store on the runtime(call) stack.
So, in this scenario, I try to speed things up by looking up each word and then saving it to the xml one by one. This, however, is also a slow method. Its taking me up around 10 minutes per 500-600 words.
public void load_module() // stores words/definitions into xml file
{ // 1. Pick up word from text file 2. Look up word's definition 3. Store in Xml
string workdirect = Directory.GetCurrentDirectory();
workdirect = workdirect.Substring(0, workdirect.LastIndexOf("bin"));
workdirect += "words1.txt";
using (StreamReader read = new StreamReader(workdirect)) // 1. Pick up word from text file
{
while (!read.EndOfStream)
{
string line = read.ReadLine();
var definitions = load(line.ToLower()); // 2. Retrieve Words Definitions
store_xml(line, definitions);
wordlist.Add(line);
}
}
}
public List<string> load(string word)
{
XmlDocument doc = new XmlDocument();
List<string> definitions = new List<string>();
XmlNodeList node = null;
doc.Load("http://www.dictionaryapi.com/api/v1/references/collegiate/xml/"+word+"?key=*****************"); // Asteriks to hide the actual API key
if (doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def") == null)
{
return definitions;
}
node = doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def").SelectNodes("dt");
// TO DO : implement definitions if there is no node "def" in first node entry "entry_list"
foreach (XmlNode item in node)
{
definitions.Add(item.InnerXml.ToString().ToLower());
}
return definitions;
}
public void store_xml(string word, List<string> definitions)
{
string local = Directory.GetCurrentDirectory();
string name = "dictionary_word.xml";
local = local.Substring(0, local.LastIndexOf("bin"));
bool exists = File.Exists(local + name);
if (exists)
{
XmlDocument doc = new XmlDocument();
doc.Load(local + name);
XmlElement wordindoc = doc.CreateElement("Word");
wordindoc.SetAttribute("xmlns", word);
XmlElement defs = doc.CreateElement("Definitions");
foreach (var item in definitions)
{
XmlElement def = doc.CreateElement("Definition");
def.InnerText = item;
defs.AppendChild(def);
}
wordindoc.AppendChild(defs);
doc.DocumentElement.AppendChild(wordindoc);
doc.Save(local+name);
}
else
{
using (XmlWriter writer = XmlWriter.Create(#local + name))
{
writer.WriteStartDocument();
writer.WriteStartElement("Dictionary");
writer.WriteStartElement("Word", word);
writer.WriteStartElement("Definitions");
foreach (var def in definitions)
{
writer.WriteElementString("Definition", def);
}
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndDocument();
}
}
}
}

When handling large amounts of data that need to be exported to XML, I would normally keep the data in memory as a collection of custom objects rather than as an XMLDocument:
public class Definition
{
public string Word { get; set; }
public string Definition { get; set; }
}
I would then use XMLWriter to write the collection to the XML file:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = (" ");
settings.Encoding = Encoding.UTF8;
using (XmlWriter writer = XmlWriter.Create("C:\output\output.xml", settings))
{
writer.WriteStartDocument();
// TODO - use XMLWriter functions to write out each word and definition
writer.Flush();
}
If you are still short on memory, you might be able to write out the XML in batches (e.g. every 500 definitions).
I found the Microsoft article on Improving XML Performance a very useful reference, particularly the section on Design Considerations.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

XML Parsing - Read a Simple XML File and Retrieve Values - c#

I usually use XmlDocument for this. The interface is pretty straight forward: var doc = new XmlDocument(); doc.LoadXml(xmlString); You can access nodes similar to a dictionary: var tasks = doc["Tasks"]; and loop over all children of a node.

Are you familiar with the DataSet class? The DataSet can also load XML documents and you may find it easier to iterate. http://msdn.microsoft.com/en-us/library/system.data.dataset.readxml.aspx DataSet dt = new DataSet(); dt.ReadXml(#"c:\test.xml");

Related

Converting very large files from xml to csv

Xml gets corrupted each time I append a node

Merging huge (2GB) XMLs in memory (without any memory exceptions)

C# - From XML to Database

What is the most efficient way to take XML from API and store it locally?

Categories

Resources