How to handle large XML files - C#

public XmlNodeList GetNodes(string _logFilePath, string _strXPathQuery)
{
objXmlDoc = new XmlDocument();
objXmlDoc.Load(_logFilePath);
XmlNodeList objxmlNodeList = objXmlDoc.SelectNodes(_strXPathQuery);
return objxmlNodeList;
}
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<AppXmlLogWritter>
<LogData>
<LogID>999992013021110381000001</LogID>
<LogDateTime>20130211103810</LogDateTime>
<LogType>Message</LogType>
<LogFlag>Flag</LogFlag>
<LogApplication>Application</LogApplication>
<LogModule>Module</LogModule>
<LogLocation>Location</LogLocation>
<LogText>Text</LogText>
<LogStackTrace>Stacktrace</LogStackTrace>
</LogData>
</AppXmlLogWritter>
The XML file is about 1000 MB. When I load it into an XmlDocument object I get an OutOfMemoryException, because XmlDocument keeps every node in memory. I use an XPath query to filter nodes across the whole file and then bind the results to a ListView for display.
The articles I read on handling large XML files told me to use an XPath query,
but that does not solve the problem.
What about a FileStream? Or is there any other way to load large XML files?

You could write a method that uses an XmlReader to read the large XML file in chunks.
Start by designing a model that will hold the data:
public class LogData
{
public string LogID { get; set; }
public string LogDateTime { get; set; }
public string LogType { get; set; }
...
}
and then a method that will parse the XML file:
public static IEnumerable<LogData> GetLogData(XmlReader reader)
{
LogData logData = null;
while (reader.Read())
{
if (reader.IsStartElement("LogData"))
{
logData = new LogData();
}
if (reader.Name == "LogData" && reader.NodeType == XmlNodeType.EndElement)
{
yield return logData;
}
if (reader.Name == "LogID")
{
logData.LogID = reader.ReadElementContentAsString();
}
else if (reader.Name == "LogDateTime")
{
logData.LogDateTime = reader.ReadElementContentAsString();
}
else if (reader.Name == "LogType")
{
logData.LogType = reader.ReadElementContentAsString();
}
...
}
}
Now you could load only the elements you want to display. For example:
using (var reader = XmlReader.Create("someHugeFile.xml"))
{
IEnumerable<LogData> data = GetLogData(reader).Skip(10).Take(5);
// Go ahead and bind the data to your UI
}
You will also want to know how many records the XML file contains in total in order to implement pagination effectively. This can be done with another method:
public static int GetTotalLogEntries(XmlReader reader)
{
var count = 0;
while (reader.Read())
{
if (reader.IsStartElement("LogData"))
{
count++;
}
}
return count;
}
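One caveat with GetLogData above: ReadElementContentAsString() already moves the reader past the closing tag, so the following reader.Read() can step over an element when the XML has no whitespace between tags. A minimal sketch of a variant that avoids this is shown below. It assumes the same element names as the sample file, maps only the three fields shown in the LogData model, and uses a hypothetical class name (LogDataStreaming). Each LogData element is materialized on its own with XNode.ReadFrom, so memory use stays proportional to one record.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public class LogData
{
    public string LogID { get; set; }
    public string LogDateTime { get; set; }
    public string LogType { get; set; }
}

// Hypothetical helper name, not part of the original answer.
public static class LogDataStreaming
{
    public static IEnumerable<LogData> Stream(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "LogData")
            {
                // ReadFrom consumes the whole <LogData> element and leaves the
                // reader on the node that follows it, so we must not call Read() here.
                var element = (XElement)XNode.ReadFrom(reader);
                yield return new LogData
                {
                    LogID = (string)element.Element("LogID"),
                    LogDateTime = (string)element.Element("LogDateTime"),
                    LogType = (string)element.Element("LogType"),
                };
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Because XmlReader is forward-only, counting the records and fetching a page each need their own reader: create one reader for the count, then a second one for something like LogDataStreaming.Stream(reader).Skip(10).Take(5) (with System.Linq imported).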

I suggest that you compress the XML messages before you send them to the stream, then decompress them once they are received at the other end. In WCF you can do it like this: http://www.codeproject.com/Articles/165844/WCF-Client-Server-Application-with-Custom-Authenti#Compression

Related

Different element serialization styles for empty strings [duplicate]

When I serialize the object and there is no value present for Data, the output comes out in the format below.
<Note>
<Type>Acknowledged by PPS</Type>
<Data />
</Note>
But I want the XML in the format below:
<Note>
<Type>Acknowledged by PPS</Type>
<Data></Data>
</Note>
The code I have written for this:
[Serializable]
public class Notes
{
[XmlElement("Type")]
public string typeName { get; set; }
[XmlElement("Data")]
public string dataValue { get; set; }
}
I can't figure out how to get the output below when no value has been assigned to Data.
<Note>
<Type>Acknowledged by PPS</Type>
<Data></Data>
</Note>

You can do this by creating your own XmlTextWriter to pass into the serialization process.
public class MyXmlTextWriter : XmlTextWriter
{
public MyXmlTextWriter(Stream stream) : base(stream, Encoding.UTF8)
{
}
public override void WriteEndElement()
{
base.WriteFullEndElement();
}
}
You can test the result using:
class Program
{
    static void Main(string[] args)
    {
        using (var stream = new MemoryStream())
        {
            var serializer = new XmlSerializer(typeof(Notes));
            var writer = new MyXmlTextWriter(stream);
            serializer.Serialize(writer, new Notes() { typeName = "Acknowledged by PPS", dataValue = "" });
            var result = Encoding.UTF8.GetString(stream.ToArray());
            Console.WriteLine(result);
        }
        Console.ReadKey();
    }
}

If you saved your string somewhere (e.g. a file), you can use this simple Regex.Replace:
var replaced = Regex.Replace(File.ReadAllText(name), @"<([^<>/]+)\/>", (m) => $"<{m.Groups[1].Value.Trim()}></{m.Groups[1].Value.Trim()}>");
File.WriteAllText(name, replaced);

IMO it's not possible to generate your desired XML using serialization. But you can use LINQ to XML to generate the desired output like this (here a is the Notes instance being written):
XDocument xDocument = new XDocument();
XElement rootNode = new XElement(typeof(Notes).Name);
foreach (var property in typeof(Notes).GetProperties())
{
if (property.GetValue(a, null) == null)
{
property.SetValue(a, string.Empty, null);
}
XElement childNode = new XElement(property.Name, property.GetValue(a, null));
rootNode.Add(childNode);
}
xDocument.Add(rootNode);
XmlWriterSettings xws = new XmlWriterSettings() { Indent=true };
using (XmlWriter writer = XmlWriter.Create("D:\\Sample.xml", xws))
{
xDocument.Save(writer);
}
The main catch is that if a value is null, you should set it to an empty string. That forces the closing tag to be generated; when the value is null, no closing tag is created.
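The null-versus-empty behaviour can be seen in isolation with a few lines of LINQ to XML. This is only a sketch: the class and method names (EmptyElementDemo, BuildNote) are made up for illustration, and the element names are taken from the question's desired output rather than from the property names.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original answer.
public static class EmptyElementDemo
{
    public static string BuildNote(string data)
    {
        var note = new XElement("Note",
            new XElement("Type", "Acknowledged by PPS"),
            // An empty string as content gives <Data></Data>;
            // passing null (no content) would give the self-closing <Data />.
            new XElement("Data", data ?? string.Empty));
        return note.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        Console.WriteLine(BuildNote(null));
    }
}
```

The null-coalescing step plays the same role as the SetValue call in the reflection loop: it replaces a missing value with an empty string before the element is built.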

Kludge time - see "Generate System.Xml.XmlDocument.OuterXml() output that's valid in HTML".
Basically, after the XML document has been generated, go through each node and add an empty text node to any element that has no children.
// Call with
addSpaceToEmptyNodes(xmlDoc.DocumentElement);
private void addSpaceToEmptyNodes(XmlNode node)
{
    if (node.HasChildNodes)
    {
        foreach (XmlNode child in node.ChildNodes)
            addSpaceToEmptyNodes(child);
    }
    else if (node.NodeType == XmlNodeType.Element)
    {
        // Only elements can take children; text, comment and declaration nodes are skipped
        node.AppendChild(node.OwnerDocument.CreateTextNode(""));
    }
}
(Yes, I know you shouldn't have to do this, but if you're sending the XML to some other system that you can't easily fix, you have to be pragmatic about things.)

You can add a dummy field to prevent the self-closing element.
[XmlText]
public string datavalue= " ";
Or, if you want the code for your class, it should look like this (note that this writes a single space, so the output is <Data> </Data> rather than a truly empty element):
public class Notes
{
    private string _dataValue;

    [XmlElement("Type")]
    public string typeName { get; set; }

    // The attribute belongs on the public property; XmlSerializer ignores private fields
    [XmlElement("Data")]
    public string dataValue
    {
        get { return string.IsNullOrEmpty(_dataValue) ? " " : _dataValue; }
        set { _dataValue = value; }
    }
}

In principle, armen.shimoon's answer worked for me. But if you want your XML output pretty-printed without having to use XmlWriterSettings and an additional Stream object (as stated in the comments), you can simply set the Formatting in the constructor of your XmlTextWriter class.
public MyXmlTextWriter(string filename) : base(filename, Encoding.UTF8)
{
this.Formatting = Formatting.Indented;
}
(Would have posted this as a comment but am not allowed yet ;-))

Effectively the same as Ryan's solution, which uses the standard XmlWriter (i.e. there's no need for a derived XmlTextWriter class), but written using LINQ to XML (XDocument):
private static void AssignEmptyElements(this XNode node)
{
if (node is XElement e)
{
e.Nodes().ToList().ForEach(AssignEmptyElements);
if (e.IsEmpty)
e.Value = string.Empty;
}
}
usage..
AssignEmptyElements(document.FirstNode);

Read nodes of a xml file in C#

How can I read the following xml file into a List:
Partial XML file (data.log)
<ApplicationLogEventObject>
<EventType>Message</EventType>
<DateStamp>10/13/2016 11:15:00 AM</DateStamp>
<ShortDescription>N/A</ShortDescription>
<LongDescription>Sending 'required orders' email.</LongDescription>
</ApplicationLogEventObject>
<ApplicationLogEventObject>
<EventType>Message</EventType>
<DateStamp>10/13/2016 11:15:10 AM</DateStamp>
<ShortDescription>N/A</ShortDescription>
<LongDescription>Branches Not Placed Orders - 1018</LongDescription>
</ApplicationLogEventObject>
<ApplicationLogEventObject>
<EventType>Message</EventType>
<DateStamp>10/13/2016 11:15:10 AM</DateStamp>
<ShortDescription>N/A</ShortDescription>
<LongDescription>Branches Not Placed Orders - 1019</LongDescription>
</ApplicationLogEventObject>
...
And here is the data access layer (DAL):
public List<FLM.DataTypes.ApplicationLogEventObject> Get()
{
try
{
XmlTextReader xmlTextReader = new XmlTextReader(@"C:\data.log");
List<FLM.DataTypes.ApplicationLogEventObject> recordSet = new List<ApplicationLogEventObject>();
xmlTextReader.Read();
while (xmlTextReader.Read())
{
xmlTextReader.MoveToElement();
FLM.DataTypes.ApplicationLogEventObject record = new ApplicationLogEventObject();
record.EventType = xmlTextReader.GetAttribute("EventType").ToString();
record.DateStamp = Convert.ToDateTime(xmlTextReader.GetAttribute("DateStamp"));
record.ShortDescription = xmlTextReader.GetAttribute("ShortDescription").ToString();
record.LongDescription = xmlTextReader.GetAttribute("LongDescription").ToString();
recordSet.Add(record);
}
return recordSet;
}
catch (Exception ex)
{
throw ex;
}
}
And the Data Types which will hold the child elements from the XML file:
public class ApplicationLogEventObject
{
public string EventType { get; set; }
public DateTime DateStamp { get; set; }
public string ShortDescription { get; set; }
public string LongDescription { get; set; }
}
After I've read the child nodes into a List I would then like to return it and display it in a DataGridView.
Any help regarding this question will be much appreciated.

Your log file is not an XML document. An XML document must have one and only one root element, so what you have is a series of XML documents concatenated together. Such a series can be read by XmlReader by setting XmlReaderSettings.ConformanceLevel = ConformanceLevel.Fragment. Having done so, you can read through the file and deserialize each root element individually using XmlSerializer as follows:
static List<ApplicationLogEventObject> ReadEvents(string fileName)
{
return ReadObjects<ApplicationLogEventObject>(fileName);
}
static List<T> ReadObjects<T>(string fileName)
{
var list = new List<T>();
var serializer = new XmlSerializer(typeof(T));
var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var textReader = new StreamReader(fileName))
using (var xmlTextReader = XmlReader.Create(textReader, settings))
{
while (xmlTextReader.Read())
{ // Skip whitespace
if (xmlTextReader.NodeType == XmlNodeType.Element)
{
using (var subReader = xmlTextReader.ReadSubtree())
{
var logEvent = (T)serializer.Deserialize(subReader);
list.Add(logEvent);
}
}
}
}
return list;
}
Using the following version of ApplicationLogEventObject:
public class ApplicationLogEventObject
{
public string EventType { get; set; }
[XmlElement("DateStamp")]
public string DateStampString {
get
{
// Replace with culturally invariant desired formatting.
return DateStamp.ToString(CultureInfo.InvariantCulture);
}
set
{
DateStamp = Convert.ToDateTime(value, CultureInfo.InvariantCulture);
}
}
[XmlIgnore]
public DateTime DateStamp { get; set; }
public string ShortDescription { get; set; }
public string LongDescription { get; set; }
}
Notes:
The <DateStamp> element values 10/13/2016 11:15:00 AM are not in the correct format for dates and times in XML, which is ISO 8601. Thus I introduced a surrogate string DateStampString property to manually handle the conversion from and to your desired format, and then marked the original DateTime property with XmlIgnore.
Using ReadSubtree() prevents the possibility of reading past the end of each root element when the XML is not indented.
According to the documentation for XmlTextReader:
Starting with the .NET Framework 2.0, we recommend that you use the System.Xml.XmlReader class instead.
Thus I recommend replacing use of that type with XmlReader.
The child nodes of your <ApplicationLogEventObject> are elements not attributes, so XmlReader.GetAttribute() was not an appropriate method to use to read them.
Given that your log files are not formatting their times in ISO 8601, you should at least make sure they are formatted in a culturally invariant format so that log files can be exchanged between computers with different regional settings. Doing your conversions using CultureInfo.InvariantCulture ensures this.
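To make the date handling in these notes concrete, here is a small sketch of converting the log's timestamp format to ISO 8601 with the invariant culture. The class and method names (DateStampConversion, ParseLogDate, ToIso8601) are hypothetical, and the format string is an assumption derived from the sample value 10/13/2016 11:15:00 AM.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original answer.
public static class DateStampConversion
{
    // Assumed layout of the log's timestamps: month/day/year, 12-hour clock, AM/PM.
    private const string LogFormat = "M/d/yyyy h:mm:ss tt";

    public static DateTime ParseLogDate(string text)
    {
        // ParseExact with InvariantCulture ignores the machine's regional settings.
        return DateTime.ParseExact(text, LogFormat, CultureInfo.InvariantCulture);
    }

    public static string ToIso8601(DateTime value)
    {
        // "s" is the sortable pattern, yyyy-MM-ddTHH:mm:ss.
        return value.ToString("s", CultureInfo.InvariantCulture);
    }
}
```

For example, ToIso8601(ParseLogDate("10/13/2016 11:15:00 AM")) yields 2016-10-13T11:15:00, which is the shape XmlSerializer expects for a DateTime element.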

How to fix missing namespace prefix in XML

I have an XML file which I can't modify by myself. It contains the following root element:
<foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" noNamespaceSchemaLocation="some.xsd">
As you can see the prefix xsi: is missing for noNamespaceSchemaLocation. This causes the XmlReader to not find the schema information while validating. If I add the prefix all is good. But as I said I can't modify the XML file (besides testing). I get them from an external source and my tool should automatically validate them.
Is there a possibility to make the XmlReader interpret noNamespaceSchemaLocation without the xsi: prefix? I don't want to add the prefix inside the XML in a preprocessing step or something like that as the sources should exactly remain as they are.

The XML is wrong and you need to fix it. Either get your supplier to improve the quality of what they send, or repair it on arrival.
I don't know why you want to retain the broken source (all quality standards say that's bad practice), but it's certainly possible to keep the broken original as well as the repaired version.
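Keeping the original untouched and validating a repaired copy could look roughly like the sketch below. It assumes the only defect is the unprefixed noNamespaceSchemaLocation attribute on the root element; the class name (SchemaLocationRepair) is made up for illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class SchemaLocationRepair
{
    private static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    public static XDocument Repair(XDocument original)
    {
        // Work on a deep copy so the document that was received stays as it is.
        var repaired = new XDocument(original);
        var root = repaired.Root;
        var unprefixed = root?.Attribute("noNamespaceSchemaLocation");
        if (unprefixed != null)
        {
            // Re-add the attribute in the xsi namespace, then drop the unqualified one.
            root.SetAttributeValue(Xsi + "noNamespaceSchemaLocation", unprefixed.Value);
            unprefixed.Remove();
        }
        return repaired;
    }
}
```

The repaired document can then be saved under a different file name or validated in memory, while the file from the external source remains byte-for-byte unchanged.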

The internals of XmlReader are ugly and undocumented. So this solution is like playing with fire.
What I propose is an XmlTextReader subclass that "adds" the missing namespace. You can feed this FixingXmlTextReader directly to XDocument.Load(), or you can feed it to XmlReader.Create() or an XmlValidatingReader (both accept an XmlReader as a parameter).
public class FixingXmlTextReader : XmlTextReader
{
public override string NamespaceURI
{
get
{
if (NodeType == XmlNodeType.Attribute && base.LocalName == "noNamespaceSchemaLocation")
{
return NameTable.Add("http://www.w3.org/2001/XMLSchema-instance");
}
return base.NamespaceURI;
}
}
public override string Prefix
{
get
{
if (NodeType == XmlNodeType.Attribute && base.NamespaceURI == string.Empty && base.LocalName == "noNamespaceSchemaLocation")
{
return NameTable.Add("xsi");
}
return base.Prefix;
}
}
public override string Name
{
get
{
if (NodeType == XmlNodeType.Attribute && base.NamespaceURI == string.Empty && base.LocalName == "noNamespaceSchemaLocation")
{
return NameTable.Add(Prefix + ":" + LocalName);
}
return base.Name;
}
}
public override string GetAttribute(string localName, string namespaceURI)
{
if (localName == "noNamespaceSchemaLocation" && namespaceURI == "http://www.w3.org/2001/XMLSchema-instance")
{
namespaceURI = string.Empty;
}
return base.GetAttribute(localName, namespaceURI);
}
public override string GetAttribute(string name)
{
if (name == "xsi:noNamespaceSchemaLocation")
{
name = "noNamespaceSchemaLocation";
}
return base.GetAttribute(name);
}
// There are tons of constructors, take the ones you need
public FixingXmlTextReader(Stream stream) : base(stream)
{
}
public FixingXmlTextReader(TextReader input) : base(input)
{
}
public FixingXmlTextReader(string url) : base(url)
{
}
}
Like:
using (var reader = new FixingXmlTextReader("XMLFile1.xml"))
using (var reader2 = XmlReader.Create(reader, new XmlReaderSettings
{
}))
{
// Use the reader2!
}
or
using (var reader = new FixingXmlTextReader("XMLFile1.xml"))
{
var xdoc = new XmlDocument();
xdoc.Load(reader);
}

How to handle end of file when reading xml file

So I am reading a xml file with unknown length and reading each element into a list structure. Right now once I get to the end of the file I continue reading, this causes an exception. Right now I just catch this exception and continue with my life but is there a cleaner way to do this?
try
{
while(!textReader.EOF)
{
// Used to store info from each command as they are read from the xml file
ATAPassThroughCommands command = new ATAPassThroughCommands ();
// the following is just commands being read and their contents being saved
XmlNodeType node = textReader.NodeType;
textReader.ReadStartElement( "Command" );
node = textReader.NodeType;
name = textReader.ReadElementString( "Name" );
node = textReader.NodeType;
CommandListContext.Add(name);
command.m_Name = name;
command.m_CMD = Convert .ToByte(textReader.ReadElementString("CMD" ),16);
command.m_Feature = Convert .ToByte(textReader.ReadElementString("Feature" ),16);
textReader.ReadEndElement(); //</command>
m_ATACommands.Add(command);
}
}
catch ( Exception ex)
{
//</ATAPassThrough> TODO: this is an ugly fix come up with something better later
textReader.ReadEndElement();
//cUtils.DisplayError(ex.Message);
}
xml file:
<ATAPassThrough>
<Command>
<Name>Smart</Name>
<CMD>B0</CMD>
<Feature>D0</Feature>
</Command>
<Command>
<Name>Identify</Name>
<CMD>B1</CMD>
<Feature>D0</Feature>
</Command>
.
.
.
.
</ATAPassThrough>

I would recommend using XDocument for reading XML data. In your case, since you already have a reader for your XML, you can just pass it to the XDocument.Load method. Your entire function above then looks like this:
var doc = XDocument.Load(textReader);
foreach (var commandXml in doc.Descendants("Command"))
{
var command = new ATAPassThroughCommands();
var name = commandXml.Descendants("Name").Single().Value;
// I'm not sure what this does but it looks important...
CommandListContext.Add(name);
command.m_Name = name;
command.m_CMD =
Convert.ToByte(commandXml.Descendants("CMD").Single().Value, 16);
command.m_Feature =
Convert.ToByte(commandXml.Descendants("Feature").Single().Value, 16);
m_ATACommands.Add(command);
}
Significantly easier. Let the framework do the heavy lifting for you.
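If you would rather stay with the forward-only reader, the exception can be avoided by changing the loop condition. In the question's loop, EOF is still false while the reader sits on the closing ATAPassThrough tag, so ReadStartElement("Command") throws there. A sketch that tests for the next Command start tag instead is shown below; the AtaCommand and AtaCommandReader names are hypothetical stand-ins for the question's own types, and it assumes the root contains at least its opening and closing tags.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical stand-in for the question's ATAPassThroughCommands class.
public class AtaCommand
{
    public string Name { get; set; }
    public byte Cmd { get; set; }
    public byte Feature { get; set; }
}

public static class AtaCommandReader
{
    public static List<AtaCommand> ReadCommands(TextReader input)
    {
        var commands = new List<AtaCommand>();
        using (var reader = XmlReader.Create(input))
        {
            reader.ReadStartElement("ATAPassThrough");
            // IsStartElement skips whitespace and returns false on </ATAPassThrough>,
            // so the loop ends without relying on an exception.
            while (reader.IsStartElement("Command"))
            {
                reader.ReadStartElement("Command");
                var command = new AtaCommand
                {
                    Name = reader.ReadElementString("Name"),
                    Cmd = Convert.ToByte(reader.ReadElementString("CMD"), 16),
                    Feature = Convert.ToByte(reader.ReadElementString("Feature"), 16),
                };
                reader.ReadEndElement(); // </Command>
                commands.Add(command);
            }
            reader.ReadEndElement(); // </ATAPassThrough>
        }
        return commands;
    }
}
```

This keeps the same element-by-element reading style as the question, so it also works for files too large to load with XDocument.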

Probably the easiest way, if you have normal and consistent XML, is to use the XmlSerializer.
First create objects that match your XML:
[Serializable()]
public class Command
{
    [System.Xml.Serialization.XmlElement("Name")]
    public string Name { get; set; }
    [System.Xml.Serialization.XmlElement("CMD")]
    public string Cmd { get; set; }
    [System.Xml.Serialization.XmlElement("Feature")]
    public string Feature { get; set; }
}
[Serializable()]
[System.Xml.Serialization.XmlRoot("ATAPassThrough")] // must match the root element's casing
public class CommandCollection
{
    // The <Command> elements sit directly under the root (no wrapper element),
    // so they are mapped with XmlElement rather than XmlArrayItem.
    [System.Xml.Serialization.XmlElement("Command")]
    public Command[] Command { get; set; }
}
Then a method to return the CommandCollection:
public class CommandSerializer
{
    public CommandCollection Deserialize(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(CommandCollection));
        // Do not call ReadToEnd() first; that would leave nothing for Deserialize to read.
        using (StreamReader reader = new StreamReader(path))
        {
            return (CommandCollection)serializer.Deserialize(reader);
        }
    }
}
Not sure if this is exactly correct; I don't have the means to test it, but it should be really close.

Using XMLReader to read large XML documents to parse the information into a class

I have been using XDocument combined with LINQ to XML to load in xml files and populate my class.
But now I am tasked with making sure my program can handle XML documents of all sizes, which means I need to use XmlReader, and at the moment I can't get my head around using XmlReader to populate my class.
Currently I have the class below to populate:
public class DataRecord
{
private List<Fields> field = new List<Fields>();
public string ID { get; set; }
public string TotalLength { get; set; }
public List<Fields> MyProperty
{
get { return field; }
set { field = value; }
}
}
internal class Fields
{
public string name { get; set; }
public string startByte { get; set; }
public string length { get; set; }
}
}
I have been trying to use switch statements to get the XmlReader to provide the data I need to populate the class. For example:
using (XmlReader reader = XmlReader.Create(filename))
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
switch (reader.Name)
{
case "DataRecord":
var dataaa = new dataclass.DataRecord();
break;
}
break;
}
}
}
But as I said, this is only an example. I have searched for ages to try to find an answer, but I am getting confused. Hopefully someone can help with my problem.

You can use XmlReader to move through the document, but then load each element using XElement.
Here's a short example:
using System;
using System.Xml;
using System.Xml.Linq;
class Test
{
static void Main()
{
using (var reader = XmlReader.Create("test.xml"))
{
while (reader.ReadToFollowing("foo"))
{
XElement element = XElement.Load(reader.ReadSubtree());
Console.WriteLine("Title: {0}", element.Attribute("title").Value);
}
}
}
}
With sample XML:
<data>
<foo title="x" /><foo title="y">asd</foo> <foo title="z" />
</data>
(Slightly inconsistent just to show that it can handle elements with content, elements with no space between them, and elements with space between them.)
Then obviously in the loop you'd do whatever you need to with the XElement - if you've already got a way of creating an instance of your class from an XElement, you can just call that, use the object, and you're away.
