Read entire elements from an XML network stream - c#

I am writing a network server in C# .NET 4.0. There is a network TCP/IP connection over which I can receive complete XML elements. They arrive regularly and I need to process them immediately. Each XML element is a complete XML document in itself, so it has an opening element, several sub-nodes and a closing element. There is no single root element for the entire stream. So when I open the connection, what I get is like this:
<status>
<x>123</x>
<y>456</y>
</status>
Then some time later it continues:
<status>
<x>234</x>
<y>567</y>
</status>
And so on. I need a way to read the complete XML string until a status element is complete. I don't want to do that with plain text reading methods because I don't know in what formatting the data arrives. I can in no way wait until the entire stream is finished, as is often described elsewhere. I have tried using the XmlReader class but its documentation is weird, the methods don't work out, the first element is lost and after sending the second element, an XmlException occurs because there are two root elements.

Try this:
var settings = new XmlReaderSettings
{
ConformanceLevel = ConformanceLevel.Fragment
};
using (var reader = XmlReader.Create(stream, settings))
{
while (!reader.EOF)
{
reader.MoveToContent();
var doc = XDocument.Load(reader.ReadSubtree());
Console.WriteLine("X={0}, Y={1}",
(int)doc.Root.Element("x"),
(int)doc.Root.Element("y"));
reader.ReadEndElement();
}
}

If you change the "conformance level" to "fragment", it might work with the XmlReader.
This is a (slightly modified) example from MSDN:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
XmlReader reader = XmlReader.Create(streamOfXmlFragments, settings);

You could use XElement.Load which is meant more for streaming of Xml Element fragments that is new in .net 3.5 and also supports reading directly from a stream.
Have a look at System.Xml.Linq
I think that you may well still have to add some control logic so as to partition the messages you are receiving, but you may as well give it a go.

I'm not sure there's anything built-in that does that.
I'd open a string builder, fill it until I see a </status> tag, and then parse it using the ordinary XmlDocument.

Not substantially different from dtb's solution, but linqier
static IEnumerable<XDocument> GetDocs(Stream xmlStream)
{
var xmlSettings = new XmlReaderSettings() { ConformanceLevel = ConformanceLevel.Fragment };
using (var xmlReader = XmlReader.Create(xmlStream, xmlSettings))
{
var xmlPathNav = new XPathDocument(xmlReader).CreateNavigator();
foreach (var selectee in xmlPathNav.Select("/*").OfType<XPathNavigator>())
yield return XDocument.Load(selectee.ReadSubtree());
}
}
I ran into a similar problem in PowerShell, but the asker's question was in C#, so I've attempted to translate it (and verified that it works). Here is where I found the clue that got me over the last little bumps (". . .The way the XPathDocument does its magic is by creating a “transparent” root node, and holding the fragments from it. I say it’s transparent because your XPath queries can use the root node axis and still get properly resolved to the fragments. . .")
The fragments of XML I'm working with happen to be smallish. If you had bigger chunks, you'd probably want to look into XStreamingElement - it can add a lot of complexity but also greatly decrease memory usage when dealing with large volumes of XML.

Related

Deserialize on the fly, or LINQ to XML

Working with C# Visual Studio 2008, MVC1.
I'm creating an xml file by fetching one from a WebService and adding some nodes to it. Now I wanted to deserialize it to a class which is the model used to strongtyped the View.
First of all, I'm facing problems to achieve that without storing the xml in the filesystem cause I don't know how this serialize and deserialize work. I guess there's a way and it's a matter of time.
But, searching for the previous in the web I came accross LINQ to XML and now I doubt whether is better to use it.
The xml would be formed by some clients details, and basically I will use all of them.
Any hint?
Thanks!!
You can save a XElement to and from a MemoryStream (no need to save it to a file stream)
MemoryStream ms = new MemoryStream();
XmlWriter xw = XmlWriter.Create(ms);
document.Save(xw);
xw.Flush();
Then if you reset the position back to 0 you can deserialize it using the DataContractSerializer.
ms.Position = 0;
DataContractSerializer serializer = new DataContractSerializer(typeof(Model));
Model model = (model) serializer.ReadObject(ms);
There are other options for how serialization works, so if this is not what you have, let me know what you are using and I will help.
try this:
XmlSerializer xmls = new XmlSerializer(typeof(XElement));
FileStream FStream;
try
{
FStream = new FileStream(doctorsPath, FileMode.Open);
_Doctors = (XElement)xmls.Deserialize(FStream); FStream.Close();
FStream = new FileStream(patientsPath, FileMode.Open);
_Patients = (XElement)xmls.Deserialize(FStream)
FStream.Close();
FStream = new FileStream(treatmentsPath, FileMode.Open);
_Treatments = (XElement)xmls.Deserialize(FStream);
FStream.Close();
}
catch
{ }
This will load all of the XML files into our XElement variables. The try – catch block is a form of exception handling that ensures that if one of the functions in the try block throws an exception, the program will jump to the catch section where nothing will happen. When working with files, especially reading files, it is a good idea to work with try – catch.
LINQ to XML is an excellent feature. You can always rely on that. You don't need to write or read or data from file. You can specify either string or stream to the XDocument
There are enough ways to load an XML element to the XDocument object. See the appropriate Load functions. Once you load the content, you can easily add/remove the elements and later you can save to disk if you want.

Splitting a large XML file in two using C# console app

I need to split am XML file (~400 MB) in two, so that a legacy app can process the file. At the moment its throwing an exception when the file is over around 300 MB.
As I can't change the app which is doing the processing, I thought I could write a console app to split the file in two first. What's the best way of doing this? It needs to be automated so I can't use a text editor, and I'm using C#.
I suppose the considerations are:
writing a header to the new files after the split
finding a good place to split (not in middle of 'object')
closing off tags and file correctly in first file, opening tags correctly in second file
Any suggestions?
The "best" way is likely to be based on XmlReader and XmlWriter. Using these "streaming" APIs avoids needing to load the whole XML object model in memory (and with DOM –XmlDocument– that can need considerably more memory than the text data).
Using these APIs is harder than just loading the document: your implementation needs to track the context (eg. current node and ancestor list), but in this case that wouldn't be complex (just enough to open the elements to the current state when opening each output document).
You might want to consider making a full copy of the file and then deleting elements from each. You will have to decide at what level the deletions could occur.
It should then be fairly straightforward, from a count of how many elements have been deleted from FileA, to identify how many (and from what starting point) should be deleted from FileB.
Is that feasible for your circumstance?
I have put together the following to describe my thinking. It is not tested, but I would value the comments of the group. Downvote me if you want but I would prefer constructive criticism.
using System.Xml;
using System.Xml.Schema;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
SplitXML(args[0], args[1]);
}
private static void SplitXML(string fileNameA, string fileNameB)
{
int deleteCount;
XmlNodeList childNodes;
XmlReader reader;
XmlTextWriter writer;
XmlDocument doc;
// ------------- Process FileA
reader = XmlReader.Create(fileNameA);
doc = new XmlDocument();
doc.Load(reader);
childNodes = doc.DocumentElement.ChildNodes;
deleteCount = childNodes.Count / 2;
for (int i = 0; i < deleteCount; i++)
{
doc.DocumentElement.RemoveChild(childNodes.Item(0));
}
writer = new XmlTextWriter("FileC", null);
doc.Save(writer);
// ------------- Process FileB
reader = XmlReader.Create(fileNameB);
doc = new XmlDocument();
doc.Load(reader);
childNodes = doc.DocumentElement.ChildNodes;
for (int i = deleteCount + 1; i < childNodes.Count; i++)
{
doc.DocumentElement.RemoveChild(childNodes.Item(deleteCount +1));
}
writer = new XmlTextWriter("FileD", null);
doc.Save(writer);
}
}
}
If it's pure C#, running it as a 64-bit process might solve the problem for no effort at all (assuming you have a 64-bit Windows at hand).

Best approach to write huge xml data to file?

I'm currently exporting a database table with huge data (100000+ records) into an xml file using XmlTextWriter class and I'm writing directly to a file on the physical drive.
_XmlTextWriterObject = new XmlTextWriter(_xmlFilePath, null);
While my code runs ok, my question is that is it the best approach? Should I instead write the whole xml in memory stream first and then write the xml document in physical file from memory stream? And what are the effects on memory/ performance in both cases?
EDIT
Sorry that I could not actually convey what I meant to say.Thanks Ash for pointing out.
I will indeed be using XmlTextWriter but I meant to say whether to pass a physical file path string to the XmlTextWriter constructor (or, as John suggested, to the XmlTextWriter.Create() method) or use stream based api. My current code looks like the following:
XmlWriter objXmlWriter = XmlTextWriter.Create(new BufferedStream(new FileStream(#"C:\test.xml", FileMode.Create, System.Security.AccessControl.FileSystemRights.Write, FileShare.None, 1024, FileOptions.SequentialScan)), new XmlWriterSettings { Encoding = Encoding.Unicode, Indent = true, CloseOutput = true });
using (objXmlWriter)
{
//writing xml contents here
}
The rule of thumb is to use XmlWriter when the document need only be written and not worked with in memory, and to use XmlDocument (or the DOM) where you do need to work with it in memory.
Remember though, XmlWriter implements IDisposable, so do the following:
using (XmlWriter _XmlTextWriterObject = XmlWriter.Create(_xmlFilePath))
{
// Code to do the write here
}
While my code runs ok, my question is that is it the best approach?
As mentioned and your update, XmlWriter.Create is fine.
Or should I write the whole xml in memory stream first and then write the xml document in physical file from memory stream?
Do you have the memory to write the entire file in-memory? If you do then that approach will be faster, otherwise stream it using a FileStream which will take care of it for you.
And what are the effects on memory/ performance in both cases?
Reading the entire XML file in will use more memory, and spike the processor to start with. Streaming to disk will use more processor. But you'll need to be using a huge file for this to be noticeable given even desktop hardware now. If you're worried about the size increasing even more in the future, stick to the FileStream technique to future proof it.
As John Saunders mentioned it is better to use XmlWriter.Create(). That is the recommendation from the MSDN. The XmlWriter.Create() method can also take an XmlWriterSettings object. There you can customize your behavior quite a bit. If you don't need validation and character checking then you can turn it off and get a bit more speed. For example
XmlWriterSettings settings = new XmlWriterSettings();
settings.CheckCharacters = false;
using (XmlWriter writer = XmlWriter.Create("path", settings))
{
//writing code
writer.Flush();
}
Otherwise I think everything is okay.

xml and files on disc interaction

Its been a while since i've needed to do this so i was looking at the old school methods of writing XMLDocument from code down to a File.
In my application i am writing alot to an XMLdocument with new elements and values and periodically saving it down to disc and also reading from the file and depending on the data i am doing things.
I am using methods like File.Exists(...) _xmldoc.LoadFile(..) etc...
Im wondering probably now a days there are better methods for this with regards
Parsing the XML
Checking its format for saving down
rather then the data being saved down being treated as text but as XML
maybe what i am doing is fine but its been a while and wondered if there are other methods :)
thanks
Well there's LINQ to XML which is a really nice XML API introduced in .NET 3.5. I don't think the existing XMLDocument API has changed much, beyond some nicer ways of creating XmlReaders and XmlWriters (XmlReader.Create/XmlWriter.Create).
I'm not sure what you really mean by your second and third bullets though. What exactly are you doing in code which feels awkward?
Have you looked at the Save method of your XmlDocument? It will save whatever is in your XmlDocument as a valid formatted file.
If your program is able to use the XmlDocument class, the XmlDocument class will be able to save your file. You won't need to worry about validating before saving, and you can give it whatever file extension you want. As to your third point... an XML file is really just a text file. It won't matter how the OS sees it.
I was a big fan of XmlDocument due to its facility to use but recently I got a huge memory problem with that class so I started to use XmlReader and XmlWriter.
XmlReader can be a little bit tricky to use if your Xml file is complex because you read the Xml file sequentially. In that case, the method ReadSubTree of XmlReader can be very useful because this method returns only the xml tree under the current node so you send the new xmlreader to a function to parse the subnode content and once it is done, you continue to the next node.
XmlReader Example:
string xmlcontent = "<BigXml/>";
using(StringReader strContent = new StringReader(xmlcontent))
{
using (XmlReader reader = XmlReader.Create(strContent))
{
while (reader.Read())
{
if (reader.Name == "SomeName" && reader.NodeType == XmlNodeType.Element)
{
//Send the XmlReader created by ReadSubTree to a function to read it.
ReadSubContentOfSomeName(reader.ReadSubtree());
}
}
}
}
XmlWriter Example:
StringBuilder builder = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(builder))
{
writer.WriteStartDocument();
writer.WriteStartElement("BigXml");
writer.WriteAttributeString("someAttribute", "42");
writer.WriteString("Some Inner Text");
//Write nodes under BigXml
writer.WriteStartElement("SomeName");
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndDocument();
}

Fastest way to add new node to end of an xml?

I have a large xml file (approx. 10 MB) in following simple structure:
<Errors>
<Error>.......</Error>
<Error>.......</Error>
<Error>.......</Error>
<Error>.......</Error>
<Error>.......</Error>
</Errors>
My need is to write add a new node <Error> at the end before the </Errors> tag. Whats is the fastest way to achieve this in .net?
You need to use the XML inclusion technique.
Your error.xml (doesn't change, just a stub. Used by XML parsers to read):
<?xml version="1.0"?>
<!DOCTYPE logfile [
<!ENTITY logrows
SYSTEM "errorrows.txt">
]>
<Errors>
&logrows;
</Errors>
Your errorrows.txt file (changes, the xml parser doesn't understand it):
<Error>....</Error>
<Error>....</Error>
<Error>....</Error>
Then, to add an entry to errorrows.txt:
using (StreamWriter sw = File.AppendText("logerrors.txt"))
{
XmlTextWriter xtw = new XmlTextWriter(sw);
xtw.WriteStartElement("Error");
// ... write error messge here
xtw.Close();
}
Or you can even use .NET 3.5 XElement, and append the text to the StreamWriter:
using (StreamWriter sw = File.AppendText("logerrors.txt"))
{
XElement element = new XElement("Error");
// ... write error messge here
sw.WriteLine(element.ToString());
}
See also Microsoft's article Efficient Techniques for Modifying Large XML Files
First, I would disqualify System.Xml.XmlDocument because it is a DOM which requires parsing and building the entire tree in memory before it can be appended to. This means your 10 MB of text will be more than 10 MB in memory. This means it is "memory intensive" and "time consuming".
Second, I would disqualify System.Xml.XmlReader because it requires parsing the entire file first before you can get to the point of when you can append to it. You would have to copy the XmlReader into an XmlWriter since you can't modify it. This requires duplicating your XML in memory first before you can append to it.
The faster solution to XmlDocument and XmlReader would be string manipulation (which has its own memory issues):
string xml = #"<Errors><error />...<error /></Errors>";
int idx = xml.LastIndexOf("</Errors>");
xml = xml.Substring(0, idx) + "<error>new error</error></Errors>";
Chop off the end tag, add in the new error, and add the end tag back.
I suppose you could go crazy with this and truncate your file by 9 characters and append to it. Wouldn't have to read in the file and would let the OS optimize page loading (only would have to load in the last block or something).
System.IO.FileStream fs = System.IO.File.Open("log.xml", System.IO.FileMode.Open, System.IO.FileAccess.ReadWrite);
fs.Seek(-("</Errors>".Length), System.IO.SeekOrigin.End);
fs.Write("<error>new error</error></Errors>");
fs.Close();
That will hit a problem if your file is empty or contains only "<Errors></Errors>", both of which can easily be handled by checking the length.
The fastest way would probably be a direct file access.
using (StreamWriter file = File.AppendText("my.log"))
{
file.BaseStream.Seek(-"</Errors>".Length, SeekOrigin.End);
file.Write(" <Error>New error message.</Error></Errors>");
}
But you lose all the nice XML features and may easily corrupt the file.
I would use XmlDocument or XDocument to Load your file and then manipulate it accordingly.
I would then look at the possibility of caching this XmlDocument in memory so that you can access the file quickly.
What do you need the speed for? Do you have a performance bottleneck already or are you expecting one?
How is your XML-File represented in code? Do you use the System.XML-classes? In this case you could use XMLDocument.AppendChild.
Try this out:
var doc = new XmlDocument();
doc.LoadXml("<Errors><error>This is my first error</error></Errors>");
XmlNode root = doc.DocumentElement;
//Create a new node.
XmlElement elem = doc.CreateElement("error");
elem.InnerText = "This is my error";
//Add the node to the document.
if (root != null) root.AppendChild(elem);
doc.Save(Console.Out);
Console.ReadLine();
Here's how to do it in C, .NET should be similar.
The game is to simple jump to the end of the file, skip back over the tag, append the new error line, and write a new tag.
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(int argc, char** argv) {
FILE *f;
// Open the file
f = fopen("log.xml", "r+");
// Small buffer to determine length of \n (1 on Unix, 2 on PC)
// You could always simply hard code this if you don't plan on
// porting to Unix.
char nlbuf[10];
sprintf(nlbuf, "\n");
// How long is our end tag?
long offset = strlen("</Errors>");
// Add in an \n char.
offset += strlen(nlbuf);
// Seek to the END OF FILE, and then GO BACK the end tag and newline
// so we use a NEGATIVE offset.
fseek(f, offset * -1, SEEK_END);
// Print out your new error line
fprintf(f, "<Error>New error line</Error>\n");
// Print out new ending tag.
fprintf(f, "</Errors>\n");
// Close and you're done
fclose(f);
}
The quickest method is likely to be reading in the file using an XmlReader, and simply replicating each read node to a new stream using XmlWriter When you get to the point at which you encounter the closing </Errors> tag, then you just need to output your additional <Error> element before coninuing the 'read and duplicate' cycle. This way is inevitably going to be harder than than reading the entire document into the DOM (XmlDocument class), but for large XML files, much quicker. Admittedly, using StreamReader/StreamWriter would be somewhat faster still, but pretty horrible to work with in code.
Using string-based techniques (like seeking to the end of the file and then moving backwards the length of the closing tag) is vulnerable to unexpected but perfectly legal variations in document structure.
The document could end with any amount of whitespace, to pick the likeliest problem you'll encounter. It could also end with any number of comments or processing instructions. And what happens if the top-level element isn't named Error?
And here's a situation that using string manipulation fails utterly to detect:
<Error xmlns="not_your_namespace">
...
</Error>
If you use an XmlReader to process the XML, while it may not be as fast as seeking to EOF, it will also allow you to handle all of these possible exception conditions.
I attempted to use code other answers had suggested but ran into an issue where sometimes calling .length on my strings was not the same as the number of bytes for the string so I was inconsistently losing characters. I modified it to get the byte count instead.
var endTag = "</Errors>";
var nodeText = GetNodeText();
using (FileStream file = File.Open("my.log", FileMode.Open, FileAccess.ReadWrite))
{
file.BaseStream.Seek(-(Encoding.UTF8.GetByteCount(endTag)), SeekOrigin.End);
fileStream.Write(Encoding.UTF8.GetBytes(nodeText), 0, Encoding.UTF8.GetByteCount(nodeText));
fileStream.Write(Encoding.UTF8.GetBytes(endTag), 0, Encoding.UTF8.GetByteCount(endTag));
}

Categories