Best approach to write huge xml data to file?

Best approach to write huge xml data to file? - c#

I'm currently exporting a database table with huge data (100000+ records) into an xml file using XmlTextWriter class and I'm writing directly to a file on the physical drive.
_XmlTextWriterObject = new XmlTextWriter(_xmlFilePath, null);
While my code runs ok, my question is that is it the best approach? Should I instead write the whole xml in memory stream first and then write the xml document in physical file from memory stream? And what are the effects on memory/ performance in both cases?
EDIT
Sorry that I could not actually convey what I meant to say.Thanks Ash for pointing out.
I will indeed be using XmlTextWriter but I meant to say whether to pass a physical file path string to the XmlTextWriter constructor (or, as John suggested, to the XmlTextWriter.Create() method) or use stream based api. My current code looks like the following:
XmlWriter objXmlWriter = XmlTextWriter.Create(new BufferedStream(new FileStream(#"C:\test.xml", FileMode.Create, System.Security.AccessControl.FileSystemRights.Write, FileShare.None, 1024, FileOptions.SequentialScan)), new XmlWriterSettings { Encoding = Encoding.Unicode, Indent = true, CloseOutput = true });
using (objXmlWriter)
{
//writing xml contents here
}

The rule of thumb is to use XmlWriter when the document need only be written and not worked with in memory, and to use XmlDocument (or the DOM) where you do need to work with it in memory.
Remember though, XmlWriter implements IDisposable, so do the following:
using (XmlWriter _XmlTextWriterObject = XmlWriter.Create(_xmlFilePath))
{
// Code to do the write here
}

While my code runs ok, my question is that is it the best approach?
As mentioned and your update, XmlWriter.Create is fine.
Or should I write the whole xml in memory stream first and then write the xml document in physical file from memory stream?
Do you have the memory to write the entire file in-memory? If you do then that approach will be faster, otherwise stream it using a FileStream which will take care of it for you.
And what are the effects on memory/ performance in both cases?
Reading the entire XML file in will use more memory, and spike the processor to start with. Streaming to disk will use more processor. But you'll need to be using a huge file for this to be noticeable given even desktop hardware now. If you're worried about the size increasing even more in the future, stick to the FileStream technique to future proof it.

As John Saunders mentioned it is better to use XmlWriter.Create(). That is the recommendation from the MSDN. The XmlWriter.Create() method can also take an XmlWriterSettings object. There you can customize your behavior quite a bit. If you don't need validation and character checking then you can turn it off and get a bit more speed. For example
XmlWriterSettings settings = new XmlWriterSettings();
settings.CheckCharacters = false;
using (XmlWriter writer = XmlWriter.Create("path", settings))
{
//writing code
writer.Flush();
}
Otherwise I think everything is okay.

Related

Deserialize on the fly, or LINQ to XML

Working with C# Visual Studio 2008, MVC1.
I'm creating an xml file by fetching one from a WebService and adding some nodes to it. Now I wanted to deserialize it to a class which is the model used to strongtyped the View.
First of all, I'm facing problems to achieve that without storing the xml in the filesystem cause I don't know how this serialize and deserialize work. I guess there's a way and it's a matter of time.
But, searching for the previous in the web I came accross LINQ to XML and now I doubt whether is better to use it.
The xml would be formed by some clients details, and basically I will use all of them.
Any hint?
Thanks!!

You can save a XElement to and from a MemoryStream (no need to save it to a file stream)
MemoryStream ms = new MemoryStream();
XmlWriter xw = XmlWriter.Create(ms);
document.Save(xw);
xw.Flush();
Then if you reset the position back to 0 you can deserialize it using the DataContractSerializer.
ms.Position = 0;
DataContractSerializer serializer = new DataContractSerializer(typeof(Model));
Model model = (model) serializer.ReadObject(ms);
There are other options for how serialization works, so if this is not what you have, let me know what you are using and I will help.

try this:
XmlSerializer xmls = new XmlSerializer(typeof(XElement));
FileStream FStream;
try
{
FStream = new FileStream(doctorsPath, FileMode.Open);
_Doctors = (XElement)xmls.Deserialize(FStream); FStream.Close();
FStream = new FileStream(patientsPath, FileMode.Open);
_Patients = (XElement)xmls.Deserialize(FStream)
FStream.Close();
FStream = new FileStream(treatmentsPath, FileMode.Open);
_Treatments = (XElement)xmls.Deserialize(FStream);
FStream.Close();
}
catch
{ }
This will load all of the XML files into our XElement variables. The try – catch block is a form of exception handling that ensures that if one of the functions in the try block throws an exception, the program will jump to the catch section where nothing will happen. When working with files, especially reading files, it is a good idea to work with try – catch.

LINQ to XML is an excellent feature. You can always rely on that. You don't need to write or read or data from file. You can specify either string or stream to the XDocument
There are enough ways to load an XML element to the XDocument object. See the appropriate Load functions. Once you load the content, you can easily add/remove the elements and later you can save to disk if you want.

Stream StringBuilder to file

I need to create a large text document. I currently use StringBuilder to make the document and then call File.WriteallText(filename,sb.ToString). Unfortunately, this is now starting to throw out of memory exceptions.
Is there a better way to stream a StringBuilder to file or is there some other technique I should be using?

Instead of using StringBuilder, try using TextWriter (which has a broadly similar API, but which can write to a number of underlying destinations, including files) - i.e.
using(TextWriter writer = File.CreateText(path))
{
// loop etc
writer.Write(...);
}
More generally, it is worth separating the code that knows about files from the code that knows about how to write the data, i.e.
using(var writer = File.CreateText(path))
{
Serialize(writer);
}
...
void Serialize(TextWriter writer)
{
...
}
this makes it easier to write to different targets. For example, you can now do in-memory too:
var sw = new StringWriter();
Serialize(sw);
string text = sw.ToString();
The point being: your Serialize code didn't need to change to accomodate a different target. This could also be writing directly to a network, or writing through a compression/encryption stream. Very versatile.

Just use a StreamWriter that writes to a FileStream:
using (StreamWriter writer = new StreamWriter("filename.txt")) {
...
}
This will of course mean that you can't change the text that is already written, like you can do in a StringBuilder, but I assume that you are not using that anyway.

Why not streaming directly into the stream?
You could use the TextWriter.

You can use StreamWriter and write to the file directly.

Writing to XML resource file through serialization

So I'm writing a C# app that has a couple of XML files added as resources. I read from these XML files to populate objects in the code using XML serialization and it works perfectly. So I can access the files like so (I've left some code out, just have the important bits):
using TestApp.Properties;
XmlSerializer serializer = new XmlSerializer(typeof(MapTiles));
StringReader sr = new StringReader(Resources.Maps);
mapTiles = (MapTiles)serializer.Deserialize(sr);
Now however, I'd like to do the opposite. I'd like to take some data and write it to these XML resource files. However, I seem to be running into trouble with this aspect and was hoping someone could see something I'm messing up or let me know what I need to do? Here's what I'm trying to do:
XmlSerializer serializer = new XmlSerializer(typeof(MapTiles));
TextWriter writer = new StreamWriter(Resources.Maps);
serializer.Serialize(writer, tempGroup);
writer.Close();
When I run this code though, I get an error on the 2nd line that says ArgumentException was unhandled - Empty path name is not legal.
So if anyone has any thoughts I would greatly appreciate some tips. Thanks so much.

The exception is being raised because you are passing Resources.Maps into StreamWriter. StreamWriter is handling this as a string and assuming it is file path for the stream. But the file path is empty so it is throwing an exception.
To fix out the StreamWriter line you could specify a local temporary file instead of Resources.Maps or use StringWriter with the default constructor e.g. new StringWriter().
If you are writing .resx files then ResXResourceWriter is the class you need. It will also handle your stream writing too. See http://msdn.microsoft.com/en-us/library/system.resources.resxresourcewriter.aspx and http://msdn.microsoft.com/en-us/library/ekyft91f.aspx. The second page has examples on how to use the class but breifly you would call something like this:
XmlSerializer serializer = new XmlSerializer(typeof(MapTiles));
using (StringWriter stringWriter = new StringWriter())
{
serializer.Serialize(stringWriter, tempGroup);
using (ResXResourceWriter resourceWriter = new ResXResourceWriter("~/App_GlobalResources/some_file.resx"))
{
resourceWriter.AddResource("Maps", stringWriter.ToString());
}
}
If you want to write out an assembly that has a dynamically created resource in it the you can emit a new assembly. In that case have a look here http://msdn.microsoft.com/en-us/library/8ye65dh0.aspx.

The StreamWriter constructor is throwing the exception because Resources.Maps is an empty string.
Check out the documentation on MSDN:
http://msdn.microsoft.com/en-us/library/fysy0a4b.aspx

Hmm, after trying some other things and doing even more research it appears that the problem is that you cannot write to a resource that is part of the application. Something to do with the resource being written into the assembly and therefore it's not writable. So I'll have to modify my approach and write the output to a different file.
Unless of course anyone does know differently. This is just what I've discovered so far. Thanks to those who had some ideas though, much appreciated.

xml and files on disc interaction

Its been a while since i've needed to do this so i was looking at the old school methods of writing XMLDocument from code down to a File.
In my application i am writing alot to an XMLdocument with new elements and values and periodically saving it down to disc and also reading from the file and depending on the data i am doing things.
I am using methods like File.Exists(...) _xmldoc.LoadFile(..) etc...
Im wondering probably now a days there are better methods for this with regards
Parsing the XML
Checking its format for saving down
rather then the data being saved down being treated as text but as XML
maybe what i am doing is fine but its been a while and wondered if there are other methods :)
thanks

Well there's LINQ to XML which is a really nice XML API introduced in .NET 3.5. I don't think the existing XMLDocument API has changed much, beyond some nicer ways of creating XmlReaders and XmlWriters (XmlReader.Create/XmlWriter.Create).
I'm not sure what you really mean by your second and third bullets though. What exactly are you doing in code which feels awkward?

Have you looked at the Save method of your XmlDocument? It will save whatever is in your XmlDocument as a valid formatted file.
If your program is able to use the XmlDocument class, the XmlDocument class will be able to save your file. You won't need to worry about validating before saving, and you can give it whatever file extension you want. As to your third point... an XML file is really just a text file. It won't matter how the OS sees it.

I was a big fan of XmlDocument due to its facility to use but recently I got a huge memory problem with that class so I started to use XmlReader and XmlWriter.
XmlReader can be a little bit tricky to use if your Xml file is complex because you read the Xml file sequentially. In that case, the method ReadSubTree of XmlReader can be very useful because this method returns only the xml tree under the current node so you send the new xmlreader to a function to parse the subnode content and once it is done, you continue to the next node.
XmlReader Example:
string xmlcontent = "<BigXml/>";
using(StringReader strContent = new StringReader(xmlcontent))
{
using (XmlReader reader = XmlReader.Create(strContent))
{
while (reader.Read())
{
if (reader.Name == "SomeName" && reader.NodeType == XmlNodeType.Element)
{
//Send the XmlReader created by ReadSubTree to a function to read it.
ReadSubContentOfSomeName(reader.ReadSubtree());
}
}
}
}
XmlWriter Example:
StringBuilder builder = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(builder))
{
writer.WriteStartDocument();
writer.WriteStartElement("BigXml");
writer.WriteAttributeString("someAttribute", "42");
writer.WriteString("Some Inner Text");
//Write nodes under BigXml
writer.WriteStartElement("SomeName");
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndDocument();
}

How to inspect XML streams from the debugger in Visual Studio 2003

I've got to edit an XSLT stylesheet, but I'm flying blind because the XML input only exists fleetingly in a bunch of streams. I can debug into the code, but can't figure out how to get the contents of the streams out into text I can look at (and run through the XSLT manually while I'm editing them).
The code is part of a big old legacy system, I can modify it in a debug environment if absolutely necessary, but it runs in a windows service connected up to a bunch of MSMQs. So for various reasons I'd rather be able to use the debugger to see the XML without having to change the code first.
Code much simplified, is something like this: (C# - but remember it's .net 1.1 in VS 2003.)
This is the function that gets the XML as a stream, which is then fed into some sort of XSLT transform object. I've tried looking at the writer and xmlStream objects in the watch windows and the immediate window, but can't quite fathom how to see the actual XML.
private MemoryStream GetXml()
{
MemoryStream xmlStream;
xmlStream = new MemoryStream();
XmlWriter writer = new XmlTextWriter(xmlStream, Encoding.UTF8);
writer.WriteStartDocument();
//etc etc...
writer.WriteEndDocument();
writer.Flush();
xmlStream.Position = 0;
return xmlStream; //Goes off to XSLT transform thingy!
}
All help much appreciated.

You could simply add this expression to your watch window after the MemoryStream is ready:
(new StreamReader(xmlStream)).ReadToEnd();
Watch expressions don't need to be simple variable values. They can be complex expressions, but they will have side-effects. As you've noted, this will interrupt execution, since the stream contents will be read out completely. You could recreate the stream after the interruption with another expression, if you need to re-start execution.
This situation arises frequently when debuging code with streams, so I avoid them for simple, self-contained tasks. Unfortunately, for large systems, it's not always easy to know in advance whether you should make your code stream-oriented or not, since it depends greatly on how it will be used. I consider the use of streams to be a premature optimization in many cases, however.

OK, I've not succeeded in using the debugger without modifying the code. I added in the following snippet, which lets me either put a breakpoint in or use debugview.
private MemoryStream GetXml()
{
MemoryStream xmlStream;
xmlStream = new MemoryStream();
XmlWriter writer = new XmlTextWriter(xmlStream, Encoding.UTF8);
writer.WriteStartDocument();
//etc etc...
writer.WriteEndDocument();
writer.Flush();
xmlStream.Position = 0;
#if DEBUG
string temp;
StreamReader st=new StreamReader(xmlStream);
temp=st.ReadToEnd();
Debug.WriteLine(temp);
#endif
return xmlStream; //Goes off to XSLT transform thingy!
}
I'd still prefer to simply look at the xmlstream object in the debugger somehow, even if it disrupts the flow of execution, but in the meantime this is the best I've managed.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Best approach to write huge xml data to file? - c#

Related

Deserialize on the fly, or LINQ to XML

Stream StringBuilder to file

Writing to XML resource file through serialization

xml and files on disc interaction

How to inspect XML streams from the debugger in Visual Studio 2003

Categories

Resources