I have written a console application that fetches some information from a web server, converts it into XML and saves it. I build the XML manually by appending strings with a StringBuilder. As the XML might be very large, is it better to use StringBuilder or the XmlDocument class etc. as far as memory is concerned?
To be precise: if the XML is around 10 MB of text, is it more memory efficient to use StringBuilder.Append("") or the System.Xml namespace?
I think a more efficient way would be to keep using StringBuilder, but save the XML to a file on disk after every iteration and clear the StringBuilder object. Any comments?
Thanks in advance. :)
Neither; I'd use an XmlWriter:
using (var file = File.Create(path))          // System.IO
using (var writer = XmlWriter.Create(file))   // System.Xml
{
    // write to writer here
}
This avoids having to buffer a lot of data in memory, which both StringBuilder and XmlDocument would do, and avoids all the encoding problems you will face if you create the XML manually (not a good idea, to be honest).
I wouldn't manually create XML using a StringBuilder; there is just too much room for errors (proper escaping of strings, for example).
To write an XML file of larger size you should use the XmlWriter class.
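To make the XmlWriter suggestion a bit more concrete, here is a minimal sketch of writing records one at a time. The element names (records, record, id) and the RecordExporter class are made up for illustration; the point is only that each item goes straight to the writer instead of into a big string, and that WriteString and WriteAttributeString take care of escaping.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RecordExporter
{
    // Hypothetical helper: streams each record to the writer as it arrives,
    // so only the current record is held in memory by this code.
    public static void WriteRecords(XmlWriter writer,
                                    IEnumerable<KeyValuePair<string, string>> records)
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("records");       // assumed root element name
        foreach (var record in records)
        {
            writer.WriteStartElement("record");    // assumed element name
            writer.WriteAttributeString("id", record.Key);
            writer.WriteString(record.Value);      // escapes <, & and > for you
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
    }
}
```

Passing a writer created with XmlWriter.Create(file), as in the snippet above, should keep memory use roughly constant regardless of how many records are written.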
If you have the XSD for the XML, I'd use xsd.exe (the XML Schema Definition Tool), which can generate C# classes from it. With the generated code it's quite easy to serialize and deserialize the XML, so you don't have to work with long strings. Just build up your class and save it as valid XML text.
Related
I've been looking through the page but I was not able to find any answer to my problem:
I'm downloading an XML file from a server through a stream and processing it using XmlReader:
XmlReader xml = XmlReader.Create(XMLstream, settings);
while (xml.Read())
{
    ...
}
So the XML is being downloaded while it's being parsed, which is what I want for efficiency reasons (I don't usually need to read the full document).
The problem is that sometimes the XML contains non-standard characters, which leads to exceptions. This issue has already come up on this website.
I know how to filter these characters with XmlConvert.IsXmlChar(), and one way to perform this clean-up would be: download the full XML code -> filter -> pass the filtered string to XmlReader.
The problem with this approach is that I have to download the full XML file, and sometimes this could be up to 10 MB over a slow connection!
Is there any way (a callback or something) to filter the data just before each chunk of XML code is parsed?
As far as I know, XmlReader.Read() manages the stream and then parses the result, but I need something in the middle.
Any idea?
Thank you very much in advance.
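One way to get "something in the middle" is to wrap the stream in a TextReader that drops invalid characters as they are read, and hand that wrapper to XmlReader.Create. The sketch below is only an illustration under some assumptions: the class name XmlSanitizingReader is made up, surrogate halves are passed through without validation, and only raw characters are filtered (a numeric character reference to an invalid character would still reach the parser).

```csharp
using System.IO;
using System.Xml;

// Hypothetical wrapper: filters characters on the fly, chunk by chunk.
public sealed class XmlSanitizingReader : TextReader
{
    private readonly TextReader _inner;

    public XmlSanitizingReader(TextReader inner)
    {
        _inner = inner;
    }

    // Keep valid XML characters; let surrogate halves through unchecked.
    private static bool IsAllowed(char c)
    {
        return XmlConvert.IsXmlChar(c) || char.IsSurrogate(c);
    }

    public override int Read(char[] buffer, int index, int count)
    {
        while (true)
        {
            int read = _inner.Read(buffer, index, count);
            if (read <= 0) return 0;               // end of input

            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[index + i];
                if (IsAllowed(c)) buffer[index + kept++] = c;
            }
            if (kept > 0) return kept;             // otherwise the whole chunk was dropped; read on
        }
    }

    public override int Read()
    {
        int c;
        while ((c = _inner.Read()) != -1)
        {
            if (IsAllowed((char)c)) return c;
        }
        return -1;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Usage would look something like XmlReader.Create(new XmlSanitizingReader(new StreamReader(XMLstream)), settings). Note that going through a StreamReader means the text encoding is chosen by the StreamReader (UTF-8 unless a byte order mark says otherwise) rather than by the XML declaration, so that part is an assumption about the data.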
This code:
XmlNode columnNode = null;
columnNode = xmlDoc.CreateElement("SYSID");
columnNode.InnerText = ""; // Empty string
newRowNode.AppendChild(columnNode);
...does this:
<SYSID>
</SYSID>
And I would like to have this, when string is empty:
<SYSID></SYSID>
Is there any solution?
If you have another tool that requires that format, then the other tool is wrong - it is incapable of reading XML. So if you have control over the other tool, I'd suggest fixing it rather than trying to coerce your code into matching it.
If you can't fix the other tool...
If you're just building a document to write it out to disk, then you can use a stream and write the elements directly yourself (as simple text). This will be faster (and may well be easier) than using an XmlDocument.
As an improvement on that, you may be able to use an XmlWriter to write elements, but when you go to write an empty element, write raw text to the stream (i.e. writer.WriteRaw("<SYSID></SYSID>\n")) so that you control the formatting for those particular elements.
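As a variation on the XmlWriter idea (not the WriteRaw approach itself), XmlWriter also has WriteFullEndElement, which always writes an explicit closing tag. A minimal sketch, assuming a made-up ROW parent element and a writer created without indentation:

```csharp
using System.IO;
using System.Xml;

public static class EmptyElementWriter
{
    // Hypothetical helper: writes <ROW><SYSID>...</SYSID></ROW> to a string.
    public static string WriteRow(string sysId)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("ROW");       // assumed parent element name
            writer.WriteStartElement("SYSID");
            writer.WriteString(sysId);
            writer.WriteFullEndElement();          // forces </SYSID> even when empty
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

With an empty string this should give the start and end tags directly next to each other, without resorting to raw text.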
If you need to build an in-memory XmlDocument, then to a large extent you have to put up with the formatting that it uses when you ask it to serialize to disk (aside from basic settings like PreserveWhitespace, you're asking the document to deal with storing the information, and so you lose a lot of control over the functionality that the XmlDocument encapsulates). The best suggestion I can think of in this case would be to write the XmlDocument to a MemoryStream and then post-process that memory stream to remove newlines from within empty elements. (Yuck!)
I read XML files that sometimes contain elements like
<stringValue>text&#10;text</stringValue>
XmlReader returns
text\ntext
for such strings.
So, when I rewrite the source XML later using XmlWriter, I don't get the same strings back (the newline is no longer written as a character reference).
Should I worry about all this, or is it fine to allow the string to be changed this way?
I would worry about it, yes, because you're manipulating the data. If you do a round trip through the XML document, the text formatting will not be the same.
When saving back out to XML, you would need to make sure the same formatting is persisted.
The character reference for a line feed (&#10; or &#xA;) is the XML encoding for a newline character (\n). If your XML data has a newline in the text, then this notation is correct and the output from XmlWriter is correct. If the newline was not in the original XML data: I've been seeing an issue with the XMLHttpRequest object in IE10/IE11 inserting \r\n into the XML data.
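To illustrate why the two notations carry the same data, here is a small sketch (the LineFeedDemo class is made up) that parses an element's text with XmlReader. A line-feed character reference and a literal line feed should both come back as the same string.

```csharp
using System.IO;
using System.Xml;

public static class LineFeedDemo
{
    // Hypothetical helper: returns the text content of the root element.
    public static string ReadValue(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                        // skip to the root element
            return reader.ReadElementContentAsString();    // character references are decoded here
        }
    }
}
```

So a consumer that parses the XML properly sees no difference; only a byte-for-byte comparison of the files would.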
How do I use LINQ to extract a single XML attribute from each XML file in a directory and put that attribute's value in a C# list? Do I have to loop through each file one by one? The XML files are quite large, so I'd like to do this without loading the entire file into memory.
Thanks,
j
Unless the files are massive (100 MB+) I would be unable to turn down the elegance of this code:
var result = Directory.GetFiles(filePath)
    .Select(path => XDocument.Load(path))
    .Select(doc => doc.Root.Element("A").Attribute("B").Value)
    .ToList();
I really hope your XML files are not that big though...
You do have to go through every file, and this will mean at least parsing enough of the XML content of each file to get to the required attribute.
XDocument (i.e. LINQ to XML) will parse and load the complete document in each case, so you might be better off using an XmlReader instance directly. This will require more work: you will have to read the XML nodes until you get to the right one, keeping track of where you are.
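A rough sketch of the XmlReader route, reusing the element name "A" and attribute name "B" from the LINQ snippet as placeholders. The AttributeScanner class and the "*.xml" file pattern are assumptions. Note one difference: ReadToFollowing stops at the first element with that name anywhere in the document, not only a direct child of the root.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeScanner
{
    // Reads forward until the first element with the given name, then returns
    // the requested attribute (null if the element or attribute is missing).
    public static string ReadFirstAttribute(XmlReader reader, string elementName, string attributeName)
    {
        if (reader.ReadToFollowing(elementName))
        {
            return reader.GetAttribute(attributeName);
        }
        return null;
    }

    // Hypothetical driver: one reader per file; parsing stops as soon as the
    // attribute is found, so the rest of the file is never read.
    public static List<string> ReadFromDirectory(string directory, string elementName, string attributeName)
    {
        var values = new List<string>();
        foreach (string path in Directory.EnumerateFiles(directory, "*.xml"))
        {
            using (XmlReader reader = XmlReader.Create(path))
            {
                values.Add(ReadFirstAttribute(reader, elementName, attributeName));
            }
        }
        return values;
    }
}
```

You still loop over every file, but each file is only parsed up to the element you care about.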
I am creating an XML file in C# using an XSD schema of an InfoPath form.
When I save the IP form without using the code, I get an XML file with the following header:
<?xml version="1.0" encoding="UTF-8"?>
<?mso-infoPathSolution solutionVersion="1.0.0.113" productVersion="14.0.0" PIVersion="1.0.0.0" href="file:///\\Hmfp\mcs-shared\PMU\PMU-shared\Tests\QF%207.5%20PMU%20Project%20Outline%20Form%20F1.0.xsn" name="urn:schemas-microsoft-com:office:infopath:QF-7-5-PMU-Project-Outline-Form-F1-0:-myXSD-2010-07-22T07-48-32" ?>
<?mso-application progid="InfoPath.Document"?>
<my:myFields...
And this file is recognized by InfoPath and uses the correct XSD, thus displaying the XML data in the correct form.
But when I use the code, I get this:
<?xml version="1.0"?>
<myFields...
And this is neither recognized nor opened directly by InfoPath, so I would like to insert the two tags in order to keep that functionality, so that the users do not see the difference.
My line of thought is to modify the XML file after it has already been created, saved and closed.
It would be very nice if you could help :D. Thanks in advance..
EDIT: I've finally been able to achieve what I wanted. I made use of both MainMa's and dahlbyk's answers and came up with something that works:
1. I let the file get saved like before.
2. I created an XmlReader object from the file.
3. I loaded the XmlReader into an XmlDocument object.
4. I created an XmlProcessingInstruction object using XmlDocument.CreateProcessingInstruction.
5. I inserted that PI into the XmlDocument using xmlDoc.InsertAfter(thePI, xmlDoc.FirstChild).
6. I then created a second PI object.
7. I inserted it using xmlDoc.InsertAfter(thePI, xmlDoc.FirstChild.NextSibling).
8. Then I saved the XmlDocument to the file, overwriting it.
Anyway, your answers helped me understand many things, which made me find the answer, so thank you very much!!
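For anyone following along, the steps in the edit could be sketched roughly like this. The InfoPathHeader class and the solutionData parameter are made up for illustration (the caller supplies the attribute text of the mso-infoPathSolution line), and the sketch assumes the document's first child is the XML declaration.

```csharp
using System.Xml;

public static class InfoPathHeader
{
    // Hypothetical helper: inserts the two InfoPath processing instructions
    // right after the first child, which is assumed to be <?xml ...?>.
    public static void AddProcessingInstructions(XmlDocument doc, string solutionData)
    {
        XmlNode declaration = doc.FirstChild;

        XmlProcessingInstruction solution =
            doc.CreateProcessingInstruction("mso-infoPathSolution", solutionData);
        doc.InsertAfter(solution, declaration);

        XmlProcessingInstruction application =
            doc.CreateProcessingInstruction("mso-application", "progid=\"InfoPath.Document\"");
        doc.InsertAfter(application, solution);
    }
}
```

After doc.Load(path), a call to this helper followed by doc.Save(path) would correspond to the load, insert and overwrite steps.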
I would try making an XmlWriter for your FileStream, use WriteProcessingInstruction() to add your headers, then pass the writer into the appropriate overload of Serialize() to capture the rest of the output.
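Something along these lines, as a sketch only. The MyFields class stands in for whatever class was generated from the XSD, and solutionData is a placeholder for the attribute text of the mso-infoPathSolution line. The explicit WriteStartDocument call is there so the XML declaration comes out before the processing instructions.

```csharp
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the class generated from the InfoPath XSD.
public class MyFields
{
    public string Title { get; set; }
}

public static class InfoPathSerializer
{
    public static void Write(XmlWriter writer, object form, string solutionData)
    {
        writer.WriteStartDocument();   // XML declaration first
        writer.WriteProcessingInstruction("mso-infoPathSolution", solutionData);
        writer.WriteProcessingInstruction("mso-application", "progid=\"InfoPath.Document\"");

        // The serializer continues on the same writer, after the header lines.
        var serializer = new XmlSerializer(form.GetType());
        serializer.Serialize(writer, form);
    }
}
```

Create the writer with XmlWriter.Create over your FileStream and pass it in; the file is then written once, with no need to reopen and patch it afterwards.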
The second and third lines of your first code sample are called XML processing instructions (PI); the first line is the XML declaration. So if you are creating your output XML with XmlDocument, you can use the XmlDocument.CreateProcessingInstruction method to add the required PIs.
If you are serializing into XML, you can also use XmlTextWriter.WriteProcessingInstruction just before serializing the object.
If, for some reason, you cannot do that, you can also save the file, open it and insert the two PIs after the first line break, but I strongly discourage that, since it will make your code difficult to maintain in the future and slow things down.