XmlDocument not preserving whitespace - c#

XmlDocument is adding a space at the end of self-closing tags, even with PreserveWhitespace set to true.
// This fails
string originalXml = "<sample><node id=\"99\"/></sample>";
// Convert to XML
XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.LoadXml(originalXml);
// Save back to a string
string extractedXml = null;
using (MemoryStream stream = new MemoryStream())
{
doc.Save(stream);
stream.Position = 0;
using(StreamReader reader = new StreamReader(stream))
{
extractedXml = reader.ReadToEnd();
}
}
// Confirm that they are identical
Assert.AreEqual(originalXml, extractedXml);
The desired output is:
<sample><node id="99"/></sample>
But I am getting:
<sample><node id="99" /></sample>
Is there a way to suppress that extra space?

Here is what XmlDocument.Save(Stream) looks like:
public virtual void Save(Stream outStream)
{
XmlDOMTextWriter xmlDomTextWriter = new XmlDOMTextWriter(outStream, this.TextEncoding);
if (!this.preserveWhitespace)
xmlDomTextWriter.Formatting = Formatting.Indented;
this.WriteTo((XmlWriter) xmlDomTextWriter);
xmlDomTextWriter.Flush();
}
So setting PreserveWhitespace only turns off indentation; it has no effect on how the inside of a tag is written. The documentation of XmlTextWriter says:
When writing an empty element, an additional space is added between tag name and the closing tag, for example <item />. This provides compatibility with older browsers.
So I guess there is no easy way out. Here's a workaround, though:
So I wrote a wrapper class, MtxXmlWriter, that derives from XmlWriter, wraps the original XmlWriter returned by XmlWriter.Create(), and does all the necessary tricks.
Instead of using XmlWriter.Create(), you just call one of the MtxXmlWriter.Create() methods; that's all. All other methods are handed directly to the encapsulated XmlWriter, except for WriteEndElement(). After calling WriteEndElement() on the encapsulated XmlWriter, " />" is replaced with "/>" in the buffer.

Related

Is it possible to manually line break on a specific attribute when using XmlWriter in Indent mode?

XmlWriter allows configuring indentation when using XmlWriter.Create and XmlWriterSettings.
In general, I want Indent = true and NewLineOnAttributes = false, except when writing xmlns namespace declarations at the beginning of the file, where I would like to have new lines between each xmlns namespace for readability.
Is it possible to force XmlWriter to do a line break after writing a specific attribute, and otherwise follow general indentation rules?
I tried using WriteWhitespace and WriteRaw with \n:
using System;
using System.Text;
using System.Xml;
namespace XmlWriterIndent
{
class Program
{
static void Main(string[] args)
{
var output = new StringBuilder();
using (var writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true, NewLineOnAttributes = true }))
{
writer.WriteStartDocument();
writer.WriteStartElement("Node");
writer.WriteAttributeString("key1", "value1");
writer.WriteAttributeString("key2", "value2");
writer.WriteAttributeString("xmlns", "n1", null, "scheme://mynamespace.com");
writer.WriteRaw("\n");
writer.WriteAttributeString("xmlns", "n2", null, "scheme://anothernamespace.com");
writer.WriteEndElement();
writer.WriteEndDocument();
}
var xml = output.ToString();
Console.WriteLine(xml);
}
}
}
Unfortunately this throws an exception saying the XML document would be invalid.
UPDATE: Actually, after checking more carefully, the exception is not in the WriteRaw method itself, but rather in the following WriteAttributeString call, as I am calling these methods in a loop for all namespaces.
It looks like WriteRaw moves the XmlWriter into the element content state somehow. Is it possible to use WriteRaw or somehow insert whitespace between attributes without changing the writer state?
UPDATE: Added self-contained example. Actually, it looks like in general namespace declarations are ignored even when using NewLineOnAttributes, i.e. all attributes have new lines except namespace declarations, which are somehow handled differently despite being regular attributes.
Unfortunately, I'm approaching the conclusion that the XmlWriter API is simply broken, as there is no way to do raw formatting of XML, since the WriteRaw forces a change in the writer state.
Looking into the actual source code at referencesource shows that the special write method WriteIndent is used to handle indentation inside XmlWriter. This method has special behavior that doesn't change the state, but there seems to be no way to access it or the underlying data stream, so it seems impossible to work around this without a full reimplementation of the entire XML writer stack:
https://referencesource.microsoft.com/#System.Xml/System/Xml/Core/XmlEncodedRawTextWriter.cs,1739
The XmlWriter API has no way to do raw formatting of XML attributes. WriteRaw would be the appropriate method to call, but the internal XmlWellFormedWriter returned by XmlWriter.Create always advances the writer state when this method is called, advancing the XML state machine to content. If we are in the middle of writing attributes, this finishes the start tag of the element and moves to content, which is not where we want to write our custom indentation.
Several internal XmlWriter classes implement more low-level WriteRaw methods, but there seems to be no way of accessing them, as XmlWriter.Create always wraps created writers with a XmlWellFormedWriter instance before returning.
Therefore, the only way to work around the issue is to define and instantiate a custom XmlWriter class which controls both the underlying stream and the base XmlWriter. That way we can bypass the XmlWriter API and write directly into the stream when we need to do our custom indentation.
There are a couple of limitations with the current solution:
XmlWriter implementations do not write directly to the stream, and instead keep their own internal buffers for efficiency. That means that whenever we want to bypass the XmlWriter we need to call Flush to make sure that our stream is in the right position;
In order for indentation to make sense, we need to keep track of the correct indent level. To do this for the whole document would be possible, but tedious. For simplicity, this solution only formats the top level xmlns declarations;
For completeness, we need to deal with both Stream and TextWriter as possible output types.
Finally, the XmlWriter abstract class has dozens of methods to override which we don't care about, but that need to be bridged to the underlying writer. For conciseness, I have omitted all but the relevant overrides:
class XmlnsIndentedWriter : XmlWriter
{
bool isRootElement;
int indentLevel = -1;
readonly Stream stream;
readonly TextWriter textWriter;
readonly XmlWriter writer;
private XmlnsIndentedWriter(Stream output, XmlWriter baseWriter)
{
stream = output;
writer = baseWriter;
}
private XmlnsIndentedWriter(TextWriter output, XmlWriter baseWriter)
{
textWriter = output;
writer = baseWriter;
}
public static new XmlWriter Create(StringBuilder output, XmlWriterSettings settings)
{
var writer = XmlWriter.Create(output, settings);
return new XmlnsIndentedWriter(new StringWriter(output, CultureInfo.InvariantCulture), writer);
}
public static new XmlWriter Create(Stream stream, XmlWriterSettings settings)
{
var writer = XmlWriter.Create(stream, settings);
return new XmlnsIndentedWriter(stream, writer);
}
// snip: override all methods in the XmlWriter class
private void WriteRawText(string text)
{
writer.Flush();
if (stream != null)
{
// example only, this could be optimized with buffers, etc.
var buf = writer.Settings.Encoding.GetBytes(text);
stream.Write(buf, 0, buf.Length);
}
else if (textWriter != null)
{
textWriter.Write(text);
}
}
public override void WriteStartDocument()
{
isRootElement = true;
writer.WriteStartDocument();
}
public override void WriteStartElement(string prefix, string localName, string ns)
{
if (isRootElement)
{
if (indentLevel < 0)
{
// initialize the indent level;
// length of local name + any control characters / prefixes, etc.
indentLevel = localName.Length + 1;
}
else
{
// do not track indent for the whole document;
// when second element starts, we are done
isRootElement = false;
indentLevel = -1;
}
}
writer.WriteStartElement(prefix, localName, ns);
}
public override void WriteEndAttribute()
{
writer.WriteEndAttribute();
if (indentLevel >= 0)
{
WriteRawText(Environment.NewLine + new string(' ', indentLevel));
}
}
}
There is a choice to add the indentation either before each attribute, or after.
Here I have opted for the latter, as it seems to be the only option if you want to also indent the default xmlns declaration. This declaration is written out after the writer state moves to content, and there seems to be no way of intercepting it otherwise.

How to use XDocument.Save to save a file using custom indentation for attributes

My goal is to output a modified XML file and preserve a special indentation that was present in the original file. The objective is so that the resulting file still looks like the original, making them easier to compare and merge through source control.
My program will read a XML file and add or change one specific attribute.
Here is the formatting I'm trying to achieve / preserve:
<Base Import="..\commom\style.xml">
<Item Width="480"
Height="500"
VAlign="Center"
Style="level1header">
(...)
In this case, I simply wish to align all attributes past the first one with the first one.
XmlWriterSettings provides formatting options, but they won't achieve the result I'm looking for.
settings.Indent = true;
settings.NewLineOnAttributes = true;
These settings will put the first attribute on a newline, instead of keeping it on the same line as the node, and will line up attributes with the node.
Here is the Load call, which asks to preserve whitespace:
MyXml = XDocument.Load(filepath, LoadOptions.PreserveWhitespace);
But it seems like it won't do what I expected.
I tried to provide a custom class, which derives from XmlWriter to the XDocument.Save call, but I haven't managed to insert whitespace correctly without running into InvalidOperationException. Plus that solution seems overkill for the small addition I'm looking for.
For reference, this is my save call, not using my custom XML writer (which doesn't work anyway):
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineOnAttributes = true;
settings.OmitXmlDeclaration = true;
using (XmlWriter writer = XmlWriter.Create(filepath + "_auto", settings))
{
MyXml.Save(writer);
}
I ended up not using XDocument.Save at all. Instead, I created a class that takes the XDocument, an XmlWriter, and a TextWriter.
The class walks all nodes in the XDocument. The TextWriter is bound to the file on disk, and the XmlWriter uses it as its output.
My class then uses the XmlWriter to output the XML. To achieve the extra spacing, I used the solution described at https://stackoverflow.com/a/24010544/5920497, which is why I also write to the underlying TextWriter.
Here's an example of the solution.
Calling the class to save the document:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineOnAttributes = false; // Behavior changed in PrettyXmlWriter
settings.OmitXmlDeclaration = true;
using(TextWriter rawwriter = File.CreateText(filepath))
using (XmlWriter writer = XmlWriter.Create(rawwriter, settings))
{
// rawwriter is used both by XmlWriter and PrettyXmlWriter
PrettyXmlWriter outputter = new PrettyXmlWriter(writer, rawwriter);
outputter.Write(MyXml);
writer.Flush();
writer.Close();
}
Inside PrettyXmlWriter:
private XmlWriter Writer { get; set; }
private TextWriter InnerTextWriter { get; set; }
public void Write(XDocument doc)
{
XElement root = doc.Root;
WriteNode(root, 0);
}
private void WriteNode(XNode node, int currentNodeDepth)
{
if(node.NodeType == XmlNodeType.Element)
{
WriteElement((XElement)node, currentNodeDepth);
}
else if(node.NodeType == XmlNodeType.Text)
{
WriteTextNode((XText)node, currentNodeDepth, doIndentAttributes);
}
}
private void WriteElement(XElement node, int currentNodeDepth)
{
Writer.WriteStartElement(node.Name.LocalName);
// Write attributes with indentation
XAttribute[] attributes = node.Attributes().ToArray();
if(attributes.Length > 0)
{
// First attribute, unindented.
Writer.WriteAttributeString(attributes[0].Name.LocalName, attributes[0].Value);
for(int i=1; i<attributes.Length; ++i)
{
// Write indentation
Writer.Flush();
string indentation = Writer.Settings.NewLineChars + string.Concat(Enumerable.Repeat(Writer.Settings.IndentChars, currentNodeDepth));
indentation += string.Concat(Enumerable.Repeat(" ", node.Name.LocalName.Length + 1));
// Using Underlying TextWriter trick to output whitespace
InnerTextWriter.Write(indentation);
Writer.WriteAttributeString(attributes[i].Name.LocalName, attributes[i].Value);
}
}
// output children
foreach(XNode child in node.Nodes())
{
WriteNode(child, currentNodeDepth + 1);
}
Writer.WriteEndElement();
}
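The underlying-TextWriter trick used above can be reduced to a few lines. This is only a sketch under stated assumptions: the class name AttributeIndentSketch, the element and attribute names, and the hard-coded five-space padding are illustrative and not part of the original solution. It relies on Flush() pushing the XmlWriter's buffered start tag into the shared TextWriter before the whitespace is written.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeIndentSketch
{
    public static string Build()
    {
        // The same TextWriter is used by the XmlWriter and written to directly.
        var raw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var writer = XmlWriter.Create(raw, settings))
        {
            writer.WriteStartElement("Item");
            writer.WriteAttributeString("Width", "480");

            // Flush so the text buffered so far reaches 'raw' first;
            // otherwise the whitespace would land before the start tag.
            writer.Flush();

            // Written behind the XmlWriter's back, so its state is unchanged.
            // The writer adds one more space before the next attribute.
            raw.Write("\n     ");

            writer.WriteAttributeString("Height", "500");
            writer.WriteEndElement();
        }

        return raw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

Because the whitespace never passes through the XmlWriter, the writer still believes it is in the middle of a start tag, which is why the second WriteAttributeString call does not throw.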

Can't successfully clone XMLReader (Unexpected EOF)

I have a really strange problem with XMLReader/XMLTextReader classes.
I have a simple file load:
public void First()
{
XmlTextReader reader = new XmlTextReader(@"C:\MyXMLFile.xml");
XmlReader readerToSerialize;
XmlReader readerToLoad;
DuplicateReaders(reader, out readerToSerialize, out readerToLoad);
XmlSerializer serializer = new XmlSerializer(typeof(XMLTree));
XmlFeed = (XMLDescriptor)serializer.Deserialize(readerToSerialize);
xmlDoc.Load(readerToLoad);
}
protected void DuplicateReaders(XmlTextReader xmlReader, out XmlReader cloneOne, out XmlReader cloneTwo)
{
XmlDocument _XmlDocument = new XmlDocument();
MemoryStream _Stream = new MemoryStream();
_XmlDocument.Load((XmlTextReader)xmlReader);
_XmlDocument.Save(_Stream);
_Stream.Position = 0L;
cloneOne = XmlReader.Create(_Stream);
_Stream.Position = 0L;
cloneTwo = XmlReader.Create(_Stream);
}
The problem is that only one of the cloned readers reads the whole file successfully; the next one (xmlDoc.Load) always fails at the same place (line 91, character 37 with this XML file). If I assign to xmlDoc directly (i.e. clone the original reader only once and load the document inside the function):
public void First()
{
XmlTextReader reader = new XmlTextReader(@"C:\MyXMLFile.xml");
XmlReader readerToSerialize;
DuplicateReaders(reader, out readerToSerialize);
XmlSerializer serializer = new XmlSerializer(typeof(XMLTree));
XmlFeed = (XMLDescriptor)serializer.Deserialize(readerToSerialize);
}
protected void DuplicateReaders(XmlTextReader xmlReader, out XmlReader cloneOne)
{
XmlDocument _XmlDocument = new XmlDocument();
MemoryStream _Stream = new MemoryStream();
_XmlDocument.Load((XmlTextReader)xmlReader);
_XmlDocument.Save(_Stream);
_Stream.Position = 0L;
cloneOne = XmlReader.Create(_Stream);
_Stream.Position = 0L;
this.xmlDoc.Load(_Stream);
}
I still get the same error 91/37 (Unexpected EOF), but this time in the Serializer.
My initial problem was that if I use xmlDoc.Load(reader), the reader instance is consumed and I can't use it for deserialization later on. I found the Duplicate function on the MSDN forums, but it's still a no-go. What I want to achieve is quite simple:
Use only one reader and get one XmlDocument and one Serialized Class. How hard can it be?
You need to close the first reader before you can use the duplicate:
reader.Close();
Both cloneOne and cloneTwo use the same underlying memory stream, so each reader advances the position the other one reads from. Use a separate MemoryStream for the second reader:
cloneTwo = XmlReader.Create(new MemoryStream(_Stream.ToArray()));
I found a much easier solution: instead of cloning the reader twice, I just create a second reader from xmlDoc and use it to deserialize.
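One way to read that last suggestion is to load the XmlDocument once and then hand the serializer an XmlNodeReader over the in-memory document. The sketch below assumes a hypothetical Feed class and inline XML in place of the asker's types and file; neither is from the original post.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type standing in for the asker's serialized class.
public class Feed
{
    public string Title { get; set; }
}

public static class ReaderSketch
{
    public static Feed LoadAndDeserialize(string xml, out XmlDocument xmlDoc)
    {
        // Load the document once, from a single reader.
        xmlDoc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            xmlDoc.Load(reader);
        }

        // Create a second reader over the loaded DOM instead of cloning streams.
        var serializer = new XmlSerializer(typeof(Feed));
        using (var nodeReader = new XmlNodeReader(xmlDoc))
        {
            return (Feed)serializer.Deserialize(nodeReader);
        }
    }

    public static void Main()
    {
        XmlDocument doc;
        Feed feed = LoadAndDeserialize("<Feed><Title>Example</Title></Feed>", out doc);
        Console.WriteLine(feed.Title);
        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```

No stream positions are shared this way, so the "Unexpected EOF" caused by two readers pulling from one MemoryStream cannot occur.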

Generate XML and HTML from MemoryStream

I need to generate an HTML report from XML and a corresponding XSL, but I have to use a MemoryStream instead of writing files to server directories. For the most part I managed to create the XML:
MemoryStream ms = new MemoryStream();
XmlWriterSettings wSettings = new XmlWriterSettings();
wSettings.Indent = true;
using(XmlWriter writer = XmlWriter.Create(ms,wSettings))
{
/**
creating xml here
**/
writer.Flush();
writer.Close();
}
return ms; // returning the memory stream to another function
// to create html
// This Function creates
protected string ConvertToHtml(MemoryStream xmlOutput)
{
XPathDocument document = new XPathDocument(xmlOutput);
XmlDocument xDoc = new XmlDocument();
xDoc.Load(xmlOutput);
StringWriter writer = new StringWriter();
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(reportDir + "MyXslFile.xsl");
transform.Transform(xDoc, null, writer);
xmlOutput.Position = 1;
StreamReader sr = new StreamReader(xmlOutput);
return sr.ReadToEnd();
}
Somewhere along the line I am messing up the creation of the HTML report, and I can't figure out how to send that file to the client. I don't have much experience working with MemoryStream, so any help would be greatly appreciated. Thank you.
You're completely bypassing your transform here:
// This Function creates
protected string ConvertToHtml(MemoryStream xmlOutput)
{
XPathDocument document = new XPathDocument(xmlOutput);
XmlDocument xDoc = new XmlDocument();
xDoc.Load(xmlOutput);
StringWriter writer = new StringWriter();
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(reportDir + "MyXslFile.xsl");
transform.Transform(xDoc, null, writer);
// These lines are the problem
//xmlOutput.Position = 1;
//StreamReader sr = new StreamReader(xmlOutput);
//return sr.ReadToEnd();
return writer.ToString();
}
Also, calling Flush right before you call Close on a writer is redundant as Close implies a flush operation.
It is not clear to me what you want to achieve, but loading both an XmlDocument and an XPathDocument from the same memory stream does not make sense. I would also set the MemoryStream's Position to 0 before loading from it: either have the function that creates and writes to the memory stream reset the Position, or reset it before you call Load on the XmlDocument or create the XPathDocument, depending on which input tree model you want to use.
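Putting both answers together, a minimal sketch might look like the following. It rewinds the stream, loads a single input model (XPathDocument here), and returns the transform output. The inline stylesheet, the report/title element names, and the class name ReportSketch are assumptions made so the example runs without MyXslFile.xsl.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class ReportSketch
{
    // Inline stylesheet (an assumption) standing in for MyXslFile.xsl.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='html'/>" +
        "<xsl:template match='/report'>" +
        "<html><body><h1><xsl:value-of select='title'/></h1></body></html>" +
        "</xsl:template></xsl:stylesheet>";

    public static MemoryStream CreateXml()
    {
        var ms = new MemoryStream();
        using (var writer = XmlWriter.Create(ms, new XmlWriterSettings { Indent = true }))
        {
            writer.WriteStartElement("report");
            writer.WriteElementString("title", "Monthly totals");
            writer.WriteEndElement();
        } // Dispose flushes the writer; the stream stays open (CloseOutput defaults to false).
        return ms;
    }

    public static string ConvertToHtml(MemoryStream xmlOutput)
    {
        xmlOutput.Position = 0; // rewind before reading what was just written
        var document = new XPathDocument(xmlOutput);

        var transform = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xslReader);
        }

        using (var html = new StringWriter())
        {
            transform.Transform(document, null, html);
            return html.ToString(); // the transform result, not the source XML
        }
    }

    public static void Main()
    {
        Console.WriteLine(ConvertToHtml(CreateXml()));
    }
}
```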

How to get Xml as string from XDocument?

I am new to LINQ to XML. After you have built XDocument, how do you get the OuterXml of it like you did with XmlDocument?
You only need to use the overridden ToString() method of the object:
XDocument xmlDoc ...
string xml = xmlDoc.ToString();
This works with all XObjects, like XElement, etc.
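A small sketch of what ToString() does and does not include, assuming a throwaway document built inline (the class name ToStringSketch and the element names are illustrative):

```csharp
using System;
using System.Xml.Linq;

public static class ToStringSketch
{
    public static XDocument BuildDocument()
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("Root", new XElement("Leaf", "data")));
    }

    public static void Main()
    {
        XDocument doc = BuildDocument();

        // Indented by default; the XML declaration is not included.
        Console.WriteLine(doc.ToString());

        // Single line without added whitespace, the closest match to OuterXml.
        Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting));

        // The declaration, when present, is available separately.
        Console.WriteLine(doc.Declaration);
    }
}
```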
I don't know when this changed, but today (July 2017), when trying the answers out, I got
"System.Xml.XmlDocument"
Note that this is XmlDocument, which does not override ToString(), rather than XDocument. Instead of ToString(), you can access the XmlDocument content the originally intended way, by writing the XML document to a writer:
XmlDocument xml = ...;
string result;
using (StringWriter writer = new StringWriter())
{
xml.Save(writer);
result = writer.ToString();
}
Several responses give a slightly incorrect answer.
XDocument.ToString() omits the XML declaration (and, according to @Alex Gordon, may return invalid XML if it contains encoded unusual characters like &).
Saving XDocument to StringWriter will cause .NET to emit encoding="utf-16", which you most likely don't want (if you save XML as a string, it's probably because you want to later save it as a file, and the de facto standard for saving files is UTF-8 - .NET saves text files as UTF-8 unless specified otherwise).
@Wolfgang Grinfeld's answer is heading in the right direction, but it's unnecessarily complex.
Use the following:
var memory = new MemoryStream();
xDocument.Save(memory);
string xmlText = Encoding.UTF8.GetString(memory.ToArray());
This will return XML text with UTF-8 declaration.
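Another way to get a UTF-8 declaration without going through bytes is a StringWriter subclass that reports UTF-8 as its encoding. This is a sketch, not part of the answer above; Utf8StringWriter and Utf8Sketch are names introduced here, not framework types.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16,
// so the XML declaration written by Save() says encoding="utf-8".
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class Utf8Sketch
{
    public static string ToXmlString(XDocument doc)
    {
        using (var writer = new Utf8StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var doc = new XDocument(new XElement("Root", new XElement("Leaf", "data")));
        Console.WriteLine(ToXmlString(doc));
    }
}
```

Since the text never leaves a string, no byte order mark is involved; the override only changes what the declaration claims, so the string should still be encoded as UTF-8 when it is eventually written out.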
Doing XDocument.ToString() may not get you the full XML.
In order to get the XML declaration at the start of the XML document as a string, use the XDocument.Save() method:
var ms = new MemoryStream();
using (var xw = XmlWriter.Create(new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"))))
new XDocument(new XElement("Root", new XElement("Leaf", "data"))).Save(xw);
var myXml = Encoding.GetEncoding("ISO-8859-1").GetString(ms.ToArray());
Use ToString() to convert XDocument into a string:
string result = string.Empty;
XElement root = new XElement("xml",
new XElement("MsgType", "<![CDATA[" + "text" + "]]>"),
new XElement("Content", "<![CDATA[" + "Hi, this is Wilson Wu Testing for you! You can ask any question but no answer can be replied...." + "]]>"),
new XElement("FuncFlag", 0)
);
result = root.ToString();
While @wolfgang-grinfeld's answer is technically correct (it also produces the XML declaration, as opposed to just using the .ToString() method), the code generates a UTF-8 byte order mark (BOM), which the XDocument.Parse(string) method cannot process; it throws a "Data at the root level is invalid. Line 1, position 1." error.
So here is another solution without the BOM:
var utf8Encoding =
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
using (var memory = new MemoryStream())
using (var writer = XmlWriter.Create(memory, new XmlWriterSettings
{
OmitXmlDeclaration = false,
Encoding = utf8Encoding
}))
{
CompanyDataXml.Save(writer);
writer.Flush();
return utf8Encoding.GetString(memory.ToArray());
}
I found this example in the Microsoft .NET 6 documentation for the XDocument.Save method. I think it answers the original question (what is the XDocument equivalent of XmlDocument.OuterXml), and also addresses the concerns that others have pointed out already. By using XmlWriterSettings you can predictably control the string output.
https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.save
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
using (XmlWriter xw = XmlWriter.Create(sb, xws)) {
XDocument doc = new XDocument(
new XElement("Child",
new XElement("GrandChild", "some content")
)
);
doc.Save(xw);
}
Console.WriteLine(sb.ToString());
Looking at these answers, I see a lot of unnecessary complexity and inefficiency in pursuit of generating the XML declaration automatically. But since the declaration is so simple, there isn't much value in generating it. Just KISS (keep it simple, stupid):
// Extension method
public static string ToStringWithDeclaration(this XDocument doc, string declaration = null)
{
declaration ??= "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";
return declaration + doc.ToString();
}
// Usage
string xmlString = doc.ToStringWithDeclaration();
// Or
string xmlString = doc.ToStringWithDeclaration("...");
Using XmlWriter instead of ToString() can give you more control over how the output is formatted (such as if you want indentation), and it can write to other targets besides string.
The reason to target a memory stream is performance. It lets you skip the step of storing the XML in a string (since you know the data must end up in a different encoding eventually, whereas string is always UTF-16 in C#). For instance, for an HTTP request:
// Extension method
public static ByteArrayContent ToByteArrayContent(
this XDocument doc, XmlWriterSettings xmlWriterSettings = null)
{
xmlWriterSettings ??= new XmlWriterSettings();
using (var stream = new MemoryStream())
{
using (var writer = XmlWriter.Create(stream, xmlWriterSettings))
{
doc.Save(writer);
}
var content = new ByteArrayContent(stream.GetBuffer(), 0, (int)stream.Length);
content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");
return content;
}
}
// Usage (XDocument -> UTF-8 bytes)
var content = doc.ToByteArrayContent();
var response = await httpClient.PostAsync("/someurl", content);
// Alternative (XDocument -> string -> UTF-8 bytes)
var content = new StringContent(doc.ToStringWithDeclaration(), Encoding.UTF8, "text/xml");
var response = await httpClient.PostAsync("/someurl", content);
