I have the following method that I use to serialize various objects to XML. I then write the XML to a file. All the objects have the proper [DataContract] and [DataMember] attributes.
public static string Serialize<T>(T item)
{
var builder = new StringBuilder();
var serializer = new DataContractSerializer(typeof(T));
using (var xmlWriter = XmlWriter.Create(builder))
{
serializer.WriteObject(xmlWriter, item);
return builder.ToString();
}
}
The serialization runs without errors, but the end of the content is missing: the string does not contain the full XML document. Sometimes the string ends right in the middle of a tag.
There does not seem to be a maximum length that triggers the issue: I have strings of 18k that are incomplete, and strings of 80k that are incomplete as well.
The XML structure is fairly simple and only about 6-8 nodes deep.
Am I missing something?
xmlWriter isn't flushed at the point you call ToString(); try:
using (var xmlWriter = XmlWriter.Create(builder))
{
serializer.WriteObject(xmlWriter, item);
}
return builder.ToString();
This does the ToString() after the Dispose() on xmlWriter, meaning it will flush any buffered data to the output (builder in this case).
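For reference, here is a minimal self-contained sketch of the corrected method, plus a variant that calls Flush() explicitly before reading the builder. The Person type and the XmlHelper and SerializeWithFlush names are made up for illustration and are not from the question.

```csharp
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical sample type, only here so the sketch compiles on its own.
[DataContract]
public class Person
{
    [DataMember]
    public string Name { get; set; }
}

public static class XmlHelper
{
    // Reads the builder only after the writer has been disposed.
    public static string Serialize<T>(T item)
    {
        var builder = new StringBuilder();
        var serializer = new DataContractSerializer(typeof(T));
        using (var xmlWriter = XmlWriter.Create(builder))
        {
            serializer.WriteObject(xmlWriter, item);
        } // Dispose() flushes the writer's internal buffer into builder
        return builder.ToString();
    }

    // Alternative: keep the return inside the using block, but flush first.
    public static string SerializeWithFlush<T>(T item)
    {
        var builder = new StringBuilder();
        var serializer = new DataContractSerializer(typeof(T));
        using (var xmlWriter = XmlWriter.Create(builder))
        {
            serializer.WriteObject(xmlWriter, item);
            xmlWriter.Flush(); // push buffered text into builder before ToString()
            return builder.ToString();
        }
    }
}
```

Calling XmlHelper.Serialize(new Person { Name = "Ada" }) should then return a string that ends with the closing Person tag instead of stopping part-way through.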
Related
XmlWriter allows configuring indentation when using XmlWriter.Create and XmlWriterSettings.
In general, I want Indent = true and NewLineOnAttributes = false, except when writing xmlns namespace declarations at the beginning of the file, where I would like to have new lines between each xmlns namespace for readability.
Is it possible to force XmlWriter to do a line break after writing a specific attribute, and otherwise follow general indentation rules?
I tried using WriteWhitespace and WriteRaw with \n:
using System;
using System.Text;
using System.Xml;
namespace XmlWriterIndent
{
class Program
{
static void Main(string[] args)
{
var output = new StringBuilder();
using (var writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true, NewLineOnAttributes = true }))
{
writer.WriteStartDocument();
writer.WriteStartElement("Node");
writer.WriteAttributeString("key1", "value1");
writer.WriteAttributeString("key2", "value2");
writer.WriteAttributeString("xmlns", "n1", null, "scheme://mynamespace.com");
writer.WriteRaw("\n");
writer.WriteAttributeString("xmlns", "n2", null, "scheme://anothernamespace.com");
writer.WriteEndElement();
writer.WriteEndDocument();
}
var xml = output.ToString();
Console.WriteLine(xml);
}
}
}
Unfortunately this throws an exception saying the XML document would be invalid.
UPDATE: Actually, after checking more carefully, the exception is not in the WriteRaw method itself, but rather in the following WriteAttributeString call, as I am calling these methods in a loop for all namespaces.
It looks like WriteRaw moves the XmlWriter into the element content state somehow. Is it possible to use WriteRaw or somehow insert whitespace between attributes without changing the writer state?
UPDATE: Added self-contained example. Actually, it looks like in general namespace declarations are ignored even when using NewLineOnAttributes, i.e. all attributes have new lines except namespace declarations, which are somehow handled differently despite being regular attributes.
Unfortunately, I'm approaching the conclusion that the XmlWriter API is simply broken, as there is no way to do raw formatting of XML, since the WriteRaw forces a change in the writer state.
Looking into the actual source code at referencesource shows that the special write method WriteIndent is used to handle indentation inside XmlWriter. This method has special behavior that doesn't change the state, but there seems to be no way to access it or the underlying data stream, so it seems impossible to work around this without a full reimplementation of the entire XML writer stack:
https://referencesource.microsoft.com/#System.Xml/System/Xml/Core/XmlEncodedRawTextWriter.cs,1739
The XmlWriter API has no way to do raw formatting of XML attributes. WriteRaw would be the appropriate method to call, but the internal XmlWellFormedWriter returned by XmlWriter.Create always advances the writer state when this method is called, advancing the XML state machine to content. If we are in the middle of writing attributes, this finishes the start tag of the element and moves to content, which is not where we want to write our custom indentation.
Several internal XmlWriter classes implement more low-level WriteRaw methods, but there seems to be no way of accessing them, as XmlWriter.Create always wraps created writers with an XmlWellFormedWriter instance before returning.
Therefore, the only way to work around the issue is to define and instantiate a custom XmlWriter class which controls both the underlying stream and the base XmlWriter. That way we can bypass the XmlWriter API and write directly into the stream when we need to do our custom indentation.
There are a couple of limitations with the current solution:
XmlWriter implementations do not write directly to the stream, and instead keep their own internal buffers for efficiency. That means that whenever we want to bypass the XmlWriter we need to call Flush to make sure that our stream is in the right position;
In order for indentation to make sense, we need to keep track of the correct indent level. To do this for the whole document would be possible, but tedious. For simplicity, this solution only formats the top level xmlns declarations;
For completeness, we need to deal with both Stream and TextWriter as possible output types.
Finally, the XmlWriter abstract class has dozens of methods to override which we don't care about, but that need to be bridged to the underlying writer. For conciseness, I have omitted all but the relevant overrides:
class XmlnsIndentedWriter : XmlWriter
{
bool isRootElement;
int indentLevel = -1;
readonly Stream stream;
readonly TextWriter textWriter;
readonly XmlWriter writer;
private XmlnsIndentedWriter(Stream output, XmlWriter baseWriter)
{
stream = output;
writer = baseWriter;
}
private XmlnsIndentedWriter(TextWriter output, XmlWriter baseWriter)
{
textWriter = output;
writer = baseWriter;
}
public static new XmlWriter Create(StringBuilder output, XmlWriterSettings settings)
{
var writer = XmlWriter.Create(output, settings);
return new XmlnsIndentedWriter(new StringWriter(output, CultureInfo.InvariantCulture), writer);
}
public static new XmlWriter Create(Stream stream, XmlWriterSettings settings)
{
var writer = XmlWriter.Create(stream, settings);
return new XmlnsIndentedWriter(stream, writer);
}
// snip: override all methods in the XmlWriter class
private void WriteRawText(string text)
{
writer.Flush();
if (stream != null)
{
// example only, this could be optimized with buffers, etc.
var buf = writer.Settings.Encoding.GetBytes(text);
stream.Write(buf, 0, buf.Length);
}
else if (textWriter != null)
{
textWriter.Write(text);
}
}
public override void WriteStartDocument()
{
isRootElement = true;
writer.WriteStartDocument();
}
public override void WriteStartElement(string prefix, string localName, string ns)
{
if (isRootElement)
{
if (indentLevel < 0)
{
// initialize the indent level;
// length of local name + any control characters / prefixes, etc.
indentLevel = localName.Length + 1;
}
else
{
// do not track indent for the whole document;
// when second element starts, we are done
isRootElement = false;
indentLevel = -1;
}
}
writer.WriteStartElement(prefix, localName, ns);
}
public override void WriteEndAttribute()
{
writer.WriteEndAttribute();
if (indentLevel >= 0)
{
WriteRawText(Environment.NewLine + new string(' ', indentLevel));
}
}
}
There is a choice to add the indentation either before each attribute, or after.
Here I have opted for the latter, as it seems to be the only option if you want to also indent the default xmlns declaration. This declaration is written out after the writer state moves to content, and there seems to be no way of intercepting it otherwise.
I'm using serialization to a string as follows.
public static string Stringify(this Process self)
{
XmlSerializer serializer = new XmlSerializer(typeof(Process));
using (StringWriter writer = new StringWriter())
{
serializer.Serialize(writer, self);
return writer.ToString();
}
}
Then, I deserialize using this code. Please note that it's not an actual stringification from above that's used. In our business logic, it makes more sense to serialize a path, hence reading in from said path and creating an object based on the read data.
public static Process Processify(this string self)
{
XmlSerializer serializer = new XmlSerializer(typeof(Process));
using (XmlReader reader = XmlReader.Create(self))
return serializer.Deserialize(reader) as Process;
}
}
This works as intended except for a small issue with encoding. The XML string that's produced contains encoding="utf-16" as an attribute on the XML declaration (the first line, which states the XML version, not the actual data).
When I read it back in, I get an exception because of mismatching encodings. As far as I can see, there's no way to specify the encoding for serialization or deserialization in any of the objects I'm using.
How can I do that?
For now, I'm using a very crude workaround that simply cuts off the unwanted attribute, like so. It's quick and dirty, and I want to remove it.
public static string Stringify(this Process self)
{
XmlSerializer serializer = new XmlSerializer(typeof(Process));
using (StringWriter writer = new StringWriter())
{
serializer.Serialize(writer, self);
return writer.ToString().Replace(" encoding=\"utf-16\"", "");
}
}
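One commonly used way to avoid the string replacement is to give the serializer a StringWriter that reports UTF-8 as its encoding, because the XML declaration is derived from the writer's Encoding property. The sketch below assumes that the text is later saved to disk as UTF-8. Utf8StringWriter and ProcessXml are hypothetical names, and the Process class here is only a stand-in for the real type.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter whose Encoding property says UTF-8.
// StringWriter reports UTF-16 by default, which is where encoding="utf-16" comes from.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Stand-in for the asker's Process type (assumption, not the real class).
public class Process
{
    public string Name { get; set; }
}

public static class ProcessXml
{
    public static string Stringify(Process self)
    {
        var serializer = new XmlSerializer(typeof(Process));
        using (var writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, self);
            // The declaration now reads encoding="utf-8".
            return writer.ToString();
        }
    }
}
```

The declared encoding only helps if it matches the bytes on disk, so the file itself still has to be written as UTF-8.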
I'm developing a custom XmlFormatter for a Web API app. I want it to treat "IEnumerable" elements differently. Here is the code:
class CustomXmlFormatter : XmlMediaTypeFormatter
{
public override Task WriteToStreamAsync(Type type, object value, System.IO.Stream writeStream, System.Net.Http.HttpContent content, System.Net.TransportContext transportContext, System.Threading.CancellationToken cancellationToken)
{
XmlSerializer serializer;
return Task.Factory.StartNew(() =>
{
if ((typeof(IEnumerable<object>)).IsAssignableFrom(type))
{
XmlWriter writer = XmlWriter.Create(writeStream);
writer.WriteStartElement("array");
/* foreach (object o in (IEnumerable<object>)value)
{
serializer = new XmlSerializer(o.GetType());
serializer.Serialize(writeStream, o);
}*/
writer.WriteEndElement();
}
else
{
serializer = new XmlSerializer(type);
serializer.Serialize(writeStream, value, xsn);
}
});
}
}
The idea is that, when it receives a List, it writes a tag of "array" and then serializes all the elements of the List. I've commented the foreach loop to simplify the question.
The problem is that, when the code is executed, it writes an empty XML document (no "array" tag). How could I implement something like that?
Do you get an error? Perhaps some of your array items contain values that are not valid in an XML tag name, and that leaves the XML output empty. Check that the items are valid: not every character is allowed in an XML tag name.
I think an approach like this example is better:
sr = new StreamWriter(filename, false, System.Text.Encoding.UTF8);
//Write the header
sr.WriteLine("<?xml version=\"1.0\" encoding=\"utf-8\" ?>");
//Write our root node
sr.WriteLine("<" + node.Text + ">");
sr.WriteLine("array");
sr.WriteLine("</" + node.Text + ">");
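Another possible cause for the empty output in the formatter question above is that the XmlWriter is never flushed or disposed, so whatever it buffered never reaches writeStream. Below is a minimal sketch of the "array" wrapper written against a plain Stream, outside of Web API. ArrayXml and WriteArray are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class ArrayXml
{
    // Writes <array>...</array> and serializes each item inside it.
    public static void WriteArray(Stream writeStream, IEnumerable<object> items)
    {
        // Disposing the writer flushes its buffer to the stream.
        // The stream itself stays open because CloseOutput defaults to false.
        using (XmlWriter writer = XmlWriter.Create(writeStream))
        {
            writer.WriteStartElement("array");
            foreach (object o in items)
            {
                // Serialize to the same XmlWriter, not directly to the stream,
                // so every item ends up nested inside the <array> element.
                var serializer = new XmlSerializer(o.GetType());
                serializer.Serialize(writer, o);
            }
            writer.WriteEndElement();
        }
    }
}
```

Note that the commented-out loop in the question serializes to writeStream directly; going through the writer keeps the items inside the open element and avoids a second XML declaration.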
I'm trying to serialize an object to memory, pass it to another process as a string, and deserialize it.
I've discovered that the XML Serialization process strips the \r off of the newlines for strings in the object.
byte[] b;
// serialize to memory.
using (MemoryStream ms = new MemoryStream())
{
XmlSerializer xml = new XmlSerializer(this.GetType());
xml.Serialize(ms, this);
b = ms.ToArray(); // ToArray() copies only the bytes written; GetBuffer() also returns unused capacity
}
// I can now send the bytes to my process.
Process(b);
// On the other end, I use:
using (MemoryStream ms = new MemoryStream(b))
{
XmlSerializer xml = new XmlSerializer(this.GetType());
clone = (myObject)xml.Deserialize(ms);
}
How do I serialize an object in memory like this, without writing it to disk, and without mangling the newlines in the strings?
The strings should be wrapped in CDATA sections to preserve the newlines.
The answer came from another SO post, but I'm reposting it here because I had to tweak it a little.
I had to create a new class to manage XML read/write to memory stream. Here it is:
public class SafeXmlSerializer : XmlSerializer
{
public SafeXmlSerializer(Type type) : base(type) { }
public new void Serialize(StreamWriter stream, object o)
{
XmlWriterSettings ws = new XmlWriterSettings();
ws.NewLineHandling = NewLineHandling.Entitize;
using (XmlWriter xmlWriter = XmlWriter.Create(stream, ws))
{
base.Serialize(xmlWriter, o);
}
}
}
Since it is built on top of XmlSerializer, it should behave exactly as expected. It's just that when I serialize with a StreamWriter, I will use the "safe" version of the serialization, thus saving myself the headache.
I hope this helps someone else.
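To check that the setting does what is wanted, here is a small round-trip sketch using NewLineHandling.Entitize without the subclass. The Note type and the NewlineRoundTrip helper are made up for the example.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type with a string that contains \r\n.
public class Note
{
    public string Text { get; set; }
}

public static class NewlineRoundTrip
{
    public static string Serialize(Note note)
    {
        // Entitize writes carriage returns as character entities (&#xD;),
        // so a normalizing reader does not strip them on the way back in.
        var settings = new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize };
        var sb = new StringBuilder();
        using (var xmlWriter = XmlWriter.Create(sb, settings))
        {
            new XmlSerializer(typeof(Note)).Serialize(xmlWriter, note);
        }
        return sb.ToString();
    }

    public static Note Deserialize(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (Note)new XmlSerializer(typeof(Note)).Deserialize(reader);
        }
    }
}
```

Serializing a Note whose Text is "a\r\nb" and deserializing the result should give back the same string, including the \r.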
I'm trying to serialize a very large IEnumerable<MyObject> using an XmlSerializer without keeping all the objects in memory.
The IEnumerable<MyObject> is actually lazy.
I'm looking for a streaming solution that will:
Take an object from the IEnumerable<MyObject>
Serialize it to the underlying stream using the standard serialization (I don't want to handcraft the XML here!)
Discard the in memory data and move to the next
I'm trying with this code:
using (var writer = new StreamWriter(filePath))
{
var xmlSerializer = new XmlSerializer(typeof(MyObject));
foreach (var myObject in myObjectsIEnumerable)
{
xmlSerializer.Serialize(writer, myObject);
}
}
but I'm getting multiple XML headers and I cannot specify a root tag <MyObjects> so my XML is invalid.
Any idea?
Thanks
The XmlWriter class is a fast streaming API for XML generation. It is rather low-level; MSDN has an article on instantiating a validating XmlWriter using XmlWriter.Create().
Edit: link fixed. Here is sample code from the article:
async Task TestWriter(Stream stream)
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Async = true;
using (XmlWriter writer = XmlWriter.Create(stream, settings)) {
await writer.WriteStartElementAsync("pf", "root", "http://ns");
await writer.WriteStartElementAsync(null, "sub", null);
await writer.WriteAttributeStringAsync(null, "att", null, "val");
await writer.WriteStringAsync("text");
await writer.WriteEndElementAsync();
await writer.WriteCommentAsync("cValue");
await writer.WriteCDataAsync("cdata value");
await writer.WriteEndElementAsync();
await writer.FlushAsync();
}
}
Here's what I use:
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;
using System.Text;
using System.IO;
namespace Utils
{
public class XMLSerializer
{
public static Byte[] StringToUTF8ByteArray(String xmlString)
{
return new UTF8Encoding().GetBytes(xmlString);
}
public static String SerializeToXML<T>(T objectToSerialize)
{
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings =
new XmlWriterSettings {Encoding = Encoding.UTF8, Indent = true};
using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
{
if (xmlWriter != null)
{
new XmlSerializer(typeof(T)).Serialize(xmlWriter, objectToSerialize);
}
}
return sb.ToString();
}
public static void DeserializeFromXML<T>(string xmlString, out T deserializedObject) where T : class
{
XmlSerializer xs = new XmlSerializer(typeof (T));
using (MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(xmlString)))
{
deserializedObject = xs.Deserialize(memoryStream) as T;
}
}
}
}
Then just call:
string xml = Utils.XMLSerializer.SerializeToXML(myObjectsIEnumerable);
I haven't tried it with, for example, an IEnumerable that fetches objects one at a time remotely, or any other weird use cases, but it works perfectly for List<T> and other collections that are in memory.
EDIT: Based on your comments in response to this, you could use XmlDocument.LoadXml to load the resulting XML string into an XmlDocument, save the first one to a file, and use that as your master XML file. For each item in the IEnumerable, use LoadXml again to create a new in-memory XmlDocument, grab the nodes you want, append them to the master document, and save it again, getting rid of the new one.
After you're finished, there may be a way to wrap all of the nodes in your root tag. You could also use XSL and XslCompiledTransform to write another XML file with the objects properly wrapped in the root tag.
You can do this by implementing the IXmlSerializable interface on the large class. The implementation of the WriteXml method can write the start tag, then simply loop over the IEnumerable<MyObject> and serialize each MyObject to the same XmlWriter, one at a time.
In this implementation, there won't be any in-memory data to get rid of (past what the garbage collector will collect).
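A rough sketch of that idea follows. MyObjectStream and its Save helper are hypothetical names, MyObject is a stand-in for the real item type, and reading back is left unimplemented.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Stand-in for the real item type (assumption).
public class MyObject
{
    public int Id { get; set; }
}

// Wrapper whose WriteXml streams the items one at a time.
[XmlRoot("MyObjects")]
public class MyObjectStream : IXmlSerializable
{
    private static readonly XmlSerializer ItemSerializer = new XmlSerializer(typeof(MyObject));
    private readonly IEnumerable<MyObject> items;

    // XmlSerializer needs a public parameterless constructor.
    public MyObjectStream() : this(new MyObject[0]) { }

    public MyObjectStream(IEnumerable<MyObject> items)
    {
        this.items = items;
    }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        throw new NotSupportedException("This sketch only covers writing.");
    }

    // The outer serializer has already written the <MyObjects> start tag.
    public void WriteXml(XmlWriter writer)
    {
        foreach (MyObject item in items)
        {
            // Each item goes to the same writer; nothing is collected in a list.
            ItemSerializer.Serialize(writer, item);
        }
    }

    // Convenience helper: serialize a (possibly lazy) sequence to a TextWriter.
    public static void Save(IEnumerable<MyObject> source, TextWriter output)
    {
        new XmlSerializer(typeof(MyObjectStream)).Serialize(output, new MyObjectStream(source));
    }
}
```

With a StreamWriter opened on filePath, MyObjectStream.Save(myObjectsIEnumerable, writer) should produce a single XML declaration and a single MyObjects root, with the items enumerated one by one as they are written.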