I'm building an MVC5 application which pulls records from a database and allows a user to perform some basic data cleansing edits.
Once the data has been cleansed it needs to be exported as XML, run through a validator and then uploaded to a third party portal.
I'm using Service Stack, and I've found it fairly quick and straightforward in the past, particularly when outputting to CSV.
The one issue I'm having is with the XML serializer. I'm not sure how to make it generate well-formed XML.
The file that I'm getting simply dumps everything on one line, which won't validate because it isn't well formed.
Below is an extract from my controller action:
Response.Clear();
Response.ContentType = "text/xml";
Response.AddHeader("Content-Disposition", "attachment; filename=\"myFile.xml\"");
XmlSerializer.SerializeToStream(viewModel, Response.OutputStream);
Response.End();
UPDATE: Thanks for the useful comments. As explained, I'm not talking about pretty printing; the issue is that I need to run the file through a validator before uploading it to a third party. The error message the validator is throwing is: Error:0000, XML not well-formed. Cannot have more than one tag on one line.
Firstly, be aware that most white space (including new lines) in XML is insignificant -- it has no meaning, and is only for beautification. The lack of new lines doesn't make the XML ill-formed. See White Space in XML Documents or https://www.w3.org/TR/REC-xml/#sec-white-space. Thus in theory it shouldn't matter whether ServiceStack's XmlSerializer is putting all of your XML on a single line.
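To check this independently of the third-party validator, a standard XML parser can be asked whether a single-line document is well formed. The following is a minimal sketch; WellFormedCheck and IsWellFormed are hypothetical names introduced for illustration.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: reports whether a string parses as well-formed XML.
public static class WellFormedCheck
{
    public static bool IsWellFormed(string xml)
    {
        try
        {
            // XDocument.Parse throws XmlException on ill-formed input.
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Calling WellFormedCheck.IsWellFormed("<a><b>1</b><c>2</c></a>") returns true even though every tag sits on one line, whereas an unclosed tag such as "<a><b></a>" returns false.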
That being said, if for whatever reason you must cosmetically break your XML up into multiple lines, you'll need to do a little work. From the source code we can see that XmlSerializer uses DataContractSerializer with a hardcoded static XmlWriterSettings that does not allow for setting XmlWriterSettings.Indent = true. However, since this class is just a very thin wrapper on Microsoft's data contract serializer, you can substitute your own code:
public static class DataContractSerializerHelper
{
private static readonly XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { Indent = true, IndentChars = " " };
public static string SerializeToString<T>(T from)
{
try
{
using (var ms = new MemoryStream())
using (var xw = XmlWriter.Create(ms, xmlWriterSettings))
{
var serializer = new DataContractSerializer(from.GetType());
serializer.WriteObject(xw, from);
xw.Flush();
ms.Seek(0, SeekOrigin.Begin);
var reader = new StreamReader(ms);
return reader.ReadToEnd();
}
}
catch (Exception ex)
{
throw new SerializationException(string.Format("Error serializing \"{0}\"", from), ex);
}
}
public static void SerializeToWriter<T>(T value, TextWriter writer)
{
try
{
using (var xw = XmlWriter.Create(writer, xmlWriterSettings))
{
var serializer = new DataContractSerializer(value.GetType());
serializer.WriteObject(xw, value);
}
}
catch (Exception ex)
{
throw new SerializationException(string.Format("Error serializing \"{0}\"", value), ex);
}
}
public static void SerializeToStream(object obj, Stream stream)
{
if (obj == null)
return;
using (var xw = XmlWriter.Create(stream, xmlWriterSettings))
{
var serializer = new DataContractSerializer(obj.GetType());
serializer.WriteObject(xw, obj);
}
}
}
And then do:
DataContractSerializerHelper.SerializeToStream(viewModel, Response.OutputStream);
I have created the following wrapper method to disable DTD
public class Program
{
public static void Main(string[] args)
{
string s = @"<?xml version =""1.0"" encoding=""utf-16""?>
<ArrayOfSerializingTemplateItem xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"">
<SerializingTemplateItem>
</SerializingTemplateItem>
</ArrayOfSerializingTemplateItem >";
try
{
XmlReader reader = XmlWrapper.CreateXmlReaderObject(s);
XmlSerializer sr = new XmlSerializer(typeof(List<SerializingTemplateItem>));
Object ob = sr.Deserialize(reader);
}
catch (Exception ex)
{
Console.WriteLine(ex);
throw;
}
Console.ReadLine();
}
}
public class XmlWrapper
{
public static XmlReader CreateXmlReaderObject(string sr)
{
byte[] byteArray = Encoding.UTF8.GetBytes(sr);
MemoryStream stream = new MemoryStream(byteArray);
stream.Position = 0;
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.None;
settings.DtdProcessing = DtdProcessing.Ignore;
return XmlReader.Create(stream, settings);
}
}
public class SerializingTemplateItem
{
}
The above throws exception "There is no Unicode byte order mark. Cannot switch to Unicode." (Demo fiddle here: https://dotnetfiddle.net/pGxOE9).
But if I use the following code to create the XmlReader instead of calling the XmlWrapper method, it works fine.
StringReader stringReader = new StringReader( xml );
XmlReader reader = new XmlTextReader( stringReader );
But I need to use the wrapper method as a security requirement to disable DTD. I don't know why I am unable to deserialize after calling my wrapper method. Any help will be highly appreciated.
Your problem is that you have encoded the XML into a MemoryStream using Encoding.UTF8, but the XML string itself claims to be encoded in UTF-16 in the encoding declaration in its XML text declaration:
<?xml version ="1.0" encoding="utf-16"?>
<ArrayOfSerializingTemplateItem>
<!-- Content omitted -->
</ArrayOfSerializingTemplateItem >
Apparently when the XmlReader encounters this declaration, it tries to honor the declaration and switch from UTF-8 to UTF-16 but fails for some reason - possibly because the stream really is encoded in UTF-8. Conversely, when the deprecated XmlTextReader encounters the declaration, it apparently just ignores it as not implemented, which happens to cause things to work successfully in this situation.
The simplest way to resolve this is to read directly from the string using a StringReader using XmlReader.Create(TextReader, XmlReaderSettings):
public class XmlWrapper
{
public static XmlReader CreateXmlReaderObject(string sr)
{
var settings = new XmlReaderSettings
{
ValidationType = ValidationType.None,
DtdProcessing = DtdProcessing.Ignore,
};
return XmlReader.Create(new StringReader(sr), settings);
}
}
Since a C# string is always encoded internally in UTF-16, the encoding statement in the XML will be ignored as irrelevant. This will also be more performant, as the conversion to an intermediate byte array is skipped completely.
Incidentally, you should dispose of your XmlReader via a using statement:
Object ob;
using (var reader = XmlWrapper.CreateXmlReaderObject(s))
{
XmlSerializer sr = new XmlSerializer(typeof(List<SerializingTemplateItem>));
ob = sr.Deserialize(reader);
}
Working sample fiddle here.
Related questions:
Meaning of - <?xml version="1.0" encoding="utf-8"?>
Ignoring specified encoding when deserializing XML
I'm using serialization to a string as follows.
public static string Stringify(this Process self)
{
XmlSerializer serializer = new XmlSerializer(typeof(Process));
using (StringWriter writer = new StringWriter())
{
serializer.Serialize(writer, self);
return writer.ToString();
}
}
Then, I deserialize using this code. Please note that it's not an actual stringification from above that's used. In our business logic, it makes more sense to serialize a path, hence reading in from said path and creating an object based on the read data.
public static Process Processify(this string self)
{
XmlSerializer serializer = new XmlSerializer(typeof(Process));
using (XmlReader reader = XmlReader.Create(self))
return serializer.Deserialize(reader) as Process;
}
}
This works as it is supposed to, except for a small issue with encoding. The XML string that's produced contains the addition encoding="utf-16" as an attribute on the first tag (the XML declaration that states the version, not the actual data).
When I read it in, I get an exception because of mismatching encodings. As far as I can see, there's no way to specify the encoding for serialization or deserialization in any of the objects I'm using.
How can I do that?
For now, I'm using a very crude workaround that simply cuts off the excess junk like so. It's quick and dirty and I want to remove it.
public static string Stringify(this Process self)
{
XmlSerializer serializer = new XmlSerializer(typeof(Process));
using (StringWriter writer = new StringWriter())
{
serializer.Serialize(writer, self);
return writer.ToString().Replace(" encoding=\"utf-16\"", "");
}
}
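One commonly used way to avoid the string replacement is to hand XmlSerializer a StringWriter subclass that reports UTF-8, since the encoding written into the declaration is taken from the writer's Encoding property. The sketch below assumes that approach; Utf8StringWriter and XmlStringHelper are hypothetical names, not framework types or part of the original code.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class XmlStringHelper
{
    // Serializes a value to an XML string whose declaration says encoding="utf-8".
    public static string ToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

The string is still UTF-16 in memory; only the declaration changes. If the text is later saved to disk, it should be written as UTF-8 so that the file matches its own declaration.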
I'm currently trying to serialize a class into XML to be posted to php web service.
Whenever I do a normal serialization using XmlSerializer, the XML declaration always appears on the first line of the XML document (something like <?xml ....?>). I tested the XML and was unable to get it working, because the endpoint does not accept an XML declaration and I can't do anything about that.
To be honest, I'm unfamiliar with XML serialization in C#.
Therefore, I used XmlWriter to do this, as below:
private string SerializeClassToString(GetRiskReport value)
{
var emptyNS = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
var ser = new XmlSerializer(value.GetType());
var settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
using (var stream = new StringWriter())
{
using (var writer = XmlWriter.Create(stream, settings))
{
ser.Serialize(writer, value, emptyNS);
return stream.ToString();
}
}
}
Result for the Namespace is
<GetRiskReport FCRA=\"false\" ReturnResultsOnly=\"false\" Monitoring=\"false\">
... and I'm able to omit the XML declaration. However, this has introduced two new problems.
I get \r\n for new lines, and the double quotes are escaped, as in ReturnResultsOnly=\"false\" Monitoring=\"false\", which the endpoint is also unable to process.
Can anyone give me an idea of how to change the XmlWriterSettings to omit the XML declaration, avoid \r\n, and also avoid the escaped double quotes \"?
Thanks for your advice in advance.
Simon
Try the following settings:
settings.NewLineHandling = NewLineHandling.None;
settings.CheckCharacters = false;
private void SerializeClassToString(GetRiskReport value)
{
var emptyNS = new XmlSerializerNamespaces(new[]{XmlQualifiedName.Empty});
var ser = new XmlSerializer(value.GetType());
var settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
string path = "your_file_path_here";
// File.Create overwrites an existing file, so no explicit delete is needed.
using (FileStream stream = File.Create(path))
using (var writer = XmlWriter.Create(stream, settings))
{
// Write straight to the file; both the writer and the stream are disposed on exit.
ser.Serialize(writer, value, emptyNS);
}
}
There is no way to avoid this MS bug (or their intentional specification) in XML serialization. It's easier and faster to use a FileStream object.
I'm using the following code to serialise an object:
public static string Serialise(IMessageSerializer messageSerializer, DelayMessage message)
{
using (var stream = new MemoryStream())
{
messageSerializer.Serialize(new[] { message }, stream);
return Encoding.UTF8.GetString(stream.ToArray());
}
}
Unfortunately, when I save it to a database (using LINQ to SQL), then query the database, the string appears to start with a question mark:
?<z:anyType xmlns...
How do I get rid of that? When I try to de-serialise using the following:
public static DelayMessage Deserialise(IMessageSerializer messageSerializer, string data)
{
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(data)))
{
return (DelayMessage)messageSerializer.Deserialize(stream)[0];
}
}
I get the following exception:
"Error in line 1 position 1. Expecting
element 'anyType' from namespace
'http://schemas.microsoft.com/2003/10/Serialization/'..
Encountered 'Text' with name '',
namespace ''. "
The implementations of the messageSerializer use the DataContractSerializer as follows:
public void Serialize(IMessage[] messages, Stream stream)
{
var xws = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var xmlWriter = XmlWriter.Create(stream, xws))
{
var dcs = new DataContractSerializer(typeof(IMessage), knownTypes);
foreach (var message in messages)
{
dcs.WriteObject(xmlWriter, message);
}
}
}
public IMessage[] Deserialize(Stream stream)
{
var xrs = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var xmlReader = XmlReader.Create(stream, xrs))
{
var dcs = new DataContractSerializer(typeof(IMessage), knownTypes);
var messages = new List<IMessage>();
while (false == xmlReader.EOF)
{
var message = (IMessage)dcs.ReadObject(xmlReader);
messages.Add(message);
}
return messages.ToArray();
}
}
Unfortunately, when I save it to a database (using LINQ to SQL), then query the database, the string appears to start with a question mark:
?<z:anyType xmlns...
Your database is not set up to support Unicode characters. You write a string including a BOM in it, the database can't store it so mangles it into a '?'. Then when you come back to read the string as XML, the '?' is text content outside the root element and you get an error. (You can only have whitespace text outside the root element.)
Why is the BOM getting there? Because Microsoft love dropping BOMs all over the place, even when they're not needed (and they never are, with UTF-8). The solution is to make your own instance of UTF8Encoding instead of using the built-in Encoding.UTF8, and tell it you don't want its stupid BOMs:
Encoding utf8onlynotasridiculouslysucky= new UTF8Encoding(false);
However, this is only really masking the real issue, which is the database configuration.
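To see where such an encoding would be plugged in, here is a minimal sketch that passes it to XmlWriterSettings.Encoding, mirroring the fragment-level writer from the question. BomDemo and its Write method are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo: writes a small XML fragment with a caller-supplied encoding.
public static class BomDemo
{
    public static byte[] Write(Encoding encoding)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = encoding,
            ConformanceLevel = ConformanceLevel.Fragment
        };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString("message", "hello");
            }
            // The writer has been flushed and disposed; return the raw bytes.
            return stream.ToArray();
        }
    }
}
```

With new UTF8Encoding(false) the returned bytes begin directly with the '<' of the first element. Encoding.UTF8, by contrast, advertises a three-byte preamble via GetPreamble(), and that preamble is what surfaces as the stray leading character once the bytes are decoded back into a string.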
I'm trying to serialize a very large IEnumerable<MyObject> using an XmlSerializer without keeping all the objects in memory.
The IEnumerable<MyObject> is actually lazy.
I'm looking for a streaming solution that will:
Take an object from the IEnumerable<MyObject>
Serialize it to the underlying stream using the standard serialization (I don't want to handcraft the XML here!)
Discard the in memory data and move to the next
I'm trying with this code:
using (var writer = new StreamWriter(filePath))
{
var xmlSerializer = new XmlSerializer(typeof(MyObject));
foreach (var myObject in myObjectsIEnumerable)
{
xmlSerializer.Serialize(writer, myObject);
}
}
but I'm getting multiple XML headers and I cannot specify a root tag <MyObjects> so my XML is invalid.
Any idea?
Thanks
The XmlWriter class is a fast streaming API for XML generation. It is rather low-level; MSDN has an article on instantiating a validating XmlWriter using XmlWriter.Create().
Edit: link fixed. Here is sample code from the article:
async Task TestWriter(Stream stream)
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Async = true;
using (XmlWriter writer = XmlWriter.Create(stream, settings)) {
await writer.WriteStartElementAsync("pf", "root", "http://ns");
await writer.WriteStartElementAsync(null, "sub", null);
await writer.WriteAttributeStringAsync(null, "att", null, "val");
await writer.WriteStringAsync("text");
await writer.WriteEndElementAsync();
await writer.WriteCommentAsync("cValue");
await writer.WriteCDataAsync("cdata value");
await writer.WriteEndElementAsync();
await writer.FlushAsync();
}
}
Here's what I use:
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;
using System.Text;
using System.IO;
namespace Utils
{
public class XMLSerializer
{
public static Byte[] StringToUTF8ByteArray(String xmlString)
{
return new UTF8Encoding().GetBytes(xmlString);
}
public static String SerializeToXML<T>(T objectToSerialize)
{
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings =
new XmlWriterSettings {Encoding = Encoding.UTF8, Indent = true};
using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
{
if (xmlWriter != null)
{
new XmlSerializer(typeof(T)).Serialize(xmlWriter, objectToSerialize);
}
}
return sb.ToString();
}
public static void DeserializeFromXML<T>(string xmlString, out T deserializedObject) where T : class
{
XmlSerializer xs = new XmlSerializer(typeof (T));
using (MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(xmlString)))
{
deserializedObject = xs.Deserialize(memoryStream) as T;
}
}
}
}
Then just call:
string xml = Utils.XMLSerializer.SerializeToXML(myObjectsIEnumerable);
I haven't tried it with, for example, an IEnumerable that fetches objects one at a time remotely, or any other weird use cases, but it works perfectly for List<T> and other collections that are in memory.
EDIT: Based on your comments in response to this, you could use XmlDocument.LoadXml to load the resulting XML string into an XmlDocument, save the first one to a file, and use that as your master XML file. For each item in the IEnumerable, use LoadXml again to create a new in-memory XmlDocument, grab the nodes you want, append them to the master document, and save it again, getting rid of the new one.
After you're finished, there may be a way to wrap all of the nodes in your root tag. You could also use XSL and XslCompiledTransform to write another XML file with the objects properly wrapped in the root tag.
You can do this by implementing the IXmlSerializable interface on the large class. The implementation of the WriteXml method can write the start tag, then simply loop over the IEnumerable<MyObject> and serialize each MyObject to the same XmlWriter, one at a time.
In this implementation, there won't be any in-memory data to get rid of (past what the garbage collector will collect).
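The loop described above can be sketched as follows. For brevity it is written directly against an XmlWriter rather than inside a WriteXml implementation, but the body is the same: open the root element, serialize each item to the shared writer, close the root. MyObject here is a stand-in with a single assumed Id property, and StreamingXml and the root name MyObjects are illustrative choices.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Stand-in for the question's type; the Id property is an assumption.
public class MyObject
{
    public int Id { get; set; }
}

public static class StreamingXml
{
    public static void WriteAll(IEnumerable<MyObject> items, Stream output)
    {
        var serializer = new XmlSerializer(typeof(MyObject));

        // An empty prefix/namespace pair keeps xmlns:xsi and xmlns:xsd off every item.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        using (var writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true }))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("MyObjects");

            // Items are pulled from the lazy sequence and written one at a time.
            foreach (var item in items)
            {
                serializer.Serialize(writer, item, noNamespaces);
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

Because the writer is already inside the root element when each item is serialized, only one XML declaration is emitted, and each item can be collected once the loop moves on.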