XML Deserialization encoding issue - c#

I have already searched a lot, but I have been unable to find a solution or to determine the correct approach.
I am serializing an object to an XML string and deserializing it back to an object using C#. When I deserialize it back, I get the error "There is an error in XML document (1, 1)". After serialization, the XML string has a leading ?:
?<?xml version="1.0" encoding="utf-16"?>
Serialization code:
string xmlString = null;
MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(typeof(T));
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("abc", "http://example.com/abc/");
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream,Encoding.Unicode);
xs.Serialize(xmlTextWriter, obj, ns);
memoryStream = (MemoryStream)xmlTextWriter.BaseStream;
xmlString = ConvertByteArrayToString(memoryStream.ToArray());
ConvertByteArrayToString:
private static string ConvertByteArrayToString(byte[] characters)
{
    UnicodeEncoding encoding = new UnicodeEncoding();
    string constructedString = encoding.GetString(characters);
    return constructedString;
}
Deserialization Code:
XmlSerializer ser = new XmlSerializer(typeof(T));
StringReader stringReader = new StringReader(xml);
XmlTextReader xmlReader = new XmlTextReader(stringReader);
object obj = ser.Deserialize(xmlReader);
xmlReader.Close();
stringReader.Close();
return (T)obj;
I would like to know what I am doing wrong with encoding and I need a solution that works for most cases. Thanks

Use the following functions for serialization and deserialization:
// Requires: using System.IO; using System.Xml.Serialization;
public static string Serialize<T>(T dataToSerialize)
{
    // StringWriter collects characters, not bytes, so no byte-order mark is produced.
    using (var stringWriter = new StringWriter())
    {
        var serializer = new XmlSerializer(typeof(T));
        serializer.Serialize(stringWriter, dataToSerialize);
        return stringWriter.ToString();
    }
}
public static T Deserialize<T>(string xmlText)
{
    // StringReader reads the XML text directly from the string.
    using (var stringReader = new StringReader(xmlText))
    {
        var serializer = new XmlSerializer(typeof(T));
        return (T)serializer.Deserialize(stringReader);
    }
}

Your serialized XML contains a Unicode byte-order mark in the beginning, and this is where the deserializer fails.
To remove the BOM, create a UnicodeEncoding instance that suppresses it instead of using the default Encoding.Unicode:
new XmlTextWriter(memoryStream, new UnicodeEncoding(false, false))
Here the second false (the byteOrderMark parameter) prevents the BOM from being prepended to the output.
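A minimal round-trip sketch of this fix, assuming a hypothetical BomPayload class in place of the asker's T (the abc namespace mapping is kept from the question). It encodes and decodes with the same BOM-free UnicodeEncoding instance, so the string starts directly with "<?xml" and should deserialize without the (1, 1) error:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical payload type, used only for this illustration.
public class BomPayload
{
    public string Name { get; set; }
}

public static class BomFreeXml
{
    public static string Serialize<T>(T obj)
    {
        var serializer = new XmlSerializer(typeof(T));
        var ns = new XmlSerializerNamespaces();
        ns.Add("abc", "http://example.com/abc/");
        // bigEndian: false, byteOrderMark: false -> no preamble bytes are written.
        var encoding = new UnicodeEncoding(false, false);
        using (var memoryStream = new MemoryStream())
        {
            using (var writer = new XmlTextWriter(memoryStream, encoding))
            {
                serializer.Serialize(writer, obj, ns);
            }
            // ToArray still works after the writer has closed the stream.
            return encoding.GetString(memoryStream.ToArray());
        }
    }

    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```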

Related

Illegal characters in path error in XML Deserialization

I got an XML response from my client. I can't deserialize the XML from a string; it throws an "Illegal characters in path" error. So for now I save it to a file in the temp folder and read it back. Is it possible to deserialize without saving the XML to a file first?
string xml = Post();
XmlSerializer deserializer = new XmlSerializer(typeof(Envelope));
TextReader reader = new StreamReader(xml); // <-- Illegal characters in path error
object obj = deserializer.Deserialize(reader);
Envelope XmlData = (Envelope)obj;
reader.Close();
Edit 1 -
XmlSerializer serializer = new XmlSerializer(typeof(Envelope));
using (TextWriter writer = new StreamWriter(xml)) // <-- Is StringWriter possible here?
{
serializer.Serialize(writer, XmlData);
}
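Instead of a StreamReader, use a StringReader, which takes the XML string itself as its constructor parameter. The StreamReader(string) constructor treats its argument as a file path, which is why you get the "Illegal characters in path" error.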
TextReader reader = new StringReader(xml);
For writing, use this:
string output;
XmlSerializer serializer = new XmlSerializer(typeof(Envelope));
using (TextWriter writer = new StringWriter())
{
serializer.Serialize(writer, XmlData);
output = writer.ToString();
}
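A self-contained sketch of both fixes, assuming a hypothetical Envelope class with a single Body property (the asker's real Envelope type is not shown):

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the asker's Envelope type.
public class Envelope
{
    public string Body { get; set; }
}

public static class EnvelopeXml
{
    // StringReader reads the characters of the string itself;
    // StreamReader(string) would treat the same string as a file path.
    public static Envelope FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Envelope));
        using (TextReader reader = new StringReader(xml))
        {
            return (Envelope)serializer.Deserialize(reader);
        }
    }

    // StringWriter accumulates the output in memory; ToString() returns it.
    public static string ToXml(Envelope envelope)
    {
        var serializer = new XmlSerializer(typeof(Envelope));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, envelope);
            return writer.ToString();
        }
    }
}
```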

Writing an XML fragment using XmlWriterSettings and XmlSerializer is giving an extra character

I need to write an XML fragment to be consumed by a web service. Any xml declarations cause the web service to reject the request. To support this I have the following class:
public class ContentQueryCriteria
{
public int Type { get; set; }
public string Value { get; set; }
public int Condition { get; set; }
}
which allows me to set the request criteria and then get the results.
The code is used like this:
ContentQueryCriteria content = new ContentQueryCriteria();
content.Type = 1;
content.Value = "NAVS500";
content.Condition = 1;
string requestBody = SerializeToString(content);
Console.WriteLine(requestBody);
When I serialize this to an XML file, I get a proper response, without the XML declaration or any namespaces. However, I would rather capture the data in a memory stream than in a file.
Using the following method (taken from http://www.codeproject.com/Articles/58287/XML-Serialization-Tips-Tricks ) I am able to achieve results, but for some reason I have a ? listed as part of the string.
public static string SerializeToString(object obj)
{
XmlSerializer serializer = new XmlSerializer(obj.GetType());
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
MemoryStream ms = new MemoryStream();
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Encoding = Encoding.Unicode;
XmlWriter writer = XmlWriter.Create(ms, settings);
serializer.Serialize(writer, obj, ns);
return Encoding.Unicode.GetString(ms.ToArray());
}
the resulting string is:
?<ContentQueryCriteria><Type>1</Type><Value>NAVS500</Value><Condition>1</Condition></ContentQueryCriteria>
if I set OmitXmlDeclaration = false I get the following string:
?<?xml version="1.0" encoding="utf-16"?><ContentQueryCriteria><Type>1</Type><Value>NAVS500</Value><Condition>1</Condition></ContentQueryCriteria>
Can anyone help me determine why the extra ? is there and how I can remove it?
Working SerializeToString method with no BOM
public static string SerializeToString(object obj)
{
XmlSerializer serializer = new XmlSerializer(obj.GetType());
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
MemoryStream ms = new MemoryStream();
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Encoding = new UnicodeEncoding(bigEndian: false, byteOrderMark: false);
XmlWriter writer = XmlWriter.Create(ms, settings);
serializer.Serialize(writer, obj, ns);
return Encoding.Unicode.GetString(ms.ToArray());
}
You are seeing the BOM (byte order mark) as the first character of the string converted from the stream's byte array.
Turn off BOM output and you'll be fine.
Use an encoding object that does not generate a BOM, i.e. a UnicodeEncoding created with byteOrderMark set to false:
settings.Encoding = new UnicodeEncoding(bigEndian: false, byteOrderMark: false);
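A variation that avoids the byte-level round trip altogether (a swapped-in technique, not the one in the answers above): an XmlWriter layered on a StringWriter produces characters directly, so there is no encoding preamble to strip. This sketch reuses the ContentQueryCriteria class from the question; FragmentWriter is a made-up name:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class ContentQueryCriteria
{
    public int Type { get; set; }
    public string Value { get; set; }
    public int Condition { get; set; }
}

public static class FragmentWriter
{
    public static string SerializeToString(object obj)
    {
        var serializer = new XmlSerializer(obj.GetType());
        // Mapping the empty prefix to the empty namespace suppresses xmlns:xsi/xmlns:xsd.
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        // No <?xml ...?> declaration, as the web service requires.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        // StringWriter holds characters, not bytes, so no BOM can appear.
        using (var stringWriter = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(writer, obj, ns);
            }
            return stringWriter.ToString();
        }
    }
}
```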

format the xml output

I am using this method to transform an object to XML:
protected XmlDocument SerializeAnObject(object obj)
{
XmlDocument doc = new XmlDocument();
DataContractSerializer serializer = new DataContractSerializer(obj.GetType());
MemoryStream stream = new MemoryStream();
try
{
serializer.WriteObject(stream, obj);
stream.Position = 0;
doc.Load(stream);
return doc;
}
finally
{
stream.Close();
stream.Dispose();
}
}
Eventually I get something like:
<CaCT>
<CTC i:nil="true" xmlns="http://schemas.datacontract.org/2004/07/a.b.BusinessEntities.InnerEntities" />
<CTDescr xmlns="http://schemas.datacontract.org/2004/07/a.b.BusinessEntities.InnerEntities">blabla</CTDescr>
<CaId>464</CaId>
</CaCT>
How can I get rid of the i:nil="true" and the xmlns="http://schemas.datacontract.org/2004/07/a.b.BusinessEntities.InnerEntities"?
Personally I've always found that hand-written XML serialization with LINQ to XML works well. It's as flexible as you want, you can make it backward and forward compatible in whatever way you want, and obviously you don't end up with any extra namespaces or attributes that you don't want.
Obviously it becomes more complicated the more complicated your classes are, but I've found it works very well for simple classes. It's at least an alternative to consider.
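A minimal sketch of the hand-written LINQ to XML approach, assuming a hypothetical CaCT class whose shape is guessed from the XML in the question. Because XElement skips null content, a null CTC produces no element at all rather than i:nil="true", and no namespace is emitted unless you add one:

```csharp
using System.Xml.Linq;

// Hypothetical shape of the asker's class, inferred from the XML output.
public class CaCT
{
    public string CTC { get; set; }
    public string CTDescr { get; set; }
    public int CaId { get; set; }
}

public static class CaCTXml
{
    public static XElement ToXml(CaCT item)
    {
        return new XElement("CaCT",
            // A null content argument is ignored, so a missing CTC is simply omitted.
            item.CTC == null ? null : new XElement("CTC", item.CTC),
            new XElement("CTDescr", item.CTDescr),
            new XElement("CaId", item.CaId));
    }

    public static CaCT FromXml(XElement element)
    {
        return new CaCT
        {
            // The explicit string conversion returns null when the element is absent.
            CTC = (string)element.Element("CTC"),
            CTDescr = (string)element.Element("CTDescr"),
            CaId = (int)element.Element("CaId")
        };
    }
}
```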
Alternatively, switch to XmlSerializer, map the empty prefix to the empty namespace, and omit the XML declaration:
protected string SerializeAnObject(object obj)
{
    XmlSerializerNamespaces xmlNamespaces = new XmlSerializerNamespaces();
    xmlNamespaces.Add("", "");
    XmlWriterSettings writerSettings = new XmlWriterSettings();
    writerSettings.OmitXmlDeclaration = true;
    // UTF8Encoding(false) writes no BOM, so the decoded string starts with '<'.
    writerSettings.Encoding = new UTF8Encoding(false);
    XmlSerializer serializer = new XmlSerializer(obj.GetType());
    using (MemoryStream ms = new MemoryStream())
    {
        using (XmlWriter writer = XmlWriter.Create(ms, writerSettings))
        {
            serializer.Serialize(writer, obj, xmlNamespaces);
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }
}

remove encoding from xmlserializer

I am using the following code to create an xml document -
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
new XmlSerializer(typeof(docket)).Serialize(Console.Out, i, ns);
This works great for creating the XML file with no namespace attributes. I would also like to have no encoding attribute in the XML declaration, but I cannot find a way to do it. Does anyone know whether this can be done?
Thanks
Old answer removed and updated with a new solution:
Assuming that it's OK to remove the XML declaration completely, since it doesn't make much sense without the encoding attribute:
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
using (XmlWriter writer = XmlWriter.Create(Console.Out, new XmlWriterSettings { OmitXmlDeclaration = true}))
{
new XmlSerializer(typeof (SomeType)).Serialize(writer, new SomeType(), ns);
}
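To remove the encoding attribute from the XML declaration, pass an XmlTextWriter created with a null encoding to the XmlSerializer:
MemoryStream ms = new MemoryStream();
// null encoding: output is UTF-8 and the declaration has no encoding attribute
XmlTextWriter w = new XmlTextWriter(ms, null);
serializer.Serialize(w, obj); // serializer is your XmlSerializer, obj the object to serialize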
Explanation
XmlTextWriter uses the encoding of the TextWriter passed to its constructor:
// XmlTextWriter constructor
public XmlTextWriter(TextWriter w) : this()
{
this.textWriter = w;
this.encoding = w.Encoding;
..
It uses this encoding when generating XML:
// Snippet from XmlTextWriter.StartDocument
if (this.encoding != null)
{
builder.Append(" encoding=");
...
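A self-contained sketch of the null-encoding approach, assuming a hypothetical Docket class (the asker's docket type is not shown). With a null Encoding, XmlTextWriter writes UTF-8 without a BOM and emits a declaration with no encoding attribute:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type for illustration.
public class Docket
{
    public string Id { get; set; }
}

public static class NoEncodingAttribute
{
    public static string Serialize(object obj)
    {
        var serializer = new XmlSerializer(obj.GetType());
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        using (var ms = new MemoryStream())
        {
            // null encoding: UTF-8 output, declaration written as <?xml version="1.0"?>
            using (var writer = new XmlTextWriter(ms, null))
            {
                serializer.Serialize(writer, obj, ns);
            }
            // ToArray still works after the writer has closed the stream.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }
}
```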
string withEncoding;
using (System.IO.MemoryStream memory = new System.IO.MemoryStream())
{
    using (System.IO.StreamWriter writer = new System.IO.StreamWriter(memory))
    {
        serializer.Serialize(writer, obj, null);
        writer.Flush(); // make sure all buffered text has reached the MemoryStream
        memory.Position = 0;
        using (System.IO.StreamReader reader = new System.IO.StreamReader(memory))
        {
            withEncoding = reader.ReadToEnd();
        }
    }
}
// Only works if the declaration matches this exact text.
string withOutEncoding = withEncoding.Replace("<?xml version=\"1.0\" encoding=\"utf-8\"?>", "");
Credit to this blog for helping me with my code
http://blog.dotnetclr.com/archive/2008/01/29/removing-declaration-and-namespaces-from-xml-serialization.aspx
Here's my solution: same idea, but in VB.NET, and a little clearer in my opinion.
Dim xSerializer As XmlSerializer = New XmlSerializer(GetType(T))
Dim nmsp As XmlSerializerNamespaces = New XmlSerializerNamespaces()
nmsp.Add("", "")
Dim xWriterSettings As XmlWriterSettings = New XmlWriterSettings()
xWriterSettings.OmitXmlDeclaration = True
Using sw As StreamWriter = New StreamWriter(req.GetRequestStream(), System.Text.Encoding.ASCII)
    ' Disposing the writers flushes the XML and closes the request stream
    Using writer As XmlWriter = XmlWriter.Create(sw, xWriterSettings)
        xSerializer.Serialize(writer, someObjectT, nmsp)
    End Using
End Using

XmlSerializer Producing XML With No Namespace Prefix

I have to create an XML file with all the elements prefixed, like this:
<ps:Request num="123" xmlns:ps="www.ladieda.com">
<ps:ClientId>5566</ps:ClientId>
</ps:Request>
When I serialize my object, C# is smart and does this:
<Request num="123" xmlns="www.ladieda.com">
<ClientId>5566</ClientId>
</Request>
That is good, because the ps: is not necessary.
But is there a way to force C# to serialize all the prefixes?
My serialize code is this (for incoming object pObject):
String XmlizedString = null;
MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(pObject.GetType());
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xs.Serialize(xmlTextWriter, pObject);
memoryStream = (MemoryStream)xmlTextWriter.BaseStream;
XmlizedString = UTF8ByteArrayToString(memoryStream.ToArray());
return XmlizedString;
private String UTF8ByteArrayToString(Byte[] characters)
{
UTF8Encoding encoding = new UTF8Encoding();
String constructedString = encoding.GetString(characters);
return (constructedString);
}
First of all, if the consumer of your string were processing XML, then they wouldn't care about the prefix, since it doesn't matter (to XML). Perhaps they don't understand XML, and think they're processing a string (which might need to have the string "ps:" on every element).
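To illustrate that the prefix carries no meaning to an XML processor, here is a small LINQ to XML sketch (PrefixDemo and ElementNames are made-up names) that parses both forms from the question and lists each element's expanded name, i.e. namespace URI plus local name. The two lists come out identical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PrefixDemo
{
    // Returns the expanded name {namespace}local of every element, in document order.
    // XName never includes the prefix, so "ps:ClientId" and a default-namespace
    // "ClientId" in the same namespace produce the same XName.
    public static List<XName> ElementNames(string xml)
    {
        return XElement.Parse(xml).DescendantsAndSelf().Select(e => e.Name).ToList();
    }
}
```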
Second of all, you should change your code a bit:
XmlSerializer xs = new XmlSerializer(pObject.GetType());
using (MemoryStream memoryStream = new MemoryStream())
{
    XmlWriterSettings settings = new XmlWriterSettings()
    {
        // new UTF8Encoding(false) omits the byte-order mark that Encoding.UTF8 would emit
        Encoding = new UTF8Encoding(false)
    };
    using (XmlWriter writer = XmlWriter.Create(memoryStream, settings))
    {
        xs.Serialize(writer, pObject);
    }
    return Encoding.UTF8.GetString(memoryStream.ToArray());
}
This will properly dispose of the stream and XmlWriter if an exception is thrown, stops using the deprecated XmlTextWriter class, and yet still returns a string containing XML written for UTF-8.
Finally, to control the prefix, see "How to: Qualify XML Element and XML Attribute Names":
XmlSerializerNamespaces myNamespaces = new XmlSerializerNamespaces();
myNamespaces.Add("ps", "www.ladieda.com");
XmlSerializer xs = new XmlSerializer(pObject.GetType());
using (MemoryStream memoryStream = new MemoryStream())
{
    XmlWriterSettings settings = new XmlWriterSettings()
    {
        // new UTF8Encoding(false) omits the byte-order mark that Encoding.UTF8 would emit
        Encoding = new UTF8Encoding(false)
    };
    using (XmlWriter writer = XmlWriter.Create(memoryStream, settings))
    {
        // Elements in www.ladieda.com now get the "ps" prefix
        xs.Serialize(writer, pObject, myNamespaces);
    }
    return Encoding.UTF8.GetString(memoryStream.ToArray());
}
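A self-contained sketch of this prefix approach, assuming a hypothetical PsRequest class annotated to match the XML in the question (the real class is not shown). It writes to a StringWriter instead of a MemoryStream, so no BOM can appear, and omits the XML declaration for brevity:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class matching the XML shape in the question.
[XmlRoot("Request", Namespace = "www.ladieda.com")]
public class PsRequest
{
    [XmlAttribute("num")]
    public int Num { get; set; }

    [XmlElement("ClientId", Namespace = "www.ladieda.com")]
    public int ClientId { get; set; }
}

public static class PrefixedSerializer
{
    public static string Serialize(PsRequest request)
    {
        var ns = new XmlSerializerNamespaces();
        // Bind the "ps" prefix to the namespace the elements live in.
        ns.Add("ps", "www.ladieda.com");
        var serializer = new XmlSerializer(typeof(PsRequest));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var sw = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                serializer.Serialize(writer, request, ns);
            }
            return sw.ToString();
        }
    }
}
```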
Also check out XmlNamespaceDeclarationsAttribute. Caveat: when deserializing, it will only give you the namespaces defined on that element; it won't include namespaces defined on parent elements. If you don't have a consistent root type, use the XmlSerializer.Serialize() overload from @John Saunders instead.
http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlnamespacedeclarationsattribute.aspx
In another question, @John Saunders suggests using this attribute to control xmlns declarations in WSDL: Namespace Prefixes in Wsdl (.net)
From the MSDN sample:
// C#
using System;
using System.IO;
using System.Xml.Serialization;
[XmlRoot("select")]
public class Select {
[XmlAttribute] public string xpath;
[XmlNamespaceDeclarations] public XmlSerializerNamespaces xmlns;
}
public class Test {
public static void Main(string[] args) {
Select mySelect = new Select();
mySelect.xpath = "myNS:ref/@common:y";
mySelect.xmlns = new XmlSerializerNamespaces();
mySelect.xmlns.Add("MyNS", "myNS.tempuri.org");
mySelect.xmlns.Add("common", "common.tempuri.org");
XmlSerializer ser = new XmlSerializer(typeof(Select));
ser.Serialize(Console.Out, mySelect);
}
}
// Output:
// <?xml version="1.0" encoding="IBM437"?>
// <select xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
// xmlns:common="common.tempuri.org" xmlns:MyNS="myNS.tempuri.org" xpath="myNS:ref/@common:y" />
