Force XDocument to write to String with UTF-8 encoding - c#

I want to be able to write XML to a String with the declaration and with UTF-8 encoding. This seems mighty tricky to accomplish.
I have read around a bit and tried some of the popular answers for this but the they all have issues. My current code correctly outputs as UTF-8 but does not maintain the original formatting of the XDocument (i.e. indents / whitespace)!
Can anyone offer some advice please?
XDocument xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), xelementXML);
MemoryStream ms = new MemoryStream();
using (XmlWriter xw = new XmlTextWriter(ms, Encoding.UTF8))
{
xml.Save(xw);
xw.Flush();
StreamReader sr = new StreamReader(ms);
ms.Seek(0, SeekOrigin.Begin);
String xmlString = sr.ReadToEnd();
}
The XML requires the formatting to be identical to the way .ToString() would format it i.e.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root>
<node>blah</node>
</root>
What I'm currently seeing is
<?xml version="1.0" encoding="utf-8" standalone="yes"?><root><node>blah</node></root>
Update
I have managed to get this to work by adding XmlTextWriter settings... It seems VERY clunky though!
MemoryStream ms = new MemoryStream();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
settings.ConformanceLevel = ConformanceLevel.Document;
settings.Indent = true;
using (XmlWriter xw = XmlTextWriter.Create(ms, settings))
{
xml.Save(xw);
xw.Flush();
StreamReader sr = new StreamReader(ms);
ms.Seek(0, SeekOrigin.Begin);
String blah = sr.ReadToEnd();
}

Try this:
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;
class Test
{
static void Main()
{
XDocument doc = XDocument.Load("test.xml",
LoadOptions.PreserveWhitespace);
doc.Declaration = new XDeclaration("1.0", "utf-8", null);
StringWriter writer = new Utf8StringWriter();
doc.Save(writer, SaveOptions.None);
Console.WriteLine(writer);
}
private class Utf8StringWriter : StringWriter
{
public override Encoding Encoding { get { return Encoding.UTF8; } }
}
}
Of course, you haven't shown us how you're building the document, which makes it hard to test... I've just tried with a hand-constructed XDocument and that contains the relevant whitespace too.

Try XmlWriterSettings:
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = false;
xws.Indent = true;
And pass it on like
using (XmlWriter xw = XmlWriter.Create(sb, xws))

See also https://stackoverflow.com/a/3288376/1430535
return xdoc.Declaration.ToString() + Environment.NewLine + xdoc.ToString();

Related

Converting XML to UTF-8 using C#

I have written below code to convert XML file to UTF-8 format file, it is working as excepted but issue is header is concatenating with body text instead of writing in separate line. I need utf8 in seperate line but file.writealltext will not accept more than 3 arguments/parameters. Any help appreciated.
string path = #"samplefile.xml";
string path_new = #"samplefile_new.xml";
Encoding utf8 = new UTF8Encoding(false);
Encoding ansi = Encoding.GetEncoding(1252);
string xml = File.ReadAllText(path, ansi);
XDocument xmlDoc = XDocument.Parse(xml);
File.WriteAllText(
path_new,
#"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""true"">" + xmlDoc.ToString(),
utf8
);
No need to use any API other than LINQ to XML. It has all means to deal with XML file encoding, prolog, BOM, indentation, etc.
void Main()
{
string outputXMLfile = #"e:\temp\XMLfile_UTF-8.xml";
XDocument xml = XDocument.Parse(#"<?xml version='1.0' encoding='utf-16'?>
<root>
<row>some text</row>
</root>");
XDocument doc = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement(xml.Root)
);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "\t";
// to remove BOM
settings.Encoding = new UTF8Encoding(false);
using (XmlWriter writer = XmlWriter.Create(outputXMLfile, settings))
{
doc.Save(writer);
}
}

How to serialize an UTF 8 xml string

I had this code to create an xml. It used to work as expected, but in the first xml row, the encoding was set as utf-16.
The code was:
public static string SerializeObject<T>(this T toSerialize)
{
XmlSerializer xmlSerializer = new XmlSerializer(toSerialize.GetType());
using (StringWriter textWriter = new StringWriter())
{
xmlSerializer.Serialize(textWriter, toSerialize);
return textWriter.ToString();
}
}
Since I need to encode it in UTF-8, I edit it as follows:
public static string SerializeObject<T>(this T toSerialize)
{
XmlSerializer xmlSerializer = new XmlSerializer(toSerialize.GetType());
XmlWriterSettings settings = new XmlWriterSettings()
{
Encoding = new UTF8Encoding(),
Indent = false,
OmitXmlDeclaration = false
};
using (StringWriter textWriter = new StringWriter())
{
using (XmlWriter xmlWriter = XmlWriter.Create(textWriter, settings))
{
xmlSerializer.Serialize(xmlWriter, toSerialize);
}
return textWriter.ToString();
}
}
The code seems good, but the generated xml has still the row
<?xml version="1.0" encoding="utf-16"?>
and not
<?xml version="1.0" encoding="utf-8"?>
Why? How can I create an utf-8 xml?

XmllSerializer.Serialize and returning raw xml

I am using an XmlSerializer to serialize an object to xml. After the object gets serialized, I end up with something like...
<?xml version="1.0" encoding="utf-8"?>
<xml xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</xml>
I need to have it return simply...
<xml></xml>
Is there a way to serialize xml without the extra information? I realize strictly proper xml requires these additional elements, but I need the simpler form as I am appending these xml strings together to form a larger xml blob.
UPDATE
I was able to get the following code to work...
public static string Serialize(object o)
{
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add(String.Empty, String.Empty);
StringBuilder sb = new StringBuilder();
XmlWriter xmlw = XmlWriter.Create(sb, xws);
XmlSerializer serializer = new XmlSerializer(o.GetType());
serializer.Serialize(xmlw, o, ns);
xmlw.Flush();
return sb.ToString();
}
In C#, you can do something like this:
XmlSerializer serializer = new XmlSerializer(typeof(object));
StringWriter stringWriter = new StringWriter();
using (XmlWriter writer = XmlWriter.Create(stringWriter, new XmlWriterSettings() { OmitXmlDeclaration = true }))
{
serializer.Serialize(writer, this, new XmlSerializerNamespaces() { "",""});
}
string xmlText = stringWriter.ToString();
Explanation:
OmitXmlDeclaration = true makes it remove the declaration.
new XmlSerializerNamespaces() { "",""} removes the namespaces.

remove encoding from xmlserializer

I am using the following code to create an xml document -
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
new XmlSerializer(typeof(docket)).Serialize(Console.Out, i, ns);
this works great in creating the xml file with no namespace attributes. i would like to also have no encoding attribute in the root element, but I cannot find a way to do it. Does anyone have any idea if this can be done?
Thanks
Old answer removed and update with new solution:
Assuming that it's ok to remove the xml declaration completly, because it makes not much sense without the encoding attribute:
XmlSerializerNamespaces ns = new XmlSerializerNamespaces(); ns.Add("", "");
using (XmlWriter writer = XmlWriter.Create(Console.Out, new XmlWriterSettings { OmitXmlDeclaration = true}))
{
new XmlSerializer(typeof (SomeType)).Serialize(writer, new SomeType(), ns);
}
To remove encoding from XML header pass TextWriter with null encoding to XmlSerializer:
MemoryStream ms = new MemoryStream();
XmlTextWriter w = new XmlTextWriter(ms, null);
s.Serialize(w, vs);
Explanation
XmlTextWriter uses encoding from TextWriter passed in constructor:
// XmlTextWriter constructor
public XmlTextWriter(TextWriter w) : this()
{
this.textWriter = w;
this.encoding = w.Encoding;
..
It uses this encoding when generating XML:
// Snippet from XmlTextWriter.StartDocument
if (this.encoding != null)
{
builder.Append(" encoding=");
...
string withEncoding;
using (System.IO.MemoryStream memory = new System.IO.MemoryStream()) {
using (System.IO.StreamWriter writer = new System.IO.StreamWriter(memory)) {
serializer.Serialize(writer, obj, null);
using (System.IO.StreamReader reader = new System.IO.StreamReader(memory)) {
memory.Position = 0;
withEncoding= reader.ReadToEnd();
}
}
}
string withOutEncoding= withEncoding.Replace("<?xml version=\"1.0\" encoding=\"utf-8\"?>", "");
Credit to this blog for helping me with my code
http://blog.dotnetclr.com/archive/2008/01/29/removing-declaration-and-namespaces-from-xml-serialization.aspx
here's my solution, same idea, but in VB.NET and a little clearer in my opinion.
Dim sw As StreamWriter = New, StreamWriter(req.GetRequestStream,System.Text.Encoding.ASCII)
Dim xSerializer As XmlSerializer = New XmlSerializer(GetType(T))
Dim nmsp As XmlSerializerNamespaces = New XmlSerializerNamespaces()
nmsp.Add("", "")
Dim xWriterSettings As XmlWriterSettings = New XmlWriterSettings()
xWriterSettings.OmitXmlDeclaration = True
Dim xmlWriter As XmlWriter = xmlWriter.Create(sw, xWriterSettings)
xSerializer.Serialize(xmlWriter, someObjectT, nmsp)

XmlSerializer Producing XML With No Namespace Prefix

I have to create an XML file with all the elements prefixed, like this:
<ps:Request num="123" xmlns:ps="www.ladieda.com">
<ps:ClientId>5566</ps:ClientId>
<ps:Request>
When i serialize my object, c# is smart and does this:
<Request num="123" xmlns="www.ladieda.com">
<ClientId>5566</ClientId>
<Request>
That is good, because the ps: is not necessary.
But is there a way to force C# to serialize all the prefixes?
My serialize code is this (for incoming object pObject):
String XmlizedString = null;
MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(pObject.GetType());
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xs.Serialize(xmlTextWriter, pObject);
memoryStream = (MemoryStream)xmlTextWriter.BaseStream;
XmlizedString = UTF8ByteArrayToString(memoryStream.ToArray());
return XmlizedString;
private String UTF8ByteArrayToString(Byte[] characters)
{
UTF8Encoding encoding = new UTF8Encoding();
String constructedString = encoding.GetString(characters);
return (constructedString);
}
First of all, if the consumer of your string were processing XML, then they wouldn't care about the prefix, since it doesn't matter (to XML). Perhaps they don't understand XML, and think they're processing a string (which might need to have the string "ps:" on every element).
Second of all, you should change your code a bit:
XmlSerializer xs = new XmlSerializer(pObject.GetType());
using (MemoryStream memoryStream = new MemoryStream())
{
XmlWriterSettings settings = new XmlWriterSettings()
{
Encoding = Encoding.UTF8
};
using (XmlWriter writer = XmlWriter.Create(memoryStream, settings))
{
xs.Serialize(writer, pObject);
}
return Encoding.UTF8.GetString(memoryStream.ToArray());
}
This will properly dispose of the stream and XmlWriter if an exception is thrown, stops using the deprecated XmlTextWriter class, and yet still returns a string containing XML written for UTF-8.
Finally, to control the prefix, see "How to: Qualify XML Element and XML Attribute Names":
XmlSerializerNamespaces myNamespaces = new XmlSerializerNamespaces();
myNamespaces.Add("ps", "www.ladieda.com");
XmlSerializer xs = new XmlSerializer(pObject.GetType());
using (MemoryStream memoryStream = new MemoryStream())
{
XmlWriterSettings settings = new XmlWriterSettings()
{
Encoding = Encoding.UTF8
};
using (XmlWriter writer = XmlWriter.Create(memoryStream, settings))
{
xs.Serialize(writer, pObject, myNamespaces);
}
return Encoding.UTF8.GetString(memoryStream.ToArray());
}
Also check out XmlNamespaceDeclarationsAttribute. Caveat: when deserializing it will only give you namespaces defined by that element, it won't have namespaces defined in parent elements. If you don't have a consistent root type then use the XmlSerializer.Serialize() overload from #John Saunders.
http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlnamespacedeclarationsattribute.aspx
In another question #John Saunders suggests using this attribute in regards to controlling xmlns in WSDL: Namespace Prefixes in Wsdl (.net)
From MSDN Sample:
// C#
using System;
using System.IO;
using System.Xml.Serialization;
[XmlRoot("select")]
public class Select {
[XmlAttribute] public string xpath;
[XmlNamespaceDeclarations] public XmlSerializerNamespaces xmlns;
}
public class Test {
public static void Main(string[] args) {
Select mySelect = new Select();
mySelect.xpath = "myNS:ref/#common:y";
mySelect.xmlns = new XmlSerializerNamespaces();
mySelect.xmlns.Add("MyNS", "myNS.tempuri.org");
mySelect.xmlns.Add("common", "common.tempuri.org");
XmlSerializer ser = new XmlSerializer(typeof(Select));
ser.Serialize(Console.Out, mySelect);
}
}
// Output:
// <?xml version="1.0" encoding="IBM437"?>
// <select xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmln:xsi="http://www.w3.org/2001/XMLSchema-instance"
// xmlns:common="common.tempuri.org" xmlns:MyNS="myNS.tempuri.org" xpath="myNS:ref/#common:y" />

Categories