I have an XmlDocument that I save with an XmlWriter, following this post. I set the Encoding to UTF-8 and the file is in fact saved as UTF-8, yet the XML declaration in the file has "utf-16" as the value of its encoding attribute.
I can't see where the error is in my code:
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings
{
Encoding=Encoding.UTF8
};
using (XmlWriter writer = XmlWriter.Create(sb, settings))
{
xDoc.Save(writer);
}
using (
StreamWriter sw = new StreamWriter(
new FileStream(strXmlName, FileMode.Create, FileAccess.Write),
Encoding.UTF8
)
)
{
sw.Write(sb.ToString());
}
The reason for this is covered in the question @dbc links to in the comments: the overload of XmlWriter.Create that accepts a StringBuilder will create a StringWriter, which has its encoding set to UTF-16.
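The behaviour described above can be reproduced in isolation. This is a minimal sketch, not the asker's code: the class and method names are made up for illustration, and the document is a single placeholder element.

```csharp
using System;
using System.Text;
using System.Xml;

public static class StringBuilderEncodingDemo
{
    // Writes a one-element document into a StringBuilder while asking for UTF-8.
    public static string WriteToStringBuilder()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("root", "x");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The declaration reports utf-16 even though UTF-8 was requested,
        // because the underlying StringWriter decides the encoding.
        Console.WriteLine(WriteToStringBuilder());
    }
}
```

The Encoding setting only takes effect when the writer targets a Stream or a file name.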
However, in this case it's not clear why you're using a StringBuilder when your goal is to write to a file. You could create an XmlWriter for the file directly:
var settings = new XmlWriterSettings
{
Indent = true
};
using (var writer = XmlWriter.Create(strXmlName, settings))
{
xDoc.WriteTo(writer);
}
The encoding here will default to UTF-8.
As an aside, I'd suggest you check out the much newer XDocument and friends; it's a much friendlier API than XmlDocument.
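For comparison, a minimal sketch of the same task with XDocument, assuming a throwaway file in the temp directory (the file name and element names are placeholders, not from the question):

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class XDocumentFileDemo
{
    // Saves a small document straight to a file and returns the file's text.
    public static string SaveAndReadBack(string path)
    {
        var doc = new XDocument(
            new XElement("root",
                new XElement("item", "value")));
        doc.Save(path); // writes a UTF-8 file with a matching declaration
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "xdocument-demo.xml");
        Console.WriteLine(SaveAndReadBack(path));
    }
}
```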
Related
I'm currently trying to serialize a class into XML to be posted to a PHP web service.
Whenever I do a normal serialization with XmlSerializer, an XML declaration always appears on the first line of the XML document (something like <?xml ....?>). I tested the XML and could not get it working, because the endpoint does not accept an XML declaration and I can't do anything about that.
I'm unfamiliar with XML serialization in C#, to be honest.
Therefore, I used XmlWriter to do this, as below:
private string SerializeClassToString(GetRiskReport value)
{
var emptyNS = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
var ser = new XmlSerializer(value.GetType());
var settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
using (var stream = new StringWriter())
{
using (var writer = XmlWriter.Create(stream, settings))
{
ser.Serialize(writer, value, emptyNS);
return stream.ToString();
}
}
}
The resulting root element is
<GetRiskReport FCRA=\"false\" ReturnResultsOnly=\"false\" Monitoring=\"false\">
... and I'm able to omit the XML declaration; however, this introduces two new problems.
I get \r\n for new lines, and I get escaped double quotes such as ReturnResultsOnly=\"false\" Monitoring=\"false\", which the endpoint also cannot process.
Can anyone give me an idea of how to change the XmlWriterSettings to omit the XML declaration, avoid \r\n, and also avoid the escaped double quotes \"?
Thanks in advance for your advice.
Simon
Try the following settings:
settings.NewLineHandling = NewLineHandling.None;
settings.CheckCharacters = false;
private void SerializeClassToString(GetRiskReport value)
{
var emptyNS = new XmlSerializerNamespaces(new[]{XmlQualifiedName.Empty});
var ser = new XmlSerializer(value.GetType());
var settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
string path = "your_file_path_here";
// File.Create overwrites an existing file, so no explicit delete is needed.
using (FileStream stream = File.Create(path))
using (var writer = XmlWriter.Create(stream, settings))
{
ser.Serialize(writer, value, emptyNS);
}
}
I found no way to avoid this behavior of XML serialization, whether it is a Microsoft bug or an intentional part of the specification. It's easier and faster to use a FileStream object.
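One thing that may be worth checking for the escaped-quote symptom in this question: debugger views show string values in escaped form, so \" and \r\n can appear there without being present in the string itself. The sketch below inspects the serialized text directly. RiskReportSketch is a hypothetical stand-in for GetRiskReport, whose real definition is not shown.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the GetRiskReport class from the question.
public class RiskReportSketch
{
    [XmlAttribute] public bool FCRA { get; set; }
    [XmlAttribute] public bool Monitoring { get; set; }
}

public static class EscapedQuotesDemo
{
    // Serializes without an XML declaration, as in the question.
    public static string Serialize(RiskReportSketch value)
    {
        var emptyNs = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
        var serializer = new XmlSerializer(typeof(RiskReportSketch));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var text = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                serializer.Serialize(writer, value, emptyNs);
            }
            return text.ToString();
        }
    }

    public static void Main()
    {
        string xml = Serialize(new RiskReportSketch());
        Console.WriteLine(xml);
        // Check the actual characters rather than the debugger's rendering.
        Console.WriteLine("Contains backslash: " + xml.Contains("\\"));
        Console.WriteLine("Contains line break: " + xml.Contains("\n"));
    }
}
```

If both checks print False for the real class as well, the string sent to the endpoint is already free of backslashes and line breaks.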
I'm trying to save an xml file into a blob. I have no error, everything seems fine except when I navigate to the blob url I see a blank page. If I look at the source code of the web page I can see my xml but truncated.
Here is the code.
StringBuilder fileString = new StringBuilder();
XmlWriterSettings xmlSettings=new XmlWriterSettings
{
Encoding = new UTF8Encoding(false)
};
using (XmlWriter writer = XmlWriter.Create(fileString, xmlSettings))
{
bla bla
}
CloudBlockBlob fileBlob = container.GetBlockBlobReference("site.xml");
fileBlob.UploadText(fileString.ToString());
I found the solution in another post. It was not quite the same problem, although it also comes down to the text always being encoded as UTF-16 despite the writer being set up for UTF-8. I am now using a Stream and it works fine.
MemoryStream fileString = new MemoryStream();
XmlWriterSettings xmlSettings=new XmlWriterSettings
{
Encoding = Encoding.UTF8,
Indent = true
};
using (XmlWriter writer = XmlWriter.Create(fileString, xmlSettings))
{
bla bla
}
CloudBlockBlob fileBlob = container.GetBlockBlobReference("site.xml");
fileBlob.UploadText(StreamToString(fileString));
private static string StreamToString(Stream stream)
{
stream.Position = 0;
var reader = new StreamReader(stream);
return reader.ReadToEnd();
}
If you have decided to use a Stream, you can also upload it using UploadFromStream instead of UploadText, which encodes the string into a sequence of bytes, creates a memory stream, and calls UploadFromStream anyway.
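A minimal sketch of preparing such a stream, assuming placeholder XML content. The blob call itself is left as a comment because it needs the storage client and the fileBlob object from the question; only the stream handling is shown.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class UploadStreamSketch
{
    // Builds the XML into a MemoryStream and rewinds it so it can be read from the start.
    public static MemoryStream BuildXmlStream()
    {
        var stream = new MemoryStream();
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            Indent = true
        };
        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("site", "example"); // placeholder content
        }
        stream.Position = 0; // rewind before handing the stream to a reader or uploader
        return stream;
    }

    public static void Main()
    {
        using (MemoryStream stream = BuildXmlStream())
        {
            // With the blob from the question, the upload would then be:
            // fileBlob.UploadFromStream(stream);
            Console.WriteLine(stream.Length + " bytes ready to upload");
        }
    }
}
```

Forgetting to reset Position is a common cause of empty uploads, since the stream is left at its end after writing.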
I am new to LINQ to XML. After you have built XDocument, how do you get the OuterXml of it like you did with XmlDocument?
You only need to use the overridden ToString() method of the object:
XDocument xmlDoc ...
string xml = xmlDoc.ToString();
This works with all XObjects, like XElement, etc.
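Note that ToString() leaves out the XML declaration even when the document has one. A small sketch with a made-up document shows the difference and one way to prepend the declaration by hand:

```csharp
using System;
using System.Xml.Linq;

public static class ToStringDeclarationDemo
{
    public static XDocument BuildDocument()
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("root", new XElement("child", "text")));
    }

    // Returns the declaration (if any) followed by the document body.
    public static string WithDeclaration(XDocument doc)
    {
        string body = doc.ToString();
        return doc.Declaration == null
            ? body
            : doc.Declaration + Environment.NewLine + body;
    }

    public static void Main()
    {
        XDocument doc = BuildDocument();
        Console.WriteLine(doc.ToString());      // starts at <root>, no declaration
        Console.WriteLine(WithDeclaration(doc)); // declaration first, then <root>
    }
}
```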
I don't know when this changed, but today (July 2017) when trying the answers out, I got
"System.Xml.XmlDocument"
Instead of ToString(), you can use the originally intended way of accessing the XmlDocument content: writing the XML document to a stream.
XmlDocument xml = ...;
string result;
using (StringWriter writer = new StringWriter())
{
xml.Save(writer);
result = writer.ToString();
}
Several responses give a slightly incorrect answer.
XDocument.ToString() omits the XML declaration (and, according to @Alex Gordon, may return invalid XML if it contains encoded unusual characters like &).
Saving XDocument to StringWriter will cause .NET to emit encoding="utf-16", which you most likely don't want (if you save XML as a string, it's probably because you want to later save it as a file, and de facto standard for saving files is UTF-8 - .NET saves text files as UTF-8 unless specified otherwise).
@Wolfgang Grinfeld's answer is heading in the right direction, but it's unnecessarily complex.
Use the following:
var memory = new MemoryStream();
xDocument.Save(memory);
string xmlText = Encoding.UTF8.GetString(memory.ToArray());
This will return XML text with UTF-8 declaration.
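One caveat with this approach: the saved bytes may begin with a UTF-8 byte order mark, which then shows up as a leading U+FEFF character in the decoded string. A hedged sketch of detecting and trimming it (the helper names are illustrative):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class MemoryStreamBomDemo
{
    // Saves the document to memory and decodes the bytes as UTF-8.
    public static string SaveToString(XDocument doc)
    {
        using (var memory = new MemoryStream())
        {
            doc.Save(memory);
            return Encoding.UTF8.GetString(memory.ToArray());
        }
    }

    // Removes a leading byte order mark character if one is present.
    public static string StripBom(string text)
    {
        return text.TrimStart('\uFEFF');
    }

    public static void Main()
    {
        var doc = new XDocument(new XElement("root", "data"));
        string raw = SaveToString(doc);
        Console.WriteLine("Starts with BOM: " + (raw.Length > 0 && raw[0] == '\uFEFF'));
        Console.WriteLine(StripBom(raw));
    }
}
```

Trimming matters if the string is later passed to a parser that expects the text to start with the declaration.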
Doing XDocument.ToString() may not get you the full XML.
In order to get the XML declaration at the start of the XML document as a string, use the XDocument.Save() method:
var ms = new MemoryStream();
using (var xw = XmlWriter.Create(new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"))))
new XDocument(new XElement("Root", new XElement("Leaf", "data"))).Save(xw);
var myXml = Encoding.GetEncoding("ISO-8859-1").GetString(ms.ToArray());
Use ToString() to convert XDocument into a string:
string result = string.Empty;
XElement root = new XElement("xml",
// XCData emits a real CDATA section; a plain string containing "<![CDATA[" would be escaped as text.
new XElement("MsgType", new XCData("text")),
new XElement("Content", new XCData("Hi, this is Wilson Wu Testing for you! You can ask any question but no answer can be replied....")),
new XElement("FuncFlag", 0)
);
result = root.ToString();
While @wolfgang-grinfeld's answer is technically correct (it also produces the XML declaration, unlike plain .ToString()), the code generates a UTF-8 byte order mark (BOM), which XDocument.Parse(string) cannot process: it throws a "Data at the root level is invalid. Line 1, position 1." error.
So here is another solution without the BOM:
var utf8Encoding =
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
using (var memory = new MemoryStream())
using (var writer = XmlWriter.Create(memory, new XmlWriterSettings
{
OmitXmlDeclaration = false,
Encoding = utf8Encoding
}))
{
CompanyDataXml.Save(writer);
writer.Flush();
return utf8Encoding.GetString(memory.ToArray());
}
I found this example in the Microsoft .NET 6 documentation for the XDocument.Save method. I think it answers the original question (what is the XDocument equivalent of XmlDocument.OuterXml), and also addresses the concerns that others have pointed out already. By using XmlWriterSettings you can predictably control the string output.
https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.save
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
using (XmlWriter xw = XmlWriter.Create(sb, xws)) {
XDocument doc = new XDocument(
new XElement("Child",
new XElement("GrandChild", "some content")
)
);
doc.Save(xw);
}
Console.WriteLine(sb.ToString());
Looking at these answers, I see a lot of unnecessary complexity and inefficiency in pursuit of generating the XML declaration automatically. But since the declaration is so simple, there isn't much value in generating it. Just KISS (keep it simple, stupid):
// Extension method
public static string ToStringWithDeclaration(this XDocument doc, string declaration = null)
{
declaration ??= "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";
return declaration + doc.ToString();
}
// Usage
string xmlString = doc.ToStringWithDeclaration();
// Or
string xmlString = doc.ToStringWithDeclaration("...");
Using XmlWriter instead of ToString() can give you more control over how the output is formatted (such as if you want indentation), and it can write to other targets besides string.
The reason to target a memory stream is performance. It lets you skip the step of storing the XML in a string (since you know the data must end up in a different encoding eventually, whereas string is always UTF-16 in C#). For instance, for an HTTP request:
// Extension method
public static ByteArrayContent ToByteArrayContent(
this XDocument doc, XmlWriterSettings xmlWriterSettings = null)
{
xmlWriterSettings ??= new XmlWriterSettings();
using (var stream = new MemoryStream())
{
using (var writer = XmlWriter.Create(stream, xmlWriterSettings))
{
doc.Save(writer);
}
var content = new ByteArrayContent(stream.GetBuffer(), 0, (int)stream.Length);
content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");
return content;
}
}
// Usage (XDocument -> UTF-8 bytes)
var content = doc.ToByteArrayContent();
var response = await httpClient.PostAsync("/someurl", content);
// Alternative (XDocument -> string -> UTF-8 bytes)
var content = new StringContent(doc.ToStringWithDeclaration(), Encoding.UTF8, "text/xml");
var response = await httpClient.PostAsync("/someurl", content);
I am trying to apply an XSL style sheet to a source XML file and write the output to a target XML file. The XSL removes the XML comments present in the source XML.
The target XML file has UTF-16 encoding in its header.
However, I want the output XML to use UTF-8 encoding. The code I used is:
XmlWriterSettings xwrSettings = new XmlWriterSettings();
xwrSettings.Encoding = Encoding.UTF8;
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("sample.xsl");
StringBuilder sb = new StringBuilder();
XmlReader xReader = XmlReader.Create("Source.xml");
XmlWriter xWriter = XmlWriter.Create(sb, xwrSettings);
xslt.Transform(xReader, xWriter);
File.WriteAllText("Target.xml",sb.ToString());
I set the XmlWriterSettings encoding to UTF-8, but it has no effect.
Since you are writing to file, why not just use:
using (XmlReader xReader = XmlReader.Create("Source.xml"))
using (XmlWriter xWriter = XmlWriter.Create("Target.xml", xwrSettings)) {
xslt.Transform(xReader, xWriter);
}
// your file is now written
I've got a function creating some XmlDocument:
public string CreateOutputXmlString(ICollection<Field> fields)
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = Encoding.GetEncoding("windows-1250");
StringBuilder builder = new StringBuilder();
XmlWriter writer = XmlWriter.Create(builder, settings);
writer.WriteStartDocument();
writer.WriteStartElement("data");
foreach (Field field in fields)
{
writer.WriteStartElement("item");
writer.WriteAttributeString("name", field.Id);
writer.WriteAttributeString("value", field.Value);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.Flush();
writer.Close();
return builder.ToString();
}
I set an encoding, but after I create the XmlWriter it still has UTF-16 encoding. I know that's because strings (and StringBuilder, I suppose) are encoded in UTF-16 and you can't change that.
So how can I easily create this XML with the encoding attribute set to "windows-1250"? It doesn't even have to be encoded in this encoding; it just has to have the specified attribute.
Edit: it has to run on .NET 2.0, so newer framework features cannot be used.
You need to use a StringWriter with the appropriate encoding. Unfortunately StringWriter doesn't let you specify the encoding directly, so you need a class like this:
public sealed class StringWriterWithEncoding : StringWriter
{
private readonly Encoding encoding;
public StringWriterWithEncoding (Encoding encoding)
{
this.encoding = encoding;
}
public override Encoding Encoding
{
get { return encoding; }
}
}
(This question is similar but not quite a duplicate.)
EDIT: To answer the comment: pass the StringWriterWithEncoding to XmlWriter.Create instead of the StringBuilder, then call ToString() on it at the end.
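Put together, the usage might look like the sketch below. It uses UTF-8 rather than windows-1250 so that it runs unchanged on newer runtimes, where code page encodings have to be registered first; the element content is a placeholder.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Same idea as the class above: a StringWriter that reports a chosen encoding.
public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

public static class StringWriterWithEncodingDemo
{
    public static string CreateXml(Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        using (StringWriterWithEncoding text = new StringWriterWithEncoding(encoding))
        {
            // The declaration's encoding attribute comes from the TextWriter's Encoding property.
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("data");
                writer.WriteEndElement();
            }
            return text.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(CreateXml(Encoding.UTF8));
    }
}
```

The returned string is still an ordinary .NET string; only the declared encoding changes, which is what the question asks for.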
Just some extra explanation of why this is so.
Strings are sequences of characters, not bytes. Strings, per se, are not "encoded", because they are using characters, which are stored as Unicode codepoints. Encoding DOES NOT MAKE SENSE at String level.
An encoding is a mapping from a sequence of codepoints (characters) to a sequence of bytes (for storage on byte-based systems like filesystems or memory). The framework does not let you specify encodings, unless there is a compelling reason to, like to make 16-bit codepoints fit on byte-based storage.
So when you're trying to write your XML into a StringBuilder, you're actually building an XML sequence of characters and writing them as a sequence of characters, so no encoding is performed. Therefore, no Encoding field.
If you want to use an encoding, the XmlWriter has to write to a Stream.
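To make the point concrete, here is a small sketch that writes the same document to a stream under two encodings and prints the resulting bytes. ISO-8859-1 is used as the single-byte example because it is available without extra registration; the names are illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamEncodingDemo
{
    // Writes <data>é</data> to a stream using the given encoding and returns the raw bytes.
    public static byte[] WriteWithEncoding(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteElementString("data", "\u00E9"); // the character 'é'
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] latin1 = WriteWithEncoding(Encoding.GetEncoding("iso-8859-1"));
        byte[] utf8 = WriteWithEncoding(new UTF8Encoding(false));
        // The same character becomes one byte (E9) in ISO-8859-1 and two bytes (C3-A9) in UTF-8.
        Console.WriteLine(BitConverter.ToString(latin1));
        Console.WriteLine(BitConverter.ToString(utf8));
    }
}
```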
About the solution that you found with the MemoryStream: no offense intended, but it's just flapping your arms around and moving hot air. You're encoding your codepoints with 'windows-1250' and then decoding them straight back to codepoints. The only change that may occur is that characters not defined in windows-1250 get converted to a '?' character in the process.
To me, the right solution might be the following one. Depending on what your function is used for, you could pass a Stream as a parameter to your function, so that the caller decides whether it should be written to memory or to a file. So it would be written like this:
public static void WriteFieldsAsXmlDocument(ICollection<Field> fields, Stream outStream)
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = Encoding.GetEncoding("windows-1250");
using(XmlWriter writer = XmlWriter.Create(outStream, settings)) {
writer.WriteStartDocument();
writer.WriteStartElement("data");
foreach (Field field in fields)
{
writer.WriteStartElement("item");
writer.WriteAttributeString("name", field.Id);
writer.WriteAttributeString("value", field.Value);
writer.WriteEndElement();
}
writer.WriteEndElement();
}
}
MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;
XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();
string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());
From here
I actually solved the problem with MemoryStream:
public static string CreateOutputXmlString(ICollection<Field> fields)
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = Encoding.GetEncoding("windows-1250");
MemoryStream memStream = new MemoryStream();
XmlWriter writer = XmlWriter.Create(memStream, settings);
writer.WriteStartDocument();
writer.WriteStartElement("data");
foreach (Field field in fields)
{
writer.WriteStartElement("item");
writer.WriteAttributeString("name", field.Id);
writer.WriteAttributeString("value", field.Value);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.Flush();
writer.Close();
string xml = Encoding.GetEncoding("windows-1250").GetString(memStream.ToArray());
memStream.Close();
memStream.Dispose();
return xml;
}
I solved mine by outputting the string to a variable then replacing any references to utf-16 with utf-8 (my app needed UTF8 encoding). Since you're using a function, you could do something similar. I use VB.net mostly, but I think the C# would look something like this.
return builder.ToString().Replace("utf-16", "utf-8");