How to get XML with header (<?xml version="1.0"...)? - c#

Consider the following simple code which creates an XML document and displays it.
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);
XmlComment comment = xml.CreateComment("Comment");
root.AppendChild(comment);
textBox1.Text = xml.OuterXml;
it displays, as expected:
<root><!--Comment--></root>
It doesn't, however, display the
<?xml version="1.0" encoding="UTF-8"?>
So how can I get that as well?

Create an XML-declaration using XmlDocument.CreateXmlDeclaration Method:
XmlNode docNode = xml.CreateXmlDeclaration("1.0", "UTF-8", null);
xml.AppendChild(docNode);
Note: please take a look at the documentation for the method, especially for encoding parameter: there are special requirements for values of this parameter.

You need to use an XmlWriter (which writes the XML declaration by default). You should note that that C# strings are UTF-16 and your XML declaration says that the document is UTF-8 encoded. That discrepancy can cause problems. Here's an example, writing to a file that gives the result you expect:
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);
XmlComment comment = xml.CreateComment("Comment");
root.AppendChild(comment);
XmlWriterSettings settings = new XmlWriterSettings
{
Encoding = Encoding.UTF8,
ConformanceLevel = ConformanceLevel.Document,
OmitXmlDeclaration = false,
CloseOutput = true,
Indent = true,
IndentChars = " ",
NewLineHandling = NewLineHandling.Replace
};
using ( StreamWriter sw = File.CreateText("output.xml") )
using ( XmlWriter writer = XmlWriter.Create(sw,settings))
{
xml.WriteContentTo(writer);
writer.Close() ;
}
string document = File.ReadAllText( "output.xml") ;

XmlDeclaration xmldecl;
xmldecl = xmlDocument.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDocument.DocumentElement;
xmlDocument.InsertBefore(xmldecl, root);

Related

Converting XML to UTF-8 using C#

I have written below code to convert XML file to UTF-8 format file, it is working as excepted but issue is header is concatenating with body text instead of writing in separate line. I need utf8 in seperate line but file.writealltext will not accept more than 3 arguments/parameters. Any help appreciated.
string path = #"samplefile.xml";
string path_new = #"samplefile_new.xml";
Encoding utf8 = new UTF8Encoding(false);
Encoding ansi = Encoding.GetEncoding(1252);
string xml = File.ReadAllText(path, ansi);
XDocument xmlDoc = XDocument.Parse(xml);
File.WriteAllText(
path_new,
#"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""true"">" + xmlDoc.ToString(),
utf8
);
No need to use any API other than LINQ to XML. It has all means to deal with XML file encoding, prolog, BOM, indentation, etc.
void Main()
{
string outputXMLfile = #"e:\temp\XMLfile_UTF-8.xml";
XDocument xml = XDocument.Parse(#"<?xml version='1.0' encoding='utf-16'?>
<root>
<row>some text</row>
</root>");
XDocument doc = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement(xml.Root)
);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "\t";
// to remove BOM
settings.Encoding = new UTF8Encoding(false);
using (XmlWriter writer = XmlWriter.Create(outputXMLfile, settings))
{
doc.Save(writer);
}
}

Add An XML Declaration To String Of XML

I have some xml data that looks like..
<Root>
<Data>Nack</Data>
<Data>Nelly</Data>
</Root>
I want to add "<?xml version=\"1.0\"?>" to this string. Then preserve the xml as a string.
I attempted a few things..
This breaks and loses the original xml string
myOriginalXml="<?xml version=\"1.0\"?>" + myOriginalXml;
This doesnt do anything, just keeps the original xml data with no declaration attached.
XmlDocument doc = new XmlDocument();
doc.LoadXml(myOriginalXml);
XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", "UTF-8","no");
StringWriter sw = new StringWriter();
XmlTextWriter tx = new XmlTextWriter(sw);
doc.WriteTo(tx);
string xmlString = sw.ToString();
This is also not seeming to have any effect..
XmlDocument doc = new XmlDocument();
doc.LoadXml(myOriginalXml);
XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", "UTF-8", "no");
MemoryStream xmlStream = new MemoryStream();
doc.Save(xmlStream);
xmlStream.Flush();
xmlStream.Position = 0;
doc.Load(xmlStream);
StringWriter sw = new StringWriter();
XmlTextWriter tx = new XmlTextWriter(sw);
doc.WriteTo(tx);
string xmlString = sw.ToString();
Use an xmlwritersettings to achieve greater control over saving. The XmlWriter.Create accepts that setting (instead of the default contructor)
var myOriginalXml = #"<Root>
<Data>Nack</Data>
<Data>Nelly</Data>
</Root>";
var doc = new XmlDocument();
doc.LoadXml(myOriginalXml);
var ms = new MemoryStream();
var tx = XmlWriter.Create(ms,
new XmlWriterSettings {
OmitXmlDeclaration = false,
ConformanceLevel= ConformanceLevel.Document,
Encoding = UTF8Encoding.UTF8 });
doc.Save(tx);
var xmlString = UTF8Encoding.UTF8.GetString(ms.ToArray());

XmllSerializer.Serialize and returning raw xml

I am using an XmlSerializer to serialize an object to xml. After the object gets serialized, I end up with something like...
<?xml version="1.0" encoding="utf-8"?>
<xml xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</xml>
I need to have it return simply...
<xml></xml>
Is there a way to serialize xml without the extra information? I realize strictly proper xml requires these additional elements, but I need the simpler form as I am appending these xml strings together to form a larger xml blob.
UPDATE
I was able to get the following code to work...
public static string Serialize(object o)
{
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add(String.Empty, String.Empty);
StringBuilder sb = new StringBuilder();
XmlWriter xmlw = XmlWriter.Create(sb, xws);
XmlSerializer serializer = new XmlSerializer(o.GetType());
serializer.Serialize(xmlw, o, ns);
xmlw.Flush();
return sb.ToString();
}
In C#, you can do something like this:
XmlSerializer serializer = new XmlSerializer(typeof(object));
StringWriter stringWriter = new StringWriter();
using (XmlWriter writer = XmlWriter.Create(stringWriter, new XmlWriterSettings() { OmitXmlDeclaration = true }))
{
serializer.Serialize(writer, this, new XmlSerializerNamespaces() { "",""});
}
string xmlText = stringWriter.ToString();
Explanation:
OmitXmlDeclaration = true makes it remove the declaration.
new XmlSerializerNamespaces() { "",""} removes the namespaces.

Json.NET - convert JSON to XML and remove XML version, encoding?

http://james.newtonking.com/projects/json/help/
How come when I use "DeserializeXmlNode" and my JSON gets converted to an XML document
then convert my XML document into a string like this
string strXML = "";
StringWriter writer = new StringWriter();
xmlDoc.Save(writer);
strXML = writer.ToString();
It includes
<?xml version="1.0" encoding="utf-16"?>
I did not add this, how do I remove it?
an XML without that line is not a valid XML file!
that line is called the XML Declaration
as an example, check out the OData XML from Netflix on Catalog Titles, can you see that first line?
http://odata.netflix.com/Catalog/Titles
Use XmlWriter with StringBuilder instead of StringWriter
var strXML = "";
var writer = new StringBuilder();
var settings = new System.Xml.XmlWriterSettings() { OmitXmlDeclaration = true};
var xmlWriter = System.Xml.XmlWriter.Create(strXML, settings);
xmlDoc.Save(xmlWriter);
strXML = writer.ToString();

How can I remove the BOM from XmlTextWriter using C#?

How do remove the BOM from an XML file that is being created?
I have tried using the new UTF8Encoding(false) method, but it doesn't work. Here is the code I have:
XmlDocument xmlDoc = new XmlDocument();
XmlTextWriter xmlWriter = new XmlTextWriter(filename, new UTF8Encoding(false));
xmlWriter.Formatting = Formatting.Indented;
xmlWriter.WriteProcessingInstruction("xml", "version='1.0' encoding='UTF-8'");
xmlWriter.WriteStartElement("items");
xmlWriter.Close();
xmlDoc.Load(filename);
XmlNode root = xmlDoc.DocumentElement;
XmlElement item = xmlDoc.CreateElement("item");
root.AppendChild(item);
XmlElement itemCategory = xmlDoc.CreateElement("category");
XmlText itemCategoryText = xmlDoc.CreateTextNode("test");
item.AppendChild(itemCategory);
itemCategory.AppendChild(itemCategoryText);
xmlDoc.Save(filename);
You're saving the file twice - once with XmlTextWriter and once with xmlDoc.Save. Saving from the XmlTextWriter isn't adding a BOM - saving with xmlDoc.Save is.
Just save to a TextWriter instead, so that you can specify the encoding again:
using (TextWriter writer = new StreamWriter(filename, false,
new UTF8Encoding(false))
{
xmlDoc.Save(writer);
}
I'd write the XML to a string(builder) instead and then write that string to file.

Categories