Converting XML to UTF-8 using C#

I have written the code below to convert an XML file to a UTF-8 file. It works as expected, except that the header is concatenated with the body text instead of being written on a separate line. I need the UTF-8 declaration on its own line, but File.WriteAllText does not accept more than 3 arguments. Any help appreciated.
string path = @"samplefile.xml";
string path_new = @"samplefile_new.xml";
Encoding utf8 = new UTF8Encoding(false);
Encoding ansi = Encoding.GetEncoding(1252);
string xml = File.ReadAllText(path, ansi);
XDocument xmlDoc = XDocument.Parse(xml);
File.WriteAllText(
    path_new,
    @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""true"">" + xmlDoc.ToString(),
    utf8
);

There is no need to use any API other than LINQ to XML. It has everything needed to deal with XML file encoding, prolog, BOM, indentation, etc.
void Main()
{
    string outputXMLfile = @"e:\temp\XMLfile_UTF-8.xml";
    XDocument xml = XDocument.Parse(@"<?xml version='1.0' encoding='utf-16'?>
        <root>
            <row>some text</row>
        </root>");
    XDocument doc = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement(xml.Root)
    );
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.IndentChars = "\t";
    // to remove BOM
    settings.Encoding = new UTF8Encoding(false);
    using (XmlWriter writer = XmlWriter.Create(outputXMLfile, settings))
    {
        doc.Save(writer);
    }
}
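
Applied to the file-to-file case from the question, the same idea could look like the sketch below. It is only an illustration under assumptions: the names `XmlReencoder` and `ConvertToUtf8` are hypothetical, and the source encoding is passed in by the caller. On .NET Core and later, `Encoding.GetEncoding(1252)` generally needs `CodePagesEncodingProvider` registered first, which is why the encoding is not hard-coded here.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlReencoder
{
    // Hypothetical helper: reads an XML file in a known legacy encoding and
    // rewrites it as UTF-8 without a BOM, with the declaration on its own line.
    public static void ConvertToUtf8(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        XDocument doc;
        // Decode the bytes explicitly with the encoding supplied by the caller.
        using (var reader = new StreamReader(sourcePath, sourceEncoding))
        {
            doc = XDocument.Load(reader);
        }

        // Replace (or add) the declaration; the writer emits it before the root.
        doc.Declaration = new XDeclaration("1.0", "utf-8", "yes");

        var settings = new XmlWriterSettings
        {
            Indent = true,                      // root element starts on a new line
            Encoding = new UTF8Encoding(false)  // false = do not write a BOM
        };
        using (XmlWriter writer = XmlWriter.Create(targetPath, settings))
        {
            doc.Save(writer);
        }
    }
}
```

A call such as `XmlReencoder.ConvertToUtf8("samplefile.xml", "samplefile_new.xml", ansi)` would then stand in for the `File.WriteAllText` call in the question. Note that `standalone` only accepts `yes` or `no`, not `true`.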

Related

Change XslCompiledTransform output encoding while transforming

I am transforming my XML with XSLT using XslCompiledTransform, and the transformation changes the output encoding to UTF-16 (the default).
When I try to change the output encoding, I get an error saying that the property is read-only and cannot be changed.
I also tried XmlWriter with XmlWriterSettings, a MemoryStream, and other solutions, but nothing works for me. For reference, here is my code snippet:
public static StringBuilder TransformXml(ProcessorConfigElement configSettings, StringBuilder xml, ILog logger)
{
    //Perform transformation...
    StringBuilder newXmlBuilder = new StringBuilder(xml.Length);
    XslCompiledTransform requiredXslt = new XslCompiledTransform();
    requiredXslt.Load(configSettings.XsltPath, XsltSettings.TrustedXslt, new XmlUrlResolver());
    // I tried this trick also but all in vain
    // Encoding wind1252 = Encoding.GetEncoding(1252);
    // XmlWriterSettings xmlSettings = new XmlWriterSettings();
    // xmlSettings.Encoding = wind1252;
    // xmlSettings.ConformanceLevel = ConformanceLevel.Fragment;
    // xmlSettings.OmitXmlDeclaration = true;
    // XmlWriter writer = XmlWriter.Create(newXmlBuilder, xmlSettings);
    StringBuilder s = new StringBuilder();
    using (TextWriter newXmlWriter = StringWriter.Create(newXmlBuilder))
    {
        if (!string.IsNullOrEmpty(configSettings.Delimiter))
        {
            XsltArgumentList argsList = new XsltArgumentList();
            argsList.AddParam("delimiter", "", configSettings.Delimiter);
            // here is the actual problem when it transforms its create a mess and converting pound symbol and other symbols as diamond special character (encoding issue.)
            requiredXslt.Transform(GetElement(xml.ToString()), argsList, newXmlWriter);
        }
        else
        {
            requiredXslt.Transform(GetElement(xml.ToString()), null, newXmlWriter);
        }
    }
    logger.Info("XSLT applied successfully");
    //replace string after transformation to validate and write to file
    xml = newXmlBuilder;
    return xml;
}
I want to control the output encoding (for example UTF-8) during the XSLT transformation. Can anyone help?
As Martin Honnen already pointed out, the XSLT may already contain an output declaration along the following lines:
XSLT
<xsl:output indent="yes" method="xml" encoding="utf-8"/>
Here is C# that picks it up from the XSLT file via the xslt.OutputSettings property:
C#
void Main()
{
    const string SOURCEXMLFILE = @"e:\Temp\UniversalShipment.xml";
    const string stylesheet = @"e:\Temp\UniversalShipment.xslt";
    const string OUTPUTXMLFILE = @"e:\temp\UniversalShipment_output.xml";
    bool paramXSLT = false;
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(stylesheet, XsltSettings.TrustedXslt, new XmlUrlResolver());
    // Load the file to transform.
    XPathDocument doc = new XPathDocument(SOURCEXMLFILE);
    XsltArgumentList xslArg = new XsltArgumentList();
    if (paramXSLT)
    {
        // Create a parameter which represents the current date and time.
        DateTime d = DateTime.Now;
        xslArg.AddParam("date", "", d.ToString());
    }
    using (XmlWriter writer = XmlWriter.Create(OUTPUTXMLFILE, xslt.OutputSettings))
    {
        xslt.Transform(doc, xslArg, writer);
    }
}
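
The method in the question writes into a StringBuilder, and a .NET string is always UTF-16, so a text-based writer reports utf-16 whatever the stylesheet asks for. A rough sketch of the in-memory alternative is to write to a byte stream, where `xslt.OutputSettings` can take effect. The names `XsltToBytes` and `Transform` are hypothetical, and the stylesheet is assumed to carry an `<xsl:output encoding="utf-8"/>` declaration like the one above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltToBytes
{
    // Hypothetical helper: applies a stylesheet and returns the encoded output bytes.
    public static byte[] Transform(string xsltText, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(styleReader);
        }

        using (var output = new MemoryStream())
        {
            using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
            // OutputSettings is filled in from <xsl:output> when the stylesheet is loaded.
            using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
            {
                xslt.Transform(input, null, writer);
            }
            // The writer has been disposed (and flushed) at this point.
            return output.ToArray();
        }
    }
}
```

The returned bytes can be written to a file as they are, or decoded with the matching encoding if a string is really needed.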

How to get XML with header (<?xml version="1.0"...)?

Consider the following simple code which creates an XML document and displays it.
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);
XmlComment comment = xml.CreateComment("Comment");
root.AppendChild(comment);
textBox1.Text = xml.OuterXml;
it displays, as expected:
<root><!--Comment--></root>
It doesn't, however, display the
<?xml version="1.0" encoding="UTF-8"?>
So how can I get that as well?
Create an XML declaration using the XmlDocument.CreateXmlDeclaration method and add it to the document before the root element is appended:
XmlNode docNode = xml.CreateXmlDeclaration("1.0", "UTF-8", null);
xml.AppendChild(docNode);
Note: please take a look at the documentation for the method, especially for the encoding parameter: there are special requirements for values of this parameter. The declaration also has to be the first node in the document.
You need to use an XmlWriter (which writes the XML declaration by default). Note that C# strings are UTF-16, while your XML declaration says that the document is UTF-8 encoded. That discrepancy can cause problems. Here's an example that writes to a file and gives the result you expect:
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);
XmlComment comment = xml.CreateComment("Comment");
root.AppendChild(comment);
XmlWriterSettings settings = new XmlWriterSettings
{
    Encoding = Encoding.UTF8,
    ConformanceLevel = ConformanceLevel.Document,
    OmitXmlDeclaration = false,
    CloseOutput = true,
    Indent = true,
    IndentChars = " ",
    NewLineHandling = NewLineHandling.Replace
};
using (StreamWriter sw = File.CreateText("output.xml"))
using (XmlWriter writer = XmlWriter.Create(sw, settings))
{
    xml.WriteContentTo(writer);
    writer.Close();
}
string document = File.ReadAllText("output.xml");
XmlDeclaration xmldecl;
xmldecl = xmlDocument.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDocument.DocumentElement;
xmlDocument.InsertBefore(xmldecl, root);

Add An XML Declaration To String Of XML

I have some xml data that looks like..
<Root>
<Data>Nack</Data>
<Data>Nelly</Data>
</Root>
I want to add "<?xml version=\"1.0\"?>" to this string. Then preserve the xml as a string.
I attempted a few things..
This breaks and loses the original xml string
myOriginalXml="<?xml version=\"1.0\"?>" + myOriginalXml;
This doesn't do anything; it just keeps the original XML data with no declaration attached.
XmlDocument doc = new XmlDocument();
doc.LoadXml(myOriginalXml);
XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", "UTF-8","no");
StringWriter sw = new StringWriter();
XmlTextWriter tx = new XmlTextWriter(sw);
doc.WriteTo(tx);
string xmlString = sw.ToString();
This also does not seem to have any effect:
XmlDocument doc = new XmlDocument();
doc.LoadXml(myOriginalXml);
XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", "UTF-8", "no");
MemoryStream xmlStream = new MemoryStream();
doc.Save(xmlStream);
xmlStream.Flush();
xmlStream.Position = 0;
doc.Load(xmlStream);
StringWriter sw = new StringWriter();
XmlTextWriter tx = new XmlTextWriter(sw);
doc.WriteTo(tx);
string xmlString = sw.ToString();
Use XmlWriterSettings to get more control over saving. XmlWriter.Create accepts those settings (unlike the default constructor).
var myOriginalXml = @"<Root>
<Data>Nack</Data>
<Data>Nelly</Data>
</Root>";
var doc = new XmlDocument();
doc.LoadXml(myOriginalXml);
var ms = new MemoryStream();
// UTF8Encoding(false) avoids a byte order mark, which would otherwise
// end up as an invisible first character of the decoded string.
var utf8NoBom = new UTF8Encoding(false);
var tx = XmlWriter.Create(ms,
    new XmlWriterSettings {
        OmitXmlDeclaration = false,
        ConformanceLevel = ConformanceLevel.Document,
        Encoding = utf8NoBom });
doc.Save(tx);
var xmlString = utf8NoBom.GetString(ms.ToArray());
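
One likely reason the two XmlDocument attempts in the question had no effect is that `CreateXmlDeclaration` only creates the node; it is never attached to the document. A minimal sketch of a string-to-string version follows, assuming the input has no declaration yet. The names `XmlDeclarationHelper` and `AddDeclaration` are hypothetical.

```csharp
using System.Xml;

public static class XmlDeclarationHelper
{
    // Hypothetical helper: returns the XML with a version-only declaration prepended.
    // Assumes the input string does not already start with a declaration.
    public static string AddDeclaration(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // A null encoding leaves the encoding attribute out of the declaration.
        XmlDeclaration declaration = doc.CreateXmlDeclaration("1.0", null, null);

        // The step missing from the attempts above: attach the node to the document.
        doc.InsertBefore(declaration, doc.DocumentElement);

        // OuterXml serializes the nodes as they are, without adding indentation.
        return doc.OuterXml;
    }
}
```

Because the result stays a string, leaving the encoding out avoids a declaration that claims UTF-8 while the data is held in a UTF-16 string.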

Json.NET - convert JSON to XML and remove XML version, encoding?

http://james.newtonking.com/projects/json/help/
When I use DeserializeXmlNode, my JSON gets converted to an XML document.
I then convert the XML document into a string like this:
string strXML = "";
StringWriter writer = new StringWriter();
xmlDoc.Save(writer);
strXML = writer.ToString();
It includes
<?xml version="1.0" encoding="utf-16"?>
I did not add this, how do I remove it?
That line is called the XML declaration.
It is optional in XML 1.0, but most XML files start with one, so think twice before stripping it.
As an example, check out the OData XML from Netflix on Catalog Titles; can you see that first line?
http://odata.netflix.com/Catalog/Titles
Use XmlWriter with StringBuilder instead of StringWriter
var writer = new StringBuilder();
var settings = new System.Xml.XmlWriterSettings() { OmitXmlDeclaration = true };
// Pass the StringBuilder here; a string argument would be treated as an output file name.
using (var xmlWriter = System.Xml.XmlWriter.Create(writer, settings))
{
    xmlDoc.Save(xmlWriter);
}
var strXML = writer.ToString();
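
To see the difference without Json.NET in the picture, the two ways of saving can be put side by side. This is only a sketch; `XmlToStringDemo` and its method names are hypothetical, and any `XmlDocument` (including one returned by `DeserializeXmlNode`) can be passed in.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlToStringDemo
{
    // Saving through a StringWriter: the output starts with a utf-16 declaration,
    // because the underlying text writer reports UTF-16 as its encoding.
    public static string WithDeclaration(XmlDocument doc)
    {
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    // Saving through an XmlWriter that is configured to omit the declaration.
    public static string WithoutDeclaration(XmlDocument doc)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            doc.Save(writer);
        }
        return sb.ToString();
    }
}
```

If the document has no declaration node of its own, `xmlDoc.OuterXml` should also return the markup without a declaration.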

Force XDocument to write to String with UTF-8 encoding

I want to be able to write XML to a String with the declaration and with UTF-8 encoding. This seems mighty tricky to accomplish.
I have read around a bit and tried some of the popular answers for this, but they all have issues. My current code correctly outputs UTF-8 but does not maintain the original formatting of the XDocument (i.e. indents / whitespace)!
Can anyone offer some advice please?
XDocument xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), xelementXML);
MemoryStream ms = new MemoryStream();
using (XmlWriter xw = new XmlTextWriter(ms, Encoding.UTF8))
{
    xml.Save(xw);
    xw.Flush();
    StreamReader sr = new StreamReader(ms);
    ms.Seek(0, SeekOrigin.Begin);
    String xmlString = sr.ReadToEnd();
}
The XML requires the formatting to be identical to the way .ToString() would format it i.e.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root>
<node>blah</node>
</root>
What I'm currently seeing is
<?xml version="1.0" encoding="utf-8" standalone="yes"?><root><node>blah</node></root>
Update
I have managed to get this to work by adding XmlTextWriter settings... It seems VERY clunky though!
MemoryStream ms = new MemoryStream();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
settings.ConformanceLevel = ConformanceLevel.Document;
settings.Indent = true;
using (XmlWriter xw = XmlTextWriter.Create(ms, settings))
{
    xml.Save(xw);
    xw.Flush();
    StreamReader sr = new StreamReader(ms);
    ms.Seek(0, SeekOrigin.Begin);
    String blah = sr.ReadToEnd();
}
Try this:
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;
class Test
{
    static void Main()
    {
        XDocument doc = XDocument.Load("test.xml",
                                       LoadOptions.PreserveWhitespace);
        doc.Declaration = new XDeclaration("1.0", "utf-8", null);
        StringWriter writer = new Utf8StringWriter();
        doc.Save(writer, SaveOptions.None);
        Console.WriteLine(writer);
    }
    private class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding { get { return Encoding.UTF8; } }
    }
}
Of course, you haven't shown us how you're building the document, which makes it hard to test... I've just tried with a hand-constructed XDocument and that contains the relevant whitespace too.
Try XmlWriterSettings:
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = false;
xws.Indent = true;
And pass it on like
using (XmlWriter xw = XmlWriter.Create(sb, xws))
See also https://stackoverflow.com/a/3288376/1430535
return xdoc.Declaration.ToString() + Environment.NewLine + xdoc.ToString();
