I am transforming my XML with XSLT using XslCompiledTransform, and the transformation changes the output encoding to UTF-16 (the default).
When I try to change the encoding, I get an error saying that the property is read-only and can't be changed.
I also tried XmlWriter with XmlWriterSettings, a MemoryStream, and other solutions, but nothing works for me. For reference, here is my code snippet:
public static StringBuilder TransformXml(ProcessorConfigElement configSettings, StringBuilder xml, ILog logger)
{
//Perform transformation...
StringBuilder newXmlBuilder = new StringBuilder(xml.Length);
XslCompiledTransform requiredXslt = new XslCompiledTransform();
requiredXslt.Load(configSettings.XsltPath, XsltSettings.TrustedXslt, new XmlUrlResolver());
// I tried this trick also but all in vain
// Encoding wind1252 = Encoding.GetEncoding(1252);
// XmlWriterSettings xmlSettings = new XmlWriterSettings();
// xmlSettings.Encoding = wind1252;
// xmlSettings.ConformanceLevel = ConformanceLevel.Fragment;
// xmlSettings.OmitXmlDeclaration = true;
// XmlWriter writer = XmlWriter.Create(newXmlBuilder, xmlSettings);
using (TextWriter newXmlWriter = new StringWriter(newXmlBuilder))
{
if (!string.IsNullOrEmpty(configSettings.Delimiter))
{
XsltArgumentList argsList = new XsltArgumentList();
argsList.AddParam("delimiter", "", configSettings.Delimiter);
// here is the actual problem: the transformation creates a mess and turns the pound symbol (£) and other symbols into a diamond special character (an encoding issue)
requiredXslt.Transform(GetElement(xml.ToString()), argsList, newXmlWriter);
}
else
{
requiredXslt.Transform(GetElement(xml.ToString()), null, newXmlWriter);
}
}
logger.Info("XSLT applied successfully");
//replace string after transformation to validate and write to file
xml = newXmlBuilder;
return xml;
}
How can I get the encoding I want (for example UTF-8) in the transformation output? Anyone?
As Martin Honnen already pointed out, the XSLT may already have an output declaration along the lines of the following:
XSLT
<xsl:output indent="yes" method="xml" encoding="utf-8"/>
If so, here is C# that picks it up from the XSLT file via the xslt.OutputSettings property:
c#
void Main()
{
const string SOURCEXMLFILE = @"e:\Temp\UniversalShipment.xml";
const string stylesheet = @"e:\Temp\UniversalShipment.xslt";
const string OUTPUTXMLFILE = @"e:\temp\UniversalShipment_output.xml";
bool paramXSLT = false;
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(stylesheet, XsltSettings.TrustedXslt, new XmlUrlResolver());
// Load the file to transform.
XPathDocument doc = new XPathDocument(SOURCEXMLFILE);
XsltArgumentList xslArg = new XsltArgumentList();
if (paramXSLT)
{
// Create a parameter which represents the current date and time.
DateTime d = DateTime.Now;
xslArg.AddParam("date", "", d.ToString());
}
using (XmlWriter writer = XmlWriter.Create(OUTPUTXMLFILE, xslt.OutputSettings))
{
xslt.Transform(doc, xslArg, writer);
}
}
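Writing to a StringWriter or StringBuilder always produces a .NET string, which is UTF-16 in memory, so the declaration reports utf-16 regardless of the stylesheet. A minimal sketch, assuming the output can go to a stream instead of a string: clone xslt.OutputSettings (the original object is read-only, which explains the error in the question; the clone is writable) and set the encoding on the copy. The class and method names (Utf8XsltDemo, TransformToUtf8) are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class Utf8XsltDemo
{
    // Hypothetical helper: applies an XSLT (passed as a string) to an XML string
    // and returns the bytes the XmlWriter produced.
    public static byte[] TransformToUtf8(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }

        // OutputSettings itself is read-only; Clone() returns a writable copy
        // that still carries the stylesheet's <xsl:output> choices.
        XmlWriterSettings settings = transform.OutputSettings.Clone();
        settings.Encoding = new UTF8Encoding(false); // UTF-8 without a BOM

        using (var output = new MemoryStream())
        {
            using (var input = XmlReader.Create(new StringReader(xml)))
            using (var writer = XmlWriter.Create(output, settings))
            {
                transform.Transform(input, null, writer);
            }
            // MemoryStream.ToArray works even if the writer closed the stream.
            return output.ToArray();
        }
    }
}
```

The returned bytes can be written to a file as-is, or decoded with new UTF8Encoding(false).GetString(...) if a string is still needed.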
Related
The code below serializes an object to an XML string, then writes it to an XML file (yes, quite a bit is going on with respect to UTF-8 and removal of the namespace):
var bidsXml = string.Empty;
var emptyNamespaces = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
var settings = new XmlWriterSettings();
settings.Indent = true;
settings.OmitXmlDeclaration = true;
activity = $"Serialize Class INFO to XML to string";
using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
XmlSerializer xml = new XmlSerializer(info.GetType());
xml.Serialize(writer, info, emptyNamespaces);
bidsXml = Encoding.UTF8.GetString(stream.ToArray());
}
var lastChar = bidsXml.Substring(bidsXml.Length - 1);
var fileName = $"CostOffer_Testing_{DateTime.Now:yyyy.MM.dd_HH.mm.ss}.xml";
var path = $"c:\\temp\\pjm\\{fileName}";
File.WriteAllText(path, bidsXml);
The problem is that serialization to XML seems to introduce a CR/LF (newline):
It's easier to see in the XML file:
A workaround is to strip out the "last" character:
bidsXml = bidsXml.Substring(0,bidsXml.Length - 1);
But it would be better to understand the root cause and resolve it without a workaround. Any idea why a newline character is being appended to the XML string?
** EDIT **
I was able to attempt a load into the consumer application (prior to this attempt I used an API to import the XML), and I received a more telling message:
The file you are loading is a binary file, the contents can not be displayed here.
So I suspect an unprintable character is somehow getting embedded into the file/XML. When I open the file in Notepad++, I see the following (UTF-8-BOM, i.e. a byte order mark); at least I have something to go on:
So it seems the consumer of my XML does not want BOM (Byte Order Mark) within the stream.
After visiting the site UTF-8 BOM adventures in C#, I've updated my code to use new UTF8Encoding(false) rather than Encoding.UTF8:
var utf8NoBOM = new UTF8Encoding(false);
var bidsXml = string.Empty;
var emptyNamespaces = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
var settings = new XmlWriterSettings();
settings.Indent = true;
settings.OmitXmlDeclaration = true;
activity = $"Serialize Class INFO to XML to string";
using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, utf8NoBOM))
{
XmlSerializer xml = new XmlSerializer(info.GetType());
xml.Serialize(writer, info, emptyNamespaces);
bidsXml = utf8NoBOM.GetString(stream.ToArray());
}
var fileName = $"CostOffer_Testing_{DateTime.Now:yyyy.MM.dd_HH.mm.ss}.xml";
var path = $"c:\\temp\\pjm\\{fileName}";
File.WriteAllText(path, bidsXml, utf8NoBOM);
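The root cause can be checked directly: Encoding.UTF8 carries a three-byte preamble (EF BB BF) that StreamWriter writes at the start of an empty stream, and Encoding.GetString does not strip it, so the resulting string starts with U+FEFF. A minimal sketch comparing the two encodings; Info and BomDemo are hypothetical stand-ins for the poster's types.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in for the poster's "info" object.
public class Info
{
    public string Name { get; set; }
}

public static class BomDemo
{
    // Serializes obj through a StreamWriter with the given encoding, then decodes
    // the raw bytes with the same encoding, mirroring the code in the question.
    public static string SerializeWith(Encoding encoding, object obj)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                new XmlSerializer(obj.GetType()).Serialize(writer, obj);
            }
            // GetString does not strip a preamble, so a BOM survives as U+FEFF.
            return encoding.GetString(stream.ToArray());
        }
    }
}
```

With Encoding.UTF8 the first character is the invisible U+FEFF; with new UTF8Encoding(false) the string starts directly with "<?xml".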
I have written the code below to convert an XML file to a UTF-8 file. It works as expected, but the XML declaration is concatenated with the body text instead of being written on a separate line. I need the declaration on its own line, but File.WriteAllText does not accept more than three arguments. Any help appreciated.
string path = @"samplefile.xml";
string path_new = @"samplefile_new.xml";
Encoding utf8 = new UTF8Encoding(false);
// on .NET Core / .NET 5+, code page 1252 requires Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first
Encoding ansi = Encoding.GetEncoding(1252);
string xml = File.ReadAllText(path, ansi);
XDocument xmlDoc = XDocument.Parse(xml);
File.WriteAllText(
path_new,
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" + xmlDoc.ToString(),
utf8
);
There's no need to use any API other than LINQ to XML together with an XmlWriter. Between them, they have all the means to deal with XML file encoding, the prolog, the BOM, indentation, etc.
void Main()
{
string outputXMLfile = @"e:\temp\XMLfile_UTF-8.xml";
XDocument xml = XDocument.Parse(@"<?xml version='1.0' encoding='utf-16'?>
<root>
<row>some text</row>
</root>");
XDocument doc = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement(xml.Root)
);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "\t";
// to remove BOM
settings.Encoding = new UTF8Encoding(false);
using (XmlWriter writer = XmlWriter.Create(outputXMLfile, settings))
{
doc.Save(writer);
}
}
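To check that this approach also solves the original problem (declaration on its own line, no BOM), here is a sketch of the same idea writing to a MemoryStream instead of a file, so the bytes can be inspected. Utf8DeclarationDemo and ToUtf8Bytes are hypothetical names; with Indent = true the writer puts a newline after the declaration.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class Utf8DeclarationDemo
{
    // Hypothetical helper: re-encodes an XML string as UTF-8 (no BOM),
    // with an indented body and the declaration on its own line.
    public static byte[] ToUtf8Bytes(string xml)
    {
        XDocument source = XDocument.Parse(xml);
        var doc = new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement(source.Root));

        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "\t",
            Encoding = new UTF8Encoding(false) // no BOM
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }
}
```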
I have already searched a lot but am unable to find a solution or determine the correct approach.
I am serializing an object to an XML string and deserializing it back to an object using C#. After serialization, the XML string has a leading ?. When I deserialize it back to the object, I get the error "There is an error in XML document (1, 1)". The serialized string starts like this:
?<?xml version="1.0" encoding="utf-16"?>
Serialization code:
string xmlString = null;
MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(typeof(T));
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("abc", "http://example.com/abc/");
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream,Encoding.Unicode);
xs.Serialize(xmlTextWriter, obj, ns);
memoryStream = (MemoryStream)xmlTextWriter.BaseStream;
xmlString = ConvertByteArrayToString(memoryStream.ToArray());
ConvertByteArrayToString:
UnicodeEncoding encoding = new UnicodeEncoding();
string constructedString = encoding.GetString(characters);
Deserialization Code:
XmlSerializer ser = new XmlSerializer(typeof(T));
StringReader stringReader = new StringReader(xml);
XmlTextReader xmlReader = new XmlTextReader(stringReader);
object obj = ser.Deserialize(xmlReader);
xmlReader.Close();
stringReader.Close();
return (T)obj;
I would like to know what I am doing wrong with encoding and I need a solution that works for most cases. Thanks
Use the following functions for serialization and deserialization:
public static string Serialize<T>(T dataToSerialize)
{
using (var stringWriter = new System.IO.StringWriter())
{
var serializer = new XmlSerializer(typeof(T));
serializer.Serialize(stringWriter, dataToSerialize);
return stringWriter.ToString();
}
}
public static T Deserialize<T>(string xmlText)
{
using (var stringReader = new System.IO.StringReader(xmlText))
{
var serializer = new XmlSerializer(typeof(T));
return (T)serializer.Deserialize(stringReader);
}
}
Your serialized XML contains a Unicode byte-order mark at the beginning, and this is where the deserializer fails.
To remove the BOM, create an encoding instance that suppresses it instead of using the default Encoding.Unicode:
new XmlTextWriter(memoryStream, new UnicodeEncoding(false, false))
Here the second false prevents the BOM from being prepended to the string.
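A minimal round-trip sketch of that fix, assuming a hypothetical Item class and Utf16RoundTrip helper: with Encoding.Unicode the decoded string starts with U+FEFF (the leading ? shown in the question), while UnicodeEncoding(false, false) yields a string that starts with < and deserializes cleanly.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used only for the round trip.
public class Item
{
    public string Name { get; set; }
}

public static class Utf16RoundTrip
{
    public static string Serialize<T>(T obj, Encoding encoding)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            using (var writer = new XmlTextWriter(stream, encoding))
            {
                serializer.Serialize(writer, obj);
            }
            // Decoding keeps any BOM bytes as a leading U+FEFF character.
            return encoding.GetString(stream.ToArray());
        }
    }

    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```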
I'm working on some code to read an XML fragment that contains an XML declaration, e.g. <?xml version="1.0" encoding="utf-8"?>, and parse the encoding. From MSDN, I should be able to do it like this:
var nt = new NameTable();
var mgr = new XmlNamespaceManager(nt);
var context = new XmlParserContext(null, mgr, null, XmlSpace.None);
var reader = new System.Xml.XmlTextReader(@"<?xml version=""1.0"" encoding=""UTF-8""?>",
System.Xml.XmlNodeType.XmlDeclaration, context);
However, I'm getting a System.Xml.XmlException on the call to the System.Xml.XmlTextReader constructor with the error message:
XmlNodeType XmlDeclaration is not supported for partial content parsing.
I've googled this error in quotes -- exactly zero results found (edit: now there's one result: this post) -- and without quotes, which yields nothing useful. I've also looked at MSDN for the XmlNodeType, and it doesn't say anything about it not being supported.
What am I missing here? How can I get an XmlTextReader instance from an XML declaration fragment?
Note, my goal here is just to determine the encoding of a partially-built XML document where I'm making the assumption that it at least contains a declaration node; thus, I'm trying to get reader.Encoding. If there's another way to do that, I'm open to that.
At present, I'm parsing the declaration manually using regex, which is not the best approach.
Update: getting the encoding from an XML document or from an XML fragment:
Here's a way to get the encoding without having to resort to a fake root, using XmlReader.Create:
private static string GetXmlEncoding(string xmlString)
{
if (string.IsNullOrWhiteSpace(xmlString)) throw new ArgumentException("The provided string value is null or empty.");
using (var stringReader = new StringReader(xmlString))
{
var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var xmlReader = XmlReader.Create(stringReader, settings))
{
if (!xmlReader.Read()) throw new ArgumentException(
"The provided XML string does not contain enough data to be valid XML (see https://msdn.microsoft.com/en-us/library/system.xml.xmlreader.read)");
var result = xmlReader.GetAttribute("encoding");
return result;
}
}
}
Here's the output, with a full and fragment XML:
If you want a System.Text.Encoding instead, you can modify the code to look like this:
private static Encoding GetXmlEncoding(string xmlString)
{
using (StringReader stringReader = new StringReader(xmlString))
{
var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create(stringReader, settings))
{
reader.Read();
var encoding = reader.GetAttribute("encoding");
var result = Encoding.GetEncoding(encoding);
return result;
}
}
}
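The Encoding-returning version above throws when the first node has no encoding attribute (GetAttribute returns null and Encoding.GetEncoding(null) throws ArgumentNullException). A hedged sketch with a fallback, assuming UTF-8 as the default (the XML default when no encoding is declared and no BOM is present); XmlDeclarationEncoding is a hypothetical name.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlDeclarationEncoding
{
    // Reads only the first node of a (possibly partial) XML string and maps the
    // declaration's encoding pseudo-attribute to an Encoding. Falls back to UTF-8
    // (assumption: the XML default) when there is no declaration or no encoding.
    public static Encoding GetXmlEncoding(string xmlString)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var stringReader = new StringReader(xmlString))
        using (var reader = XmlReader.Create(stringReader, settings))
        {
            if (reader.Read() && reader.NodeType == XmlNodeType.XmlDeclaration)
            {
                string name = reader.GetAttribute("encoding");
                if (!string.IsNullOrEmpty(name))
                {
                    return Encoding.GetEncoding(name);
                }
            }
            return new UTF8Encoding(false);
        }
    }
}
```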
Old answer:
As you mentioned, XmlTextReader's Encoding property contains the encoding.
Here's the full source code of a console app, which hopefully is useful:
class Program
{
static void Main(string[] args)
{
var asciiXML = @"<?xml version=""1.0"" encoding=""ASCII""?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
var utf8XML = @"<?xml version=""1.0"" encoding=""UTF-8""?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
var asciiResult = GetXmlEncoding(asciiXML);
var utfResult = GetXmlEncoding(utf8XML);
Console.WriteLine(asciiResult);
Console.WriteLine(utfResult);
Console.ReadLine();
}
private static Encoding GetXmlEncoding(string s)
{
var stream = new MemoryStream(Encoding.UTF8.GetBytes(s));
using (var xmlreader = new XmlTextReader(stream))
{
xmlreader.MoveToContent();
var encoding = xmlreader.Encoding;
return encoding;
}
}
}
Here's the output from the program:
If you know that the XML only contains the declaration, maybe you can add an empty root? So for example:
var fragmentResult = GetXmlEncoding(xmlFragment + "<root/>");
Good evening, here's the solution with a System.Text.Encoding as output.
I've kept it clear and step by step.
class Program
{
static void Main(string[] args)
{
var line = File.ReadLines(YourFileName).First();
var correctXml = line + "<Root></Root>";
var xml = XDocument.Parse(correctXml);
var stringEncoding = xml.Declaration.Encoding;
var encoding = System.Text.Encoding.GetEncoding(stringEncoding);
}
}
Maybe late, but you can use the code below after loading the XML into an XmlDocument:
static string getEncoding(XmlDocument xml)
{
if (xml.FirstChild.NodeType == XmlNodeType.XmlDeclaration)
{
return (xml.FirstChild as XmlDeclaration).Encoding;
}
return "utf-8";
}
If you have a byte array as input, try something like this:
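private Encoding getEncoding(byte[] data)
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
XmlDocument doc = new XmlDocument();
using (MemoryStream ms = new MemoryStream(data))
using (XmlReader reader = XmlReader.Create(ms, settings))
{
doc.Load(reader);
}
XmlDeclaration declaration = doc.ChildNodes.OfType<XmlDeclaration>().FirstOrDefault();
// no declaration or no encoding attribute: assume UTF-8 instead of throwing a NullReferenceException
if (declaration == null || string.IsNullOrEmpty(declaration.Encoding))
{
return Encoding.UTF8;
}
return Encoding.GetEncoding(declaration.Encoding);
}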
http://james.newtonking.com/projects/json/help/
How come, when I use DeserializeXmlNode to convert my JSON to an XML document and then convert that XML document into a string like this:
string strXML = "";
StringWriter writer = new StringWriter();
xmlDoc.Save(writer);
strXML = writer.ToString();
It includes
<?xml version="1.0" encoding="utf-16"?>
I did not add this; how do I remove it?
That line is called the XML declaration. XML 1.0 technically makes it optional, but many consumers expect it, so think twice before removing it.
As an example, check out the OData XML from Netflix on Catalog Titles; can you see that first line?
http://odata.netflix.com/Catalog/Titles
Use XmlWriter with StringBuilder instead of StringWriter
var writer = new StringBuilder();
var settings = new System.Xml.XmlWriterSettings() { OmitXmlDeclaration = true };
// pass the StringBuilder (not a string, which would be treated as a file name) and dispose the writer so it flushes
using (var xmlWriter = System.Xml.XmlWriter.Create(writer, settings))
{
xmlDoc.Save(xmlWriter);
}
var strXML = writer.ToString();
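The declaration is written while XmlDocument.Save starts the document on the writer, and it says utf-16 because a StringBuilder/StringWriter target is a UTF-16 .NET string. A small sketch with a hypothetical NoDeclarationDemo helper, showing both settings side by side:

```csharp
using System.Text;
using System.Xml;

public static class NoDeclarationDemo
{
    // Hypothetical helper: saves an XmlDocument into a string,
    // optionally without the XML declaration.
    public static string ToXmlString(XmlDocument doc, bool omitDeclaration)
    {
        var builder = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = omitDeclaration };
        // Disposing the XmlWriter flushes everything into the StringBuilder.
        using (XmlWriter writer = XmlWriter.Create(builder, settings))
        {
            doc.Save(writer);
        }
        return builder.ToString();
    }
}
```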