Embedding an xml document inside an xml string [duplicate] - c#

This question already has answers here:
Best way to encode text data for XML
(13 answers)
Closed 2 years ago.
I have a web service that returns an xml string as results. The return string is in this format:
<ReturnValue>
<ErrorNumber>0
</ErrorNumber>
<Message>my message</Message>
</ReturnValue>
The data that I want to insert into the "message" tag is a serialized version of a custom object. The serialized format of that object contains xml and namespace declarations post serialization. When that gets thrown into the "message" tag of my return xml string, XmlSpy says that it's not well-formed. How should I get rid of the namespace declarations, or is there a different way to imbed a serialized object into an xml string?

Wrap the string in CDATA like so:
<![CDATA[your xml, which can be multi-line]]>
CDATA will inform a validator to treat the CDATA contents as ignored text. It's often the most expedient way to embed XML (or taggy non-XML content) as a string. You can run into problems if your embedded XML contains its own CDATA, but otherwise it's a simple fix.

Just make sure that your <Message> XML is encoded so that <, >, ", and & show up as <, >, " and &, respectively.
There are few built-in ways to encode the characters:
string message = System.Web.HttpUtility.HtmlEncode(serializedXml);
string message = System.Security.SecurityElement.Escape(serializedXml);
Using an XmlTextWriter to do the work for you
Use CDATA to wrap your XML
Also, this is probably a duplicate of:
Best way to encode text data for XML

Think of XML as a document not a string.
Create a node named "wrapper", and store the content of your file in it as a Base64 encoded string. The results will look like this.
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
<wrapper type="bin.base64">PD94bWwgdmVyc2lvbj0iMS4wIj8+PHhzbDpzdHlsZXNoZWV0IHZ
lcnNpb249IjEuMCIgeG1sbnM6eHNsPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L1hTTC9UcmFuc2Zvcm0
iIHhtbG5zOm1zeHNsPSJ1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOnhzbHQiPjx4c2w6b3V0cHV0IG1
ldGhvZD0ieG1sIiAvPjx4c2w6dGVtcGxhdGUgbWF0Y2g9Ii8iPjwveHNsOnRlbXBsYXRlPjwveHNsOnN
0eWxlc2hlZXQ+</wrapper>
</ReturnValue>
The following code shows how to add the wrapper, encode the content. Then it reverses the process to show that it all "works".
Using Base64 in XML has a number of other applications as well. For example embedding images, or other documents in XML content.
using System;
using System.IO;
using System.Xml;
public class t
{
static public string EncodeTo64(string toEncode) {
byte[] toEncodeAsBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(toEncode);
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
return returnValue;
}
static public string DecodeFrom64(string encodedData) {
byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
string returnValue = System.Text.ASCIIEncoding.ASCII.GetString(encodedDataAsBytes);
return returnValue;
}
public static void Main() {
try {
//Create the XmlDocument.
XmlDocument doc = new XmlDocument();
doc.LoadXml( #"
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
</ReturnValue>
");
XmlNode nodeMessage = doc.SelectSingleNode( "/ReturnValue/Message" );
if( nodeMessage != null ) {
XmlDocument docImport = new XmlDocument();
docImport.Load( "docwithnamespace.xml" );
// create a wrapper element for the file, then import and append it after <Message>
XmlElement nodeWrapper = (XmlElement)doc.CreateElement( "wrapper" );
nodeWrapper.SetAttribute( "type", "bin.base64" );
nodeWrapper = (XmlElement)doc.ImportNode( nodeWrapper, true );
XmlNode ndImport = nodeMessage.ParentNode.AppendChild( nodeWrapper.CloneNode( true ) );
ndImport.InnerText = EncodeTo64( docImport.OuterXml );
doc.Save( "wrapperadded.xml" );
// Next, let's test un-doing the wrapping
// Re-load the "wrapped" document
XmlDocument docSaved = new XmlDocument();
docSaved.Load( "wrapperadded.xml" );
// Get the wrapped element, decode from base64 write to disk
XmlNode node = doc.SelectSingleNode( "/ReturnValue/wrapper" );
if( node != null ) {
// Load the content, and save as a new XML
XmlDocument docUnwrapped = new XmlDocument();
docUnwrapped.LoadXml( DecodeFrom64( node.InnerText ) );
docUnwrapped.Save( "unwrapped.xml" );
Console.WriteLine( "Eureka" );
}
}
} catch( Exception e ) {
Console.WriteLine(e.Message);
}
}
}

Related

How can I preserve entity characters using Xml Document?

My question is simple, but I just can't find why I have this problem and can't resolve it.
I need to read a XML file with values and use them on Unity. For now on, I read my document with its path :
XmlDocument doc = new XmlDocument();
doc.Load(path);
XmlElement root = doc.DocumentElement;
I have a Namespace Manager already configured.
I read my data like this :
string text = node.SelectSingleNode("x:textRuns/x:DOMTextRun/x:characters", nsmgr).InnerText.Replace("
", Environment.NewLine);
My XML and the data I would like to extract :
<characters>Third occupant
folding seat</characters>
My objective is to replace this entity character : "& #xD;" with an Environment.NewLine.
I tried to :
Formalize the Xml in a file with a replace
Read with an InnerText, and an InnerXml
Make an entity char "detector"
Get the node with all its content (OuterXML)
It looks like this char, however you read it, is exclude and not readable, I just can't have it on my console.
The entity has already been replaced once you extracted InnerText. Problem is, you have a CR (carriage return; 0x0D, \r) instead of a LF (line feed; 0x0A, \n). So replace "\r" by Environment.NewLine:
public static void Main() {
XmlDocument doc = new XmlDocument();
doc.LoadXml("<characters>Third occupant
folding seat</characters>");
string text = doc.SelectSingleNode("/characters").InnerText;
text = text.Replace("\r", Environment.NewLine);
Console.WriteLine(text);
}

xml string change the header encoding using c#

I have string of XML .
how can I change the header from:
string xml = "<?xml version='1.0' encoding='ISO-8859-8'?>";
to
string xml = "<?xml version='1.0' encoding='UTF-8'?>";
using c#?
UPDATE
I tryed to get the xml to User object
XmlSerializer serializer = new XmlSerializer(typeof(User));
MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
User user = (User)serializer.Deserialize(memStream);
but in the User object I get the string not encoding well.
because of the encoding of the Xml I need to change the encoding.
Instead of Encoding.UTF8.GetBytes use Encoding.GetEncoding("ISO-8859-8").GetBytes.
If the XML is stored in a string variable and you need to only replace the value in the encoding attribute, then you can perform a replace as following:
const string searchEncoding = "ISO-8859-8";
const string newEncoding = "UTF-8";
string xml = #"<?xml version='1.0' encoding='ISO-8859-8'?><abc></abc>";
int encodingPos = xml.IndexOf(searchEncoding);
if (encodingPos==30)
{
xml = xml.Substring(0, encodingPos) + newEncoding + xml.Substring(encodingPos + searchEncoding.Length);
}
However, a different process is necessary if the XML is stored in another datatype and/or you need to re-encode the XML content.

XmlSerializer escapes an added escape character

I'm using the XmlSerializer to output a class to a .xml file. For the most part, this is working as expected and intended. However, as a requirement, certain characters need to be removed from the values of the data and replaced with their proper escape characters.
In the elements I need to replace values in, I'm using the Replace() method and returning the updated string. The code below shows this string replacement; the lines commented out are because the XmlSerializer already escapes those particular characters.
I have a requirement from a third-party to escape &, <, >, ', and " characters when they appear within the values of the XML elements. Currently the characters &, <, and > are being escaped appropriately through the XmlSerializer.
The error received when these characters are present is:
Our system has detected a potential threat in the request message attachment.
However, when I serialize the XML Document after performing the string replace, the XmlSerializer sees the & character in &apos; and makes it &apos;. I think this is a correct functionality of the XmlSerializer object. However, I would like the serializer to either a.) ignore the escape characters; or b.) serialize the other characters which are necessary to escape.
Can anyone shed some light on, specifically, how to accomplish either of these?
String Replacement Method
public static string CheckValueOfProperty(string str)
{
string trimmedString = str.Trim();
if (string.IsNullOrEmpty(trimmedString))
return null;
else
{
// Commented out because the Serializer already transforms a '&' character into the appropriate escape character.
//trimmedString = trimmedString .Replace("&", "&");
//trimmedString = trimmedString.Replace("<", "<");
//trimmedString = trimmedString.Replace(">", ">");
trimmedString = trimmedString.Replace("'", "&apos;");
trimmedString = trimmedString.Replace("\"", """);
return trimmedString;
}
}
XmlSerializer Code
public static void SerializeAndOutput(object obj, string outputFilePath, XmlSerializerNamespaces ns = null)
{
XmlSerializer x = new XmlSerializer(obj.GetType());
// If the Output File already exists, delete it.
if (File.Exists(outputFilePath))
{
File.Delete(outputFilePath);
}
// Then, Create the Output File and Serialize the parameterized object as Xml to the Output File
using (TextWriter tw = File.CreateText(outputFilePath))
{
if (ns == null)
{
x.Serialize(tw, obj);
}
else { x.Serialize(tw, obj, ns); }
}
// =====================================================================
// The code below here is no longer needed, was used to force "utf-8" to
// UTF-8" to ensure the result was what was being expected.
// =====================================================================
// Create a new XmlDocument object, and load the contents of the OutputFile into the XmlDocument
// XmlDocument xdoc = new XmlDocument() { PreserveWhitespace = true };
// xdoc.Load(outputFilePath);
// Set the Encoding property of each XmlDeclaration in the document to "UTF-8";
// xdoc.ChildNodes.OfType<XmlDeclaration>().ToList().ForEach(d => d.Encoding = "UTF-8");
// Save the XmlDocument to the Output File Path.
// xdoc.Save(outputFilePath);
}
The single and double quote characters do not need to be escaped when used inside the node content in XML. The single quote or double quote characters only need to be escaped when used in a value of a node attribute. That's why the XMLSerializer does not escape them. And you also do not need to escape them.
See this question and answer for reference.
BTW: The way you set the Encoding to UTF-8 afterwards, is awkward as well. You can specify the encoding with the StreamWriter and then the XMLSerializer will automatically use that encoding and also specify it in the XML declaration.
Here's the solution I came up with. I have only tested it with a sample XML file and not the actual XML file I'm creating, so performance may take a hit; however, this seems to be working.
I'm reading the XML file line-by-line as a string, and replacing any of the defined "special" characters found in the string with their appropriate escape characters. It should process in the order of the specialCharacterList Dictionary<string, string> variable, which means the & character should process first. When processing <, > and " characters, it will only look at the value of the XML element.
using System;
using System.Collections.Generic;
using System.IO;
namespace testSerializer
{
class Program
{
private static string filePath = AppDomain.CurrentDomain.BaseDirectory + "testFile.xml";
private static string tempFile = AppDomain.CurrentDomain.BaseDirectory + "tempFile.xml";
private static Dictionary<string, string> specialCharacterList = new Dictionary<string, string>()
{
{"&","&"}, {"<","<"}, {">",">"}, {"'","&apos;"}, {"\"","""}
};
static void Main(string[] args)
{
ReplaceSpecialCharacters();
}
private static void ReplaceSpecialCharacters()
{
string[] allLines = File.ReadAllLines(filePath);
using (TextWriter tw = File.CreateText(tempFile))
{
foreach (string strLine in allLines)
{
string newLineString = "";
string originalString = strLine;
foreach (var item in specialCharacterList)
{
// Since these characters are all valid characters to be present in the XML,
// We need to look specifically within the VALUE of the XML Element.
if (item.Key == "\"" || item.Key == "<" || item.Key == ">")
{
// Find the ending character of the beginning XML tag.
int firstIndexOfCloseBracket = originalString.IndexOf('>');
// Find the beginning character of the ending XML tag.
int lastIndexOfOpenBracket = originalString.LastIndexOf('<');
if (lastIndexOfOpenBracket > firstIndexOfCloseBracket)
{
// Determine the length of the string between the XML tags.
int lengthOfStringBetweenBrackets = lastIndexOfOpenBracket - firstIndexOfCloseBracket;
// Retrieve the string that is between the element tags.
string valueOfElement = originalString.Substring(firstIndexOfCloseBracket + 1, lengthOfStringBetweenBrackets - 1);
newLineString = originalString.Substring(0, firstIndexOfCloseBracket + 1) + valueOfElement.Replace(item.Key, item.Value) + originalString.Substring(lastIndexOfOpenBracket);
}
}
// For the ampersand (&) and apostrophe (') characters, simply replace any found with the escape.
else
{
newLineString = originalString.Replace(item.Key, item.Value);
}
// Set the "original" string to the new version.
originalString = newLineString;
}
tw.WriteLine(newLineString);
}
}
}
}
}

How to use replace with tricky characters in C#?

I am trying to replace within a string
<?xml version="1.0" encoding="UTF-8"?>
<response success="true">
<output><![CDATA[
And
]]></output>
</response>
with nothing.
The problem I am running into is the characters <> and " characters are interacting within the replace. Meaning, it's not reading those lines as a full string all together as one but breaking the string when it comes to a <> or ". Here is what I have but I know this isn't right:
String responseString = reader.ReadToEnd();
responseString.Replace(#"<<?xml version=""1.0"" encoding=""UTF-8""?><response success=""true""><output><![CDATA[[", "");
responseString.Replace(#"]]\></output\></response\>", "");
What would be the correct code to get the replace to see these lines as just a string?
A string will never change. The Replace method works as follows:
string x = "AAA";
string y = x.Replace("A", "B");
//x == "AAA", y == "BBB"
However, the real problem is how you handle the XML response data.
You should reconsider your approach of handling incoming XML by string replacement. Just get the CDATA content using the standard XML library. It's as easy as this:
using System.Xml.Linq;
...
XDocument doc = XDocument.Load(reader);
var responseString = doc.Descendants("output").First().Value;
The CDATA will already be removed. This tutorial will teach more about working with XML documents in C#.
Given your document structure, you could simply say something like this:
string response = #"<?xml version=""1.0"" encoding=""UTF-8""?>"
+ #"<response success=""true"">"
+ #" <output><![CDATA["
+ #"The output is some arbitrary text and it may be found here."
+ "]]></output>"
+ "</response>"
;
XmlDocument document = new XmlDocument() ;
document.LoadXml( response ) ;
bool success ;
bool.TryParse( document.DocumentElement.GetAttribute("success"), out success) ;
string content = document.DocumentElement.InnerText ;
Console.WriteLine( "The response indicated {0}." , success ? "success" : "failure" ) ;
Console.WriteLine( "response content: {0}" , content ) ;
And see the expected results on the console:
The response indicated success.
response content: The output is some arbitrary text and it may be found here.
If your XML document is a wee bit more complex, you can easily select the desired node(s) using an XPath query, thus:
string content = document.SelectSingleNode( #"/response/output" ).InnerText;

an error occurred while parsing entityname with '&'

I have the next program that open a .XML document with Visual c#. I can´t open the Xml because it has a '&', and I don´t know how i can open.
private void button1_Click(object sender, EventArgs e)
{
XmlDocument doc;
doc = new XmlDocument();
doc.Load("nuevo.xml");
XmlNodeList menus;
menus = doc.GetElementsByTagName("menu");
foreach (XmlNode unMenu in menus)
{
if (unMenu.Attributes["precio"].Value == "50")
{
//Console.WriteLine(unMenu.Attributes["type"].Value);
XPathNavigator navegador = doc.CreateNavigator();
XPathNodeIterator nodos = navegador.Select("/restaurante");
while (nodos.MoveNext())
{
Console.WriteLine(nodos.Current.OuterXml);
Console.WriteLine();
textBox1.Text = nodos.Current.OuterXml;
}
}
}
}
If you get the error
an error occurred while parsing entityname with '&'
then there is an "&" somewhere in the name of an XML element. This is not allowed in an XML document. You cannot open an invalid XML file with the XmlDocument (or XDocument) class.
There are several things you can do:
Make sure that the XML files are always valid before trying to read them. This however depends on your scenario and may not be possible.
Preprocess your XML file to fix the invalid content by replacing "&" with "&". You can either do this manually or at run-time.
Use HtmlAgilityPack to parse the invalid file.
Personally, I would go with 1) if possible or 2) otherwise.
Replace all occurances of & with & in the xml.
So after spending hours on this issue: it turns out that if you have an ampersand symbol ("&") or any other XML escape characters within your xml string, it will always fail will you try read the XML. TO solve this, replace the special characters with their escaped string format
YourXmlString = YourXmlString.Replace("'", "&apos;").Replace("\"", """).Replace(">", ">").Replace("<", "<").Replace("&", "&");

Categories