XDocument saves &amp; instead of just & - C#

I am using XDocument to change a value in an XML document.
In the new value I need to use the character '&' (ampersand),
but after XDocument.Save() the XML contains &amp; instead!
I tried changing the encoding and similar settings, but nothing worked.

XDocument is doing exactly what it's supposed to do.
A bare & is invalid XML (it is an unfinished character or entity reference).
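A minimal sketch of that behaviour, assuming a hypothetical helper class AmpersandDemo and an illustrative element name "setting": XDocument writes the raw & as &amp; on output, and parsing the result gives back the original string, so no data is lost.

```csharp
using System.Xml.Linq;

// Hypothetical demo class; "setting" is an illustrative element name.
public static class AmpersandDemo
{
    // Serializes a one-element document whose value contains a raw '&'.
    public static string Serialize(string value)
    {
        var doc = new XDocument(new XElement("setting", value));
        // DisableFormatting keeps the output on a single line.
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    // Parses the serialized text again; &amp; is decoded back to '&'.
    public static string RoundTrip(string value)
    {
        return XDocument.Parse(Serialize(value)).Root.Value;
    }
}
```

Serialize("a & b") produces <setting>a &amp; b</setting>, while RoundTrip("a & b") returns "a & b".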

& marks the start of an entity reference in XML, so if you want to include an & as data you must express it as the entity &amp; (or put it in a CDATA section).
What you describe is normal behaviour and the XML would break otherwise.
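For the CDATA alternative, a small sketch using LINQ to XML's XCData node (CDataDemo and the "setting" element are hypothetical names): the & is written literally inside the CDATA section, and the element's value is unchanged after parsing.

```csharp
using System.Xml.Linq;

// Hypothetical demo class; "setting" is an illustrative element name.
public static class CDataDemo
{
    // Wraps the value in a CDATA section instead of escaping it.
    public static string Serialize(string value)
    {
        var doc = new XDocument(new XElement("setting", new XCData(value)));
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    // When parsed, the CDATA section's text becomes the element value.
    public static string RoundTrip(string value)
    {
        return XDocument.Parse(Serialize(value)).Root.Value;
    }
}
```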

There are two options. The first is to ensure proper XML encoding/decoding of all the content in your XML document. Remember that HTML and XML encoding/decoding are slightly different.
The second option is to Base64-encode any content in the XML that might contain characters that are not valid there.
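A sketch of the Base64 option, assuming UTF-8 for the text-to-bytes step and a hypothetical helper class Base64Content: the stored element text contains only Base64 characters, so nothing needs escaping, and the reader must decode it explicitly.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper; the UTF-8 choice is an assumption.
public static class Base64Content
{
    // Stores the value as Base64 text (A-Z, a-z, 0-9, '+', '/', '=').
    public static XElement Wrap(string name, string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        return new XElement(name, Convert.ToBase64String(bytes));
    }

    // Reverses Wrap: Base64 text back to the original string.
    public static string Unwrap(XElement element)
    {
        byte[] bytes = Convert.FromBase64String(element.Value);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

The trade-off is that the document is no longer human-readable and every consumer has to know which elements are Base64-encoded.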

Is your output file app.config supposed to be an XML file?
If it is, then the & must be escaped as &amp;.
If it isn't, then you should be using the text output method instead of the xml output method: use <xsl:output method='text'/>.
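A minimal sketch of the text output method driven from C#, assuming System.Xml.Xsl.XslCompiledTransform and a hypothetical one-template stylesheet that copies the value of a "setting" element: with method='text' the result is plain text, so the & is emitted unescaped.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical example: the stylesheet and the "setting" element are illustrative.
public static class XsltTextOutput
{
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'><xsl:value-of select='/setting'/></xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // method='text' writes character data only, with no escaping.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```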
PS: this question appears to be a duplicate of How can I add an ampersand for a value in a ASP.net/C# app config file value

Related

Parsing XML with Special Chars (SQL / CLR)

I have looked at most of the questions about parsing XML with special characters into SQL and could not find anything relevant that didn't require control over the XML output itself.
I understand that the way to do this would be to make sure all special characters are escaped; the issue I have is that I do not have control over the XML until after it has been generated. The output I get could be something like the sample below. I need to find a way to replace all the special characters within the element content without touching the characters that are valid XML markup. This could be done using CLR or in straight SQL; I will even consider other options.
<?xml version="1.0" ?>
<A>
<B>this is my test <myemail#gmail.com</B>
<B>>>>this is another test<<<</B>
</A>
You are probably looking for something similar to HtmlEncode() for the contents. Loop through your XML structure and encode the fields you need before writing to the DB, and call HtmlDecode() when reading from the DB.
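A sketch of that encode-on-write, decode-on-read idea. It uses System.Net.WebUtility, which offers static HtmlEncode/HtmlDecode methods without a reference to System.Web; the helper class ContentEncoding and its method names are hypothetical.

```csharp
using System.Net;

// Hypothetical helper class; method names are illustrative.
public static class ContentEncoding
{
    // Encodes the element text before it is placed into the XML string.
    public static string BuildElement(string name, string text)
    {
        return "<" + name + ">" + WebUtility.HtmlEncode(text) + "</" + name + ">";
    }

    // Reverses the encoding when the value is read back.
    public static string DecodeContent(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }
}
```

This only helps where you control the code that builds the XML; it cannot repair a document that has already been generated with unescaped characters.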
https://msdn.microsoft.com/en-us/library/w3te6wfz%28v=vs.110%29.aspx
If you are sure the XML element names are valid, then the solution could be to use regular expressions to parse the XML as text and substitute & with &amp;, > with &gt; and < with &lt;.
Have a look at "regular expression to find special character & between xml tags", for example.
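A rough sketch of the regex approach, under strong assumptions: each leaf element sits on a single line as <name>text</name> with no attributes, element names are valid, and the text contains no entities that are already escaped (those would be escaped twice). XmlTextRepair and its pattern are hypothetical, not a general XML repair tool.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper. Assumes one attribute-free leaf element per line.
public static class XmlTextRepair
{
    // <tag>text</tag> on one line; the text may not span line breaks.
    private static readonly Regex LeafElement = new Regex(
        @"<(?<tag>[A-Za-z_][\w.-]*)>(?<text>[^\r\n]*?)</\k<tag>>",
        RegexOptions.Compiled);

    public static string EscapeLeafText(string xml)
    {
        return LeafElement.Replace(xml, m =>
            "<" + m.Groups["tag"].Value + ">" +
            Escape(m.Groups["text"].Value) +
            "</" + m.Groups["tag"].Value + ">");
    }

    // '&' must be replaced first so the later entities are not re-escaped.
    private static string Escape(string text)
    {
        return text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }
}
```

Applied to the sample above, the <A> wrapper is left alone (its closing tag is on another line) and only the <B> contents are escaped, after which the document parses.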

Save string to XML File

I want to save the following string in an XML File:
<text><![CDATA[<p>what is my pet name</p>]]></text>
When I am saving it, it looks like:
<text>&lt;![CDATA[&lt;p&gt;what is my pet name&lt;/p&gt;]]&gt;</text>
I have tried the File.WriteAllText() and XmlDocument.Save() methods but didn't get the expected result.
Basically, everywhere other than the opening and closing tags in the XML, < is replaced by &lt; and > is replaced by &gt;.
What is happening is that the XML parser is encoding your string. When you try to access the string later, it can be decoded again at that time.
What I suggest is that you either load the text into a new XmlDocument with XmlDocument.LoadXml(string s) and then import it into your current document, or leave it encoded.
You should not try to both use an XML parser, and manually add text at the same time.
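A sketch of the LoadXml-and-import suggestion (ImportFragmentDemo and the "root" element are hypothetical): the markup is parsed as XML rather than stored as text, so the CDATA section survives as a real node.

```csharp
using System.Xml;

// Hypothetical helper; element names are illustrative.
public static class ImportFragmentDemo
{
    public static string AppendMarkup(string targetXml, string markup)
    {
        var target = new XmlDocument();
        target.LoadXml(targetXml);

        // Parse the markup as XML instead of assigning it as text.
        var fragmentDoc = new XmlDocument();
        fragmentDoc.LoadXml(markup);

        // ImportNode copies the node into the target document; true = deep copy.
        XmlNode imported = target.ImportNode(fragmentDoc.DocumentElement, true);
        target.DocumentElement.AppendChild(imported);
        return target.OuterXml;
    }
}
```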
I guess you are adding the CDATA markup manually, and the XML writer correctly escapes it because it treats it as text content. Instead, explicitly add a CDATA section containing just the contents.
If you are using the old XML API (System.Xml), use this method to create the CDATA section: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createcdatasection
Then append the node to the element just like in the example in the link.
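A sketch contrasting the two approaches with XmlDocument (CDataSectionDemo is a hypothetical name; "text" matches the element in the question): assigning the CDATA markup through InnerText gets it escaped, while CreateCDataSection produces the intended output.

```csharp
using System.Xml;

// Hypothetical demo class.
public static class CDataSectionDemo
{
    // Problem case: CDATA markup assigned as plain text is escaped on output.
    public static string AsText(string content)
    {
        var doc = new XmlDocument();
        XmlElement text = doc.CreateElement("text");
        text.InnerText = "<![CDATA[" + content + "]]>";
        doc.AppendChild(text);
        return doc.OuterXml;
    }

    // Fix: create a real CDATA node holding only the content.
    public static string AsCData(string content)
    {
        var doc = new XmlDocument();
        XmlElement text = doc.CreateElement("text");
        text.AppendChild(doc.CreateCDataSection(content));
        doc.AppendChild(text);
        return doc.OuterXml;
    }
}
```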
XML is being written correctly.
XML has special characters that are reserved for commands, just like C# reserves words like "if" and "string".
XML is encoding your string for storage. What you need to do is when you retrieve your string, run it through a similar decode process.
Use this: HttpServerUtility.HtmlDecode(encodedString)
Reference:
Decode XML returned by a webservice (< and > are replaced with &lt; and &gt;)?

How to Prevent the conversion of & to &amp; using XmlTextWriter?

The '&' in the text gets escaped and converted to &amp; when creating the XML file using XmlTextWriter,
but I don't want the conversion to take place. How can I prevent it?
Is there any other way besides using the WriteRaw method of XmlTextWriter?
If you put an unescaped ampersand in XML it is no longer valid XML.
Your two choices are either escape it (which your library is doing):
<tag>One &amp; another</tag>
Or wrap it in CDATA:
<tag><![CDATA[One & another]]></tag>
which can be done by:
xmlWriter.WriteCData("One & another");
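A self-contained sketch of both choices using XmlWriter.Create (the modern factory for XmlTextWriter-style writers); WriterDemo is a hypothetical name and the XML declaration is omitted only to keep the output short.

```csharp
using System.IO;
using System.Xml;

// Hypothetical demo class.
public static class WriterDemo
{
    public static string Write(string value, bool useCData)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var sw = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                writer.WriteStartElement("tag");
                if (useCData)
                    writer.WriteCData(value);   // written verbatim inside <![CDATA[...]]>
                else
                    writer.WriteString(value);  // '&' is escaped as &amp;
                writer.WriteEndElement();
            }
            // The inner using has flushed the writer by this point.
            return sw.ToString();
        }
    }
}
```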
Why don't you want this to happen? It looks to me like the library is enforcing the character escaping that XML requires.
I am not sure if it will work, but you might want to look into the XmlWriterSettings properties available through XmlTextWriter.Settings, especially the CheckCharacters property.
On the other hand, as Brian Agnew mentioned, not encoding the & as &amp; might be the wrong way to go, as it will invalidate your XML (unless you have already encoded it as &amp;, in which case you might just want to decode it before passing it to the XmlTextWriter).
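To illustrate the already-encoded case: if the input string already contains &amp;, the writer escapes it again to &amp;amp;, and decoding first avoids that. This sketch uses XElement for brevity (it applies the same escaping) and WebUtility.HtmlDecode as the decoder; DoubleEscapeDemo is a hypothetical name.

```csharp
using System.Net;
using System.Xml.Linq;

// Hypothetical demo class.
public static class DoubleEscapeDemo
{
    // Writes the string as-is: an existing "&amp;" becomes "&amp;amp;".
    public static string WriteAsIs(string text)
    {
        return new XElement("tag", text).ToString(SaveOptions.DisableFormatting);
    }

    // Decodes first, so the writer escapes the '&' exactly once.
    public static string WriteDecoded(string text)
    {
        return new XElement("tag", WebUtility.HtmlDecode(text)).ToString(SaveOptions.DisableFormatting);
    }
}
```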

C# Special Characters not displayed properly in XML

I have a string that contains special characters (such as the trademark sign). This string is set as an XML node value, but the special character is not rendered properly in the XML; it shows ??. This is how I'm using it:
string str = xxxx; // string containing special characters
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("name"); // XmlNode is abstract, so it is created via the document; "name" is illustrative
node.InnerText = str;
I tried HttpUtility.HtmlEncode(xxxx), but it converts it into "&#8482;", so the XML output is "&amp;#8482;" instead of ™.
I have also tried XmlConvert.ToString() and XmlConvert.EncodeName(), but they give ??.
I strongly suspect that the problem is how you're viewing the XML. Have you made sure that whatever you're viewing it in is using the right encoding?
If you save the XML and then reload it and fetch the inner text as a string, does it have the right value? If so, where's the problem?
You shouldn't perform extra encoding yourself - let the XML APIs do their job.
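A sketch of the save-and-reload check suggested above, assuming UTF-8 output and a hypothetical element name "name": the string is assigned with no manual encoding, written to a stream, read back, and compared. If this round trip succeeds, any ?? is coming from the viewer, not from the XML APIs.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; UTF-8 is an assumed output encoding.
public static class TrademarkDemo
{
    public static string RoundTrip(string value)
    {
        var doc = new XmlDocument();
        XmlElement node = doc.CreateElement("name");
        node.InnerText = value; // let the XML API handle any escaping
        doc.AppendChild(node);

        using (var stream = new MemoryStream())
        {
            var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
            // CloseOutput defaults to false, so the stream stays open after the writer is disposed.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }

            stream.Position = 0;
            var reloaded = new XmlDocument();
            reloaded.Load(stream);
            return reloaded.DocumentElement.InnerText;
        }
    }
}
```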
I've had issues with some characters using HtmlEncode() before as well. Here's a good overview of different ways to write your XML: "Different Ways to Escape an XML String in C#". Check out #3 (System.Security.SecurityElement.Escape()) and #4 (System.Xml.XmlTextWriter); these are the methods I typically use.
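A short sketch of option #3, System.Security.SecurityElement.Escape, which replaces only the five XML-reserved characters (<, >, ", ', &) and leaves characters such as ™ untouched. EscapeDemo and the "name" element are hypothetical.

```csharp
using System.Security;

// Hypothetical helper class.
public static class EscapeDemo
{
    // Builds element markup by hand with the text escaped for XML.
    public static string BuildElement(string name, string text)
    {
        return "<" + name + ">" + SecurityElement.Escape(text) + "</" + name + ">";
    }
}
```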

XmlDocument dropping encoded characters

My C# application loads XML documents using the following code:
XmlDocument doc = new XmlDocument();
doc.Load(path);
Some of these documents contain encoded characters, for example:
<xsl:text>&#10;</xsl:text>
I notice that when these documents are loaded, &#10; gets dropped.
My question: How can I preserve <xsl:text>&#10;</xsl:text>?
FYI - The XML declaration used for these documents:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Are you sure the character is dropped? Character 10 is just a line feed; it wouldn't exactly show up in your debugger window. It could also be treated as whitespace. Have you tried playing with the whitespace settings on your XmlDocument (for example, its PreserveWhitespace property)?
If you need to preserve the encoding, you only have two choices: a CDATA section, or reading the documents as plain text rather than as XML. I suspect you have absolutely no control over the documents that come into the system, which eliminates the CDATA option.
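A sketch of the whitespace check (LineFeedDemo is a hypothetical name): with PreserveWhitespace set to true, the whitespace-only content is kept, and &#10; and a literal line feed load as the same single "\n" character.

```csharp
using System.Xml;

// Hypothetical demo class.
public static class LineFeedDemo
{
    public static string InnerText(string xml, bool preserveWhitespace)
    {
        var doc = new XmlDocument();
        // Must be set before loading; it controls whether whitespace-only nodes are kept.
        doc.PreserveWhitespace = preserveWhitespace;
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

The same check works for the namespaced element from the question, e.g. <xsl:text xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>&#10;</xsl:text>.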
Plain-text rather than Xml is probably distasteful as well, but it's all you have left. If you need to do validation or other processing you could first load and verify the xml, and then concatenate your files using simple file streams as a separate step. Again: not ideal, but it's all that's left.
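A rough sketch of that two-step approach, assuming the inputs are file paths and that simple concatenation is the goal (RawConcat is a hypothetical name): each file is first loaded as XML purely to check well-formedness, then its raw text is appended, so &#10; stays exactly as written. Note that concatenating whole documents does not itself produce a single well-formed document.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper illustrating "validate as XML, then process as text".
public static class RawConcat
{
    public static string ValidateThenConcat(IEnumerable<string> paths)
    {
        var sb = new StringBuilder();
        foreach (string path in paths)
        {
            // Throws XmlException if the file is not well-formed XML.
            new XmlDocument().Load(path);

            // Raw text keeps character references such as &#10; untouched.
            sb.Append(File.ReadAllText(path));
        }
        return sb.ToString();
    }
}
```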
&#10; is a linefeed, i.e. whitespace. The XML parser will load it in as a linefeed, and thereafter ignore the fact that it was originally encoded. The encoding is just part of the serialization of the data to text format; it's not part of the data itself.
Now, XML sometimes ignores whitespace and sometimes doesn't, depending on context, API etc. As Joel says you may find that it's not missing at all - or you may find that using it with an API which allows you to preserve whitespace fixes the problem. I wouldn't be at all surprised to see it turned into an unencoded linefeed character when you output the data though.
Maybe it would be better to keep the data in a <![CDATA[...]]> section?
http://www.w3schools.com/XML/xml_cdata.asp
