The '&' in the text is escaped and converted to &amp; when I create the XML file using XmlTextWriter, but I don't want the conversion to take place. How can I prevent it?
Is there any other way besides using the WriteRaw function of XmlTextWriter?
If you put an unescaped ampersand in XML it is no longer valid XML.
Your two choices are either escape it (which your library is doing):
<tag>One &amp; another</tag>
Or wrap it in CDATA:
<tag><![CDATA[One & another]]></tag>
which can be done by:
xmlWriter.WriteCData("One & another");
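As a minimal sketch of both options (assuming a modern .NET project with top-level statements; the element names `root` and `tag` are only illustrative), the writer escapes the ampersand in ordinary text and leaves it untouched inside a CDATA section:

```csharp
using System;
using System.IO;
using System.Xml;

var buffer = new StringWriter();
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

using (var writer = XmlWriter.Create(buffer, settings))
{
    writer.WriteStartElement("root");

    // Option 1: plain text content, the writer escapes '&' for us.
    writer.WriteElementString("tag", "One & another");

    // Option 2: CDATA section, the '&' is written literally.
    writer.WriteStartElement("tag");
    writer.WriteCData("One & another");
    writer.WriteEndElement();

    writer.WriteEndElement();
}

string xml = buffer.ToString();
Console.WriteLine(xml);
```

Either way, a parser that reads the element back sees the same text: One & another.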
Why don't you want this to happen? It looks to me like the library is enforcing correct character escaping as required by XML.
I am not sure if it will work, but you might want to look into the XmlWriterSettings properties available through XmlTextWriter.Settings, especially the CheckCharacters property.
On the other hand, as Brian Agnew mentioned, not encoding the & to &amp; might be the wrong way to go, as it will invalidate your XML (unless you have already encoded it to &amp;, in which case you might just want to decode it before giving it to the XmlTextWriter class).
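If the text really was encoded before it reached the writer, a rough sketch of the "decode first" idea could look like this. The sample string and the `WriteName` helper are hypothetical, and `WebUtility.HtmlDecode` is used here as one convenient decoder (it understands HTML entities, which include the five XML ones):

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

// Hypothetical input that was already entity-encoded upstream.
string alreadyEncoded = "Tom &amp; Jerry";

// Illustrative helper: writes a single <name> element and returns the text.
static string WriteName(string text)
{
    var buffer = new StringWriter();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (var writer = XmlWriter.Create(buffer, settings))
    {
        writer.WriteElementString("name", text);
    }
    return buffer.ToString();
}

// Passing the encoded text straight through escapes the '&' a second time.
string doubleEscaped = WriteName(alreadyEncoded);

// Decoding first lets the writer apply the escaping exactly once.
string escapedOnce = WriteName(WebUtility.HtmlDecode(alreadyEncoded));

Console.WriteLine(doubleEscaped);
Console.WriteLine(escapedOnce);
```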
XML snippet:
<field>&amp; is escaped</field>
<field>&quot;also escaped&quot;</field>
<field>is & "not" escaped</field>
<field>is &quot; and is not & escaped</field>
I'm looking for suggestions on how to pre-process any XML so that everything not already escaped gets escaped before the XML is run through a parser.
I do not have control over the XML being passed to me, the sender likely won't fix it anytime soon, and I have to find a way to parse it.
The primary issue is that feeding the XML as-is into a parser, such as the one below, throws an exception because some of the content is not escaped properly:
string xml = "<field>& is not escaped</field>";
XmlReader.Create(new StringReader(xml))
I'd suggest you use a Regex to replace un-escaped ampersands with their entity equivalent.
This question is helpful as it gives you a Regex to find these rogue ampersands:
&(?!(?:apos|quot|[gl]t|amp);|#)
And you can see that it matches the correct text in this demo. You can use this in a simple replace operation:
var escXml = Regex.Replace(xml, "&(?!(?:apos|quot|[gl]t|amp);|#)", "&amp;");
And then you'll be able to parse your XML.
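Put together, a small self-contained sketch of the replace-then-parse approach might look like this (the sample string comes from the snippet in the question; parsing with `XElement.Parse` is just one way to confirm the result is well-formed):

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// One entity that is already escaped and one bare ampersand.
string xml = "<field>is &quot; and is not & escaped</field>";

// Escape only ampersands that do not start a predefined entity
// (apos, quot, gt, lt, amp) or a numeric character reference (&#...).
string escXml = Regex.Replace(xml, "&(?!(?:apos|quot|[gl]t|amp);|#)", "&amp;");

// The repaired text now parses; Value returns the decoded content.
XElement field = XElement.Parse(escXml);

Console.WriteLine(escXml);
Console.WriteLine(field.Value);
```

Note that the lookahead treats anything beginning with &# as already escaped, and any named entity other than the five predefined ones (for example &nbsp;) gets its ampersand escaped.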
Preprocess the textual data (not really XML) with HTML Tidy with quote-ampersand set to true.
If you want to parse something that isn't XML, you first need to decide exactly what this language is and what you intend to do with it: when you've written a grammar for the non-XML language that you intend to process, you can then decide whether it's possible to handle it by preprocessing or whether you need a full-blown parser.
For example, if you only need to handle an unescaped "&" that's followed by a space, and if you don't care about what happens inside comments and CDATA sections, then it's a fairly easy problem. If you don't want to corrupt the contents of comments or CDATA, or if you need to handle things like a reference to &nbsp; when there's no definition of it, then life starts to become rather more difficult.
Of course, you and your supplier could save yourselves a great deal of time and expense if you wrote software that conformed to standards. That's what standards are for.
One of the elements in my XML has a value like
<item name="abc_def>" />
The actual value pulled from the data source is "abc_def!!>". I have no control over this data source and this cannot be changed.
I want to know how to escape these characters when XML serialization takes place. I have tried a couple of things, but they didn't work.
I tried all methods explained here
What is the correct way to escape these characters? The end output is an API which our clients hit using their browsers, and because of this issue the XML parsing in the browser is breaking.
If all of your strings look like that one, you can do something like this:
string input = "abc_def>";
input = input.Replace("", "!!");
string output = HttpUtility.HtmlDecode(input);
You need to use:
System.Net.WebUtility.HtmlEncode(stringToEncode);
Of course when you later decode that you use:
System.Net.WebUtility.HtmlDecode(stringToDecode);
This is for UWP; the namespace may vary depending on which framework you use.
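A quick round trip with those two calls, as a sketch (the sample value is the one quoted in the question):

```csharp
using System;
using System.Net;

string raw = "abc_def!!>";

// HtmlEncode replaces markup-significant characters such as '>' with entities.
string encoded = WebUtility.HtmlEncode(raw);

// HtmlDecode reverses the encoding and restores the original text.
string decoded = WebUtility.HtmlDecode(encoded);

Console.WriteLine(encoded);
Console.WriteLine(decoded);
```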
I am using XmlTextWriter to serialize a bunch of my objects into HTML (since HTML is basically XML), and all of my objects are able to read/write themselves as XML anyway. The method works great except for one small snag: HTML uses some entities that are invalid in XML, such as &nbsp; for a space. The XmlTextWriter always converts this to &amp;nbsp;. I cannot wrap this in a CDATA section because the browser will simply ignore it; I literally need the XmlTextWriter to leave my & alone.
Have you tried XmlTextWriter.WriteRaw() to write those values?
I'm pretty sure this doesn't get escaped - not sure how this ties in with the code you've got though...
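For what it's worth, here is a small sketch of how WriteRaw behaves next to WriteString. It uses `XmlWriter.Create` rather than the asker's own serialization code, and the element name `p` is only an example:

```csharp
using System;
using System.IO;
using System.Xml;

var buffer = new StringWriter();
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

using (var writer = XmlWriter.Create(buffer, settings))
{
    writer.WriteStartElement("p");
    writer.WriteString("a");

    // WriteRaw emits the text verbatim, so the entity reference survives.
    writer.WriteRaw("&nbsp;");

    // WriteString still escapes, so this '&' becomes an entity.
    writer.WriteString("b & c");
    writer.WriteEndElement();
}

string html = buffer.ToString();
Console.WriteLine(html);
```

Keep in mind that the writer does not check raw text, so the output is only well-formed XML if the entity is declared; a browser reading it as HTML will not mind.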
I am using XDocument to change a value in an XML document.
In the new value I need to use the character '&' (ampersand),
but after XDocument.Save() the XML has &amp; instead!
I tried changing the encoding and similar things, but nothing worked.
XDocument is doing exactly what it's supposed to do.
A bare & is invalid XML (it's an unfinished character/entity reference).
& means "start of an entity" in XML, so if you want to include an & as data you must express it as an entity, &amp; (or put it in a CDATA block).
What you describe is normal behaviour and the XML would break otherwise.
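A short sketch that shows why this is harmless: the escaped form exists only in the serialized text, and the value comes back unchanged when the XML is read again (the element name `value` and the sample text are made up for illustration):

```csharp
using System;
using System.Xml.Linq;

// Assign a value that contains a raw ampersand.
var element = new XElement("value", "R&D");

// The serialized form carries the entity...
string serialized = element.ToString();

// ...but parsing it again yields the original characters.
string roundTripped = XElement.Parse(serialized).Value;

Console.WriteLine(serialized);
Console.WriteLine(roundTripped);
```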
There are two options. The first is to ensure proper XML encoding/decoding of all your content in the XML document. Remember that HTML and XML encoding/decoding are slightly different.
The second is to use base64 encoding on whatever content in the XML might contain invalid characters.
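The base64 option could be sketched roughly like this, assuming UTF-8 text and a consumer that knows it has to decode the element again (the element name `data` and the payload are illustrative):

```csharp
using System;
using System.Text;
using System.Xml.Linq;

string payload = "a & b < c";

// Base64 output contains only letters, digits, '+', '/' and '=',
// so nothing in it needs XML escaping.
string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(payload));
string serialized = new XElement("data", encoded).ToString();

// The consumer has to reverse the encoding explicitly.
string fromXml = XElement.Parse(serialized).Value;
string restored = Encoding.UTF8.GetString(Convert.FromBase64String(fromXml));

Console.WriteLine(serialized);
Console.WriteLine(restored);
```

The trade-off is that the content is no longer human-readable in the file.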
Is your output file app.config supposed to be an XML file?
If it is, then the & must be escaped as &amp;.
If it isn't, then you should be using the text output method instead of the xml output method: use <xsl:output method='text'/>.
PS: this question appears to be a duplicate of How can I add an ampersand for a value in a ASP.net/C# app config file value
I have a string that contains special characters (trademark sign etc.). This string is set as an XML node value, but the special character is not rendered properly in the XML; it shows as ??. This is how I'm using it:
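string str = xxxx; // special character string
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("node"); // XmlNode is abstract, so create it via the document; "node" is a placeholder name
node.InnerText = str;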
I tried HttpUtility.HtmlEncode(xxxx), but it converts the character into "&#8482;", so the XML output is "&amp;#8482;" instead of ™.
I have also tried XmlConvert.ToString() and XmlConvert.EncodeName(), but they give ??.
I strongly suspect that the problem is how you're viewing the XML. Have you made sure that whatever you're viewing it in is using the right encoding?
If you save the XML and then reload it and fetch the inner text as a string, does it have the right value? If so, where's the problem?
You shouldn't perform extra encoding yourself - let the XML APIs do their job.
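One way to run the save-and-reload check suggested above, as a sketch (the element name `name` and the sample text are invented; with no XML declaration, `XmlDocument.Save` writes UTF-8):

```csharp
using System;
using System.IO;
using System.Xml;

// Build a document whose text contains the trademark sign (U+2122).
var doc = new XmlDocument();
XmlElement node = doc.CreateElement("name");
node.InnerText = "Acme\u2122";
doc.AppendChild(node);

// Save to memory, with no manual encoding of the text.
byte[] bytes;
using (var output = new MemoryStream())
{
    doc.Save(output);
    bytes = output.ToArray();
}

// Reload and fetch the inner text as a string.
var reloaded = new XmlDocument();
using (var input = new MemoryStream(bytes))
{
    reloaded.Load(input);
}
string text = reloaded.DocumentElement.InnerText;

Console.WriteLine(text == "Acme\u2122");
```

If the comparison holds, the data is intact and the ?? comes from whatever displays the file with the wrong encoding.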
I've had issues with some characters using htmlEncode() before, as well. Here's a good example of different ways to write your XML: Different Ways to Escape an XML String in C#. Check out #3 (System.Security.SecurityElement.Escape()) and #4 (System.Xml.XmlTextWriter), these are the methods I typically use.
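For reference, a minimal sketch of the SecurityElement.Escape approach (the input string is just an example, and wrapping the result in a `v` element is only there to show that it parses):

```csharp
using System;
using System.Security;
using System.Xml.Linq;

string raw = "a < b & \"c\"";

// Escape replaces <, >, &, " and ' with the predefined XML entities.
string escaped = SecurityElement.Escape(raw);

// The escaped text can be embedded in hand-built markup and parsed back.
string parsedBack = XElement.Parse("<v>" + escaped + "</v>").Value;

Console.WriteLine(escaped);
Console.WriteLine(parsedBack);
```

This is only needed when building XML text by hand; the XML writer classes already escape for you.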