Format string for XML - c#

I am writing some code to send an XML document to a Servlet. For one of the XML tag fields, I need to fill it with a string that is retrieved from an external file.
I have found a couple of external files that contain some < and > characters. The servlet will not accept this XML document in this case.
If I remove the < and > characters from the XML tag field, the XML document is sent correctly.
As I am going to be using 1000s of external files, I am sure there will be other occurrences of "illegal" characters. Is there an XML encode or similar function that can be used to format a string such that it can be stored in an XML tag with no errors?
I have tried HTML encode, but this does not work. Is there an equivalent action for XML?

If you really want to build your own XML strings, put your external text in a CDATA section. You just need to make sure that the end sequence (which is ]]>) does not occur in the external file. If it does, you have to encode it or replace it with some other string first. So:
<![CDATA[*your external stuff containing < and > here*]]>
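As a rough sketch of the options, assuming the text has already been read into a string: SecurityElement.Escape handles the five predefined XML entities, XElement escapes text content for you, and the CDATA route needs any embedded ]]> split across two sections. The class name XmlText and the element name passed to ToElement are placeholders, not anything from the question.

```csharp
using System.Security;
using System.Xml.Linq;

public static class XmlText
{
    // Escapes <, >, &, " and ' so the value can be placed in hand-built XML.
    public static string Escape(string raw)
    {
        return SecurityElement.Escape(raw);
    }

    // Lets LINQ to XML do the escaping; "name" is whatever tag you are filling.
    public static string ToElement(string name, string raw)
    {
        return new XElement(name, raw).ToString();
    }

    // Wraps the text in CDATA, splitting any "]]>" so it cannot end the section early.
    public static string ToCData(string raw)
    {
        return "<![CDATA[" + raw.Replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

Note that none of these help with characters that XML 1.0 forbids outright (most control characters, such as 0x00); those would still need to be removed or replaced before sending.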

Related

Programmatically tell the difference between data

I'm converting a large number of files to XML, and each file is either XML, JSON, CSV or PSV. To do the conversion I need to know what data type the file is without looking at the file extension (some are coming from APIs). Someone suggested that I try to parse each file as each of the types until I get a success, but that is pretty inefficient, and CSV can't be easily identified by parsing as it is essentially just a text file (same as PSV).
Does anyone have any ideas on what I can do? Thanks.
You can have some kind of "pre-parsing":
Whether it starts with an XML declaration or directly with the root node, the first character of an XML file should be <.
First character of a JSON file can only be { if the JSON is built on an object, or [ if the JSON is built on an array.
For CSV and PSV (I guess PSV stands for Point-Separated Values?), each line of the file represent a specific record.
So by checking the first character, you can tell when attempting XML and/or JSON parsing would be pointless.
Parsing the first line of the file should be enough to decide if the file format is CSV or PSV.
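The pre-parsing idea above could be sketched roughly as follows. This is only a heuristic: it assumes the whole file content is available as a string, that the CSV delimiter is a comma, and that PSV uses a pipe (swap the character if PSV means something else in your data). The class name FormatSniffer and the returned labels are made up for illustration.

```csharp
using System;
using System.Linq;

public static class FormatSniffer
{
    // Returns "xml", "json", "csv", "psv" or "unknown" based on the first characters.
    public static string Guess(string content)
    {
        // Skip a byte order mark and leading whitespace before looking at the first character.
        string text = content.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');
        if (text.Length == 0) return "unknown";

        if (text[0] == '<') return "xml";
        if (text[0] == '{' || text[0] == '[') return "json";

        // For delimited files, count candidate delimiters on the first line only.
        string firstLine = text.Split('\n')[0];
        int commas = firstLine.Count(c => c == ',');
        int pipes = firstLine.Count(c => c == '|');   // assumption: PSV = pipe-separated
        if (commas == 0 && pipes == 0) return "unknown";
        return commas >= pipes ? "csv" : "psv";
    }
}
```

A delimited file whose first field happens to start with <, { or [ would be misclassified, so the guess is best treated as a hint about which parser to try first rather than a final answer.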

Parsing XML with Special Chars (SQL / CLR)

I have looked at most of the questions about parsing XML with special characters into SQL and could not find anything relevant that didn't assume control over the XML output itself.
I understand that the way to do this is to make sure all special characters are escaped. The issue I have is that I do not have control over the XML until after it has been generated. The output could look something like the example below. I need to find a way to replace all the special characters within the element content without touching the characters that are valid XML markup. This could be done using a CLR function or in straight SQL; I will even consider other options.
<?xml version="1.0" ?>
<A>
<B>this is my test <myemail#gmail.com</B>
<B>>>>this is another test<<<</B>
</A>
You are probably looking for something similar to HtmlEncode() of the contents. Loop through your XML structure and encode the fields you need to prior to writing to the DB, and perform the HtmlDecode() on the read from the DB.
https://msdn.microsoft.com/en-us/library/w3te6wfz%28v=vs.110%29.aspx
If you are sure the XML element names are valid, then the solution could be to use regular expressions to parse the XML as text and substitute & with &amp;, > with &gt; and < with &lt;.
Have a look at "regular expression to find special character & between xml tags", for example.
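A minimal sketch of the regular-expression approach, under fairly strong assumptions: the element name is known in advance, the element has no attributes, it contains only text (no nested elements), and its content has not already been escaped. LeafEscaper is a hypothetical helper name.

```csharp
using System.Security;
using System.Text.RegularExpressions;

public static class LeafEscaper
{
    // Escapes the text between <elementName> and </elementName>, leaving the tags alone.
    public static string EscapeLeaf(string xml, string elementName)
    {
        string name = Regex.Escape(elementName);
        // Lazy match so each element is handled separately; Singleline lets content span lines.
        var pattern = new Regex("<" + name + ">(.*?)</" + name + ">", RegexOptions.Singleline);
        return pattern.Replace(xml, m =>
            "<" + elementName + ">" + SecurityElement.Escape(m.Groups[1].Value) + "</" + elementName + ">");
    }
}
```

For the sample above, calling LeafEscaper.EscapeLeaf(xml, "B") should produce text that an XML parser accepts. If some content is already escaped, this would double-escape it, so it only suits input that is consistently unescaped.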

Strip < Character from XML content

I have an XML document that contains data with a < character.
<Tunings>
<Notes>Norm <150 mg/dl</Notes>
</Tunings>
The code I am using is:
StreamReader objReader = new StreamReader(strFile);
string strData = objReader.ReadToEnd();
XmlDocument doc = new XmlDocument();
// Here I want to strip those characters from "strData"
doc.LoadXml(strData);
So it gives error:
Name cannot begin with the '1' character, hexadecimal value 0x31.
So is there a way to strip those characters from the XML before LoadXml is called?
If this is only occurring in the <Notes> section, I'd recommend you modify the creation of the XML file to use a CDATA tag to contain the text in Notes, like this:
<Notes><![CDATA[Norm <150 mg/dl]]></Notes>
The CDATA tag tells XML parsers to not parse the characters between the <![CDATA[ and ]]>. This allows you to have characters in your XML that would otherwise break the parsing.
You can use the CDATA tag for any situation where you know (or have reasonable expectations) of special characters in that data.
Trying to handle special characters at parsing time (without the CDATA) will be more labor intensive (and frustrating) than simply fixing the creation of the XML in the first place, IMO. Plus, "Norm <150 mg/dl" is not the same thing as "Norm 150 mg/dl", and that distinction might be important for whoever needs that information.
As the comments state, you do not have an XML document. If you know that the only way these documents deviate from legal XML is as in your example, you could run the file through a regular expression and replace <(?=\d) with &lt;. This finds each < that is immediately followed by a digit and properly encodes it.
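A short sketch of that repair step, assuming the only defect is a < directly followed by a digit. It uses a lookahead, (?=\d), so the digit itself is not consumed by the replacement. NotesRepair is a made-up class name.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

public static class NotesRepair
{
    // Encodes every "<" that is immediately followed by a digit, then loads the result.
    public static XmlDocument LoadWithRepair(string rawXml)
    {
        string repaired = Regex.Replace(rawXml, @"<(?=\d)", "&lt;");
        var doc = new XmlDocument();
        doc.LoadXml(repaired);   // still throws XmlException if other defects remain
        return doc;
    }
}
```

After loading, reading the Notes element's InnerText gives back the original text with the < intact, so no information is lost.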

Save string to XML File

I want to save the following string in an XML File:
<text><![CDATA[<p>what is my pet name</p>]]></text>
When I am saving it, it looks like:
<text>&lt;![CDATA[&lt;p&gt;what is my pet name&lt;/p&gt;]]&gt;</text>
I have tried the File.WriteAllText() and XmlDocument.Save() methods but didn't get the expected result.
Basically, everywhere other than the opening and closing tags in the XML, < is replaced by &lt; and > is replaced by &gt;.
What is happening is that the XML parser is encoding your string. When you try to access the string later, it can be decoded again at that time.
What I suggest is that you either load the text into a new XmlDocument with XmlDocument.LoadXml(string s) and then import that into your current document, or leave it encoded.
You should not try to both use an XML parser, and manually add text at the same time.
I guess you add the CDATA manually and the XML writing mechanism correctly escapes your CDATA because it treats it as text content. Instead explicitly add a CDATA section with just the contents.
If you are using the old XML API (System.Xml), then use this method to create the CDATA section: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createcdatasection
Then append the node to the element just like in the example in the link.
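For illustration, a minimal sketch of adding the CDATA section through the DOM rather than as part of the string. The element name text comes from the question; the class and method names are invented here.

```csharp
using System.Xml;

public static class CDataWriter
{
    // Builds <text><![CDATA[...]]></text> with the CDATA added as a node, not as text.
    public static string BuildTextElement(string html)
    {
        var doc = new XmlDocument();
        XmlElement text = doc.CreateElement("text");
        // Pass only the inner content; the DOM writes the <![CDATA[ and ]]> markers itself.
        text.AppendChild(doc.CreateCDataSection(html));
        doc.AppendChild(text);
        return doc.OuterXml;
    }
}
```

The key point is that the string handed to CreateCDataSection is just the content, for example "<p>what is my pet name</p>", without the CDATA markers.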
XML is being written correctly.
XML has special characters that are reserved for commands, just like C# reserves words like "if" and "string".
XML is encoding your string for storage. What you need to do is when you retrieve your string, run it through a similar decode process.
Use this: HttpServerUtility.HtmlDecode(encodedString)
Reference:
Decode XML returned by a webservice (< and > are replaced with &lt; and &gt;)?

Escape an xml string while creating xml file

I need to create an XML file which is to be converted to an Excel file (.xls), and this means that the XML has a lot of meta info in it. It's easy to write all the contents into the XML file as if it were a text file.
var sw = new FileInfo(tempReportFilePath).CreateText();
sw.WriteLine("meta info and other tags");
However, this method does not escape characters, and when the data contains '<' or '>' or '&' etc. the XML is rendered invalid and the .xls file does not open. I can easily do a replace ('<' with '&lt;' and so on), but for performance reasons, this method is not suitable.
The other alternative is to use an XML text writer, but with a ton of meta info, it will mean writing a lot of tags in code. With sw.WriteLine("stuff"), I could simply put parts of the meta info in one tag (as a string) and write them to the file. Using XSLT, the problem I faced was that tags required spaces. For example, for tabular data, the top row fields could have spaces.
How do I go about creating a well-formed XML file with a lot of meta info, where the characters ('<', '>' etc.) are escaped?
Uri.EscapeDataString(string stringToEscape);
XDocument tutorials.
Why not create xls in the first place, there is a nice library to do so :
http://npoi.codeplex.com/
I used the WriteRaw method for writing the meta info tags. For the other data, which was required to be escaped, I used WriteString method.
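The WriteRaw/WriteString split might look roughly like this. The element names Report and Cell and the class name ReportWriter are placeholders, and the sketch writes to a string rather than to the report file to keep it short.

```csharp
using System.IO;
using System.Xml;

public static class ReportWriter
{
    // rawMeta is trusted, pre-built markup; cellValue is data that must be escaped.
    public static string Build(string rawMeta, string cellValue)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Report");
            writer.WriteRaw(rawMeta);        // written as-is, no escaping or validation
            writer.WriteStartElement("Cell");
            writer.WriteString(cellValue);   // <, > and & are escaped here
            writer.WriteEndElement();
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Because WriteRaw does no checking, the meta info passed to it has to be well-formed already; only the values that go through WriteString are protected.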
