How can I preserve entity characters using Xml Document? - c#

My question is simple, but I can't work out why I have this problem or how to resolve it.
I need to read an XML file and use its values in Unity. For now, I load the document from its path:
XmlDocument doc = new XmlDocument();
doc.Load(path);
XmlElement root = doc.DocumentElement;
I have a Namespace Manager already configured.
I read my data like this :
string text = node.SelectSingleNode("x:textRuns/x:DOMTextRun/x:characters", nsmgr).InnerText.Replace("&#xD;", Environment.NewLine);
My XML and the data I would like to extract :
<characters>Third occupant&#xD;folding seat</characters>
My objective is to replace this entity character, "&#xD;", with Environment.NewLine.
I tried to :
Formalize the Xml in a file with a replace
Read with an InnerText, and an InnerXml
Make an entity char "detector"
Get the node with all its content (OuterXML)
It looks like this character, however I read it, is excluded and not readable; I just can't get it to show up in my console.

The entity has already been replaced once you extracted InnerText. Problem is, you have a CR (carriage return; 0x0D, \r) instead of a LF (line feed; 0x0A, \n). So replace "\r" by Environment.NewLine:
public static void Main() {
XmlDocument doc = new XmlDocument();
doc.LoadXml("<characters>Third occupant&#xD;folding seat</characters>");
string text = doc.SelectSingleNode("/characters").InnerText;
text = text.Replace("\r", Environment.NewLine);
Console.WriteLine(text);
}
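If the text may also contain LF or CRLF line breaks, a plain Replace("\r", ...) can double up line endings on Windows. Here is a minimal sketch that normalizes all three forms in one pass. The class and method names (LineBreaks, ReadNormalized) are made up for this example and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class LineBreaks
{
    // Reads the text of the first node matching xpath and converts every
    // CRLF, CR, or LF into the requested newline string.
    public static string ReadNormalized(string xml, string xpath, string newline)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // the parser decodes &#xD; into '\r' at this point
        string text = doc.SelectSingleNode(xpath).InnerText;
        // "\r\n" is listed first so a CRLF pair counts as one line break.
        return Regex.Replace(text, "\r\n|\r|\n", newline);
    }
}
```

You would call it as LineBreaks.ReadNormalized(xml, "/characters", Environment.NewLine).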

Related

an error occurred while parsing entityname with '&'

I have the following program that opens an XML document in Visual C#. It fails to load the XML because the file contains a '&', and I don't know how to open it.
private void button1_Click(object sender, EventArgs e)
{
XmlDocument doc;
doc = new XmlDocument();
doc.Load("nuevo.xml");
XmlNodeList menus;
menus = doc.GetElementsByTagName("menu");
foreach (XmlNode unMenu in menus)
{
if (unMenu.Attributes["precio"].Value == "50")
{
//Console.WriteLine(unMenu.Attributes["type"].Value);
XPathNavigator navegador = doc.CreateNavigator();
XPathNodeIterator nodos = navegador.Select("/restaurante");
while (nodos.MoveNext())
{
Console.WriteLine(nodos.Current.OuterXml);
Console.WriteLine();
textBox1.Text = nodos.Current.OuterXml;
}
}
}
}
If you get the error
an error occurred while parsing entityname with '&'
then there is an unescaped "&" somewhere in the document. The parser treats "&" as the start of an entity reference, so a bare one makes the XML not well-formed. You cannot open such a file with the XmlDocument (or XDocument) class.
There are several things you can do:
1) Make sure that the XML files are always valid before trying to read them. This, however, depends on your scenario and may not be possible.
2) Preprocess your XML file to fix the invalid content by replacing "&" with "&amp;". You can do this either manually or at run-time.
3) Use HtmlAgilityPack to parse the invalid file.
Personally, I would go with 1) if possible or 2) otherwise.
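For option 1, one way to check a file before using it is to attempt the load and report where the parser stopped. This is only a sketch: XmlCheck and DescribeParseError are hypothetical names, while XmlException.LineNumber and LinePosition are the standard properties that say where the problem is.

```csharp
using System;
using System.Xml;

public static class XmlCheck
{
    // Returns null when the text is well-formed XML, otherwise a short
    // description of the first parse error and its location.
    public static string DescribeParseError(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return "Line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
        }
    }
}
```

The reported position usually points at or just after the offending "&", which makes it easier to fix the file at its source.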
Replace all occurrences of & with &amp; in the XML.
So after spending hours on this issue: it turns out that if a text value in your XML contains an ampersand ("&") or any other character that is special in XML, reading the XML will fail. To solve this, replace the special characters in that value (not in the markup around it) with their escaped forms. Replace "&" first, so that the entities produced by the other replacements are not escaped a second time:
YourXmlString = YourXmlString.Replace("&", "&amp;").Replace("'", "&apos;").Replace("\"", "&quot;").Replace(">", "&gt;").Replace("<", "&lt;");

Doing regex style compare while looping through a XML file in C#

I have an XML file that I loop through, and when a child node matches I read the value of an attribute. The problem is that the values to match contain a * or ? character, like a simple regex-style wildcard. Can someone tell me how to do this? For example, if a request comes in for g.portal.com, it should match the second node. I am using .NET 2.0.
Below is my XML file
<Test>
<Test Text="portal.com" Sample="1" />
<Test Text="*.portal.com" Sample="201309" />
<Test Text="portal-0?.com" Sample="201309" />
</Test>
XmlDocument xDoc = new XmlDocument();
xDoc.Load(PathToXMLFile);
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
if (node.Attributes["Sample"].InnerText == value)
{
}
}
What you need to do is first convert each Text attribute into a valid Regex pattern and then use it to match your input. Something like this:
string input = "g.portal.com";
XmlNode foundNode = null;
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
string value = node.Attributes["Text"].Value;
string pattern = Regex.Escape(value)
.Replace(@"\*", ".*")
.Replace(@"\?", ".");
if (Regex.IsMatch(input, "^" + pattern + "$"))
{
foundNode = node;
break; //remove if you want to continue searching
}
}
After executing the above code, foundNode should contain the second node from the xml file.
So you have an XML file that sets up patterns, right? You'll want to feed those patterns into Regexes and then stream a number of requests through them. Did I get that correct?
Assuming the XML file doesn't change, it only needs to be converted into the corresponding Regexes once. For example, *.portal.com would translate to
new Regex("\\w+\\.portal\\.com");
You'll just have to escape the dots and replace * with \\w+ and ? with \\w, if I guessed the semantics of your match patterns correctly.
Look up the correct replacements at http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
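To sketch the "convert once, match many times" idea: build the Regex objects when the XML is loaded and reuse them for every request. The names WildcardPatterns, ToRegex and FirstMatch are invented for this example, and it assumes * means "any run of characters" and ? means "any single character", as in the first answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WildcardPatterns
{
    // Turns a wildcard such as "*.portal.com" into an anchored Regex.
    public static Regex ToRegex(string wildcard)
    {
        string pattern = Regex.Escape(wildcard)
            .Replace(@"\*", ".*")   // * -> any run of characters
            .Replace(@"\?", ".");   // ? -> exactly one character
        return new Regex("^" + pattern + "$");
    }

    // Returns the index of the first pattern matching input, or -1.
    public static int FirstMatch(IList<Regex> compiled, string input)
    {
        for (int i = 0; i < compiled.Count; i++)
        {
            if (compiled[i].IsMatch(input)) return i;
        }
        return -1;
    }
}
```

Fill the list by calling ToRegex on each Text attribute while looping over the nodes, then keep the list around; the index returned by FirstMatch tells you which node matched.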

Embedding an xml document inside an xml string [duplicate]

I have a web service that returns an xml string as results. The return string is in this format:
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
</ReturnValue>
The data that I want to insert into the "message" tag is a serialized version of a custom object. The serialized form of that object contains XML and namespace declarations. When that gets put into the "message" tag of my return XML string, XmlSpy says that it's not well-formed. How should I get rid of the namespace declarations, or is there a different way to embed a serialized object in an XML string?
Wrap the string in CDATA like so:
<![CDATA[your xml, which can be multi-line]]>
CDATA tells the parser to treat the contents as plain character data rather than markup. It's often the most expedient way to embed XML (or taggy non-XML content) as a string. You can run into problems if your embedded XML contains its own CDATA section, because the first "]]>" ends the outer section, but otherwise it's a simple fix.
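If the embedded XML might contain its own CDATA, a common workaround is to split every "]]>" across two adjacent CDATA sections. A minimal sketch follows; CDataHelper and Wrap are hypothetical names used only for illustration.

```csharp
public static class CDataHelper
{
    // Wraps content in CDATA. Any "]]>" inside the content is split so that
    // "]]" ends one section and ">" starts the next one.
    public static string Wrap(string content)
    {
        return "<![CDATA[" + content.Replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

When the result is read back, for example through InnerText on the <Message> element, the adjacent sections are concatenated and the original text comes back unchanged.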
Just make sure that the content of your <Message> element is encoded so that <, >, ", and & show up as &lt;, &gt;, &quot; and &amp;, respectively.
There are a few built-in ways to encode the characters:
string message = System.Web.HttpUtility.HtmlEncode(serializedXml);
string message = System.Security.SecurityElement.Escape(serializedXml);
Using an XmlTextWriter to do the work for you
Use CDATA to wrap your XML
Also, this is probably a duplicate of:
Best way to encode text data for XML
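Another way to get the escaping right without doing it by hand is to build the return value with XmlDocument and assign the serialized object to InnerText, so the serializer escapes it on output. A rough sketch; MessageBuilder and Build are made-up names, and the element names are taken from the question.

```csharp
using System.Xml;

public static class MessageBuilder
{
    // Builds the <ReturnValue> document and stores serializedXml as the
    // text of <Message>; markup characters are escaped when serialized.
    public static string Build(string serializedXml)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("ReturnValue");
        doc.AppendChild(root);

        XmlElement error = doc.CreateElement("ErrorNumber");
        error.InnerText = "0";
        root.AppendChild(error);

        XmlElement message = doc.CreateElement("Message");
        message.InnerText = serializedXml; // kept as text, not parsed as XML
        root.AppendChild(message);

        return doc.OuterXml;
    }
}
```

The caller gets the payload back by reading InnerText of the <Message> element, with namespace declarations and all.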
Think of XML as a document not a string.
Create a node named "wrapper", and store the content of your file in it as a Base64 encoded string. The results will look like this.
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
<wrapper type="bin.base64">PD94bWwgdmVyc2lvbj0iMS4wIj8+PHhzbDpzdHlsZXNoZWV0IHZ
lcnNpb249IjEuMCIgeG1sbnM6eHNsPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L1hTTC9UcmFuc2Zvcm0
iIHhtbG5zOm1zeHNsPSJ1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOnhzbHQiPjx4c2w6b3V0cHV0IG1
ldGhvZD0ieG1sIiAvPjx4c2w6dGVtcGxhdGUgbWF0Y2g9Ii8iPjwveHNsOnRlbXBsYXRlPjwveHNsOnN
0eWxlc2hlZXQ+</wrapper>
</ReturnValue>
The following code shows how to add the wrapper and encode the content. It then reverses the process to show that it all "works".
Using Base64 in XML has a number of other applications as well. For example embedding images, or other documents in XML content.
using System;
using System.IO;
using System.Xml;
public class t
{
static public string EncodeTo64(string toEncode) {
// UTF-8 keeps non-ASCII characters intact; ASCII would turn them into '?'
byte[] toEncodeAsBytes = System.Text.Encoding.UTF8.GetBytes(toEncode);
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
return returnValue;
}
static public string DecodeFrom64(string encodedData) {
byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
string returnValue = System.Text.Encoding.UTF8.GetString(encodedDataAsBytes);
return returnValue;
}
public static void Main() {
try {
//Create the XmlDocument.
XmlDocument doc = new XmlDocument();
doc.LoadXml( @"
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
</ReturnValue>
");
XmlNode nodeMessage = doc.SelectSingleNode( "/ReturnValue/Message" );
if( nodeMessage != null ) {
XmlDocument docImport = new XmlDocument();
docImport.Load( "docwithnamespace.xml" );
// create a wrapper element for the file, then import and append it after <Message>
XmlElement nodeWrapper = (XmlElement)doc.CreateElement( "wrapper" );
nodeWrapper.SetAttribute( "type", "bin.base64" );
nodeWrapper = (XmlElement)doc.ImportNode( nodeWrapper, true );
XmlNode ndImport = nodeMessage.ParentNode.AppendChild( nodeWrapper.CloneNode( true ) );
ndImport.InnerText = EncodeTo64( docImport.OuterXml );
doc.Save( "wrapperadded.xml" );
// Next, let's test un-doing the wrapping
// Re-load the "wrapped" document
XmlDocument docSaved = new XmlDocument();
docSaved.Load( "wrapperadded.xml" );
// Get the wrapped element, decode from base64 write to disk
XmlNode node = docSaved.SelectSingleNode( "/ReturnValue/wrapper" );
if( node != null ) {
// Load the content, and save as a new XML
XmlDocument docUnwrapped = new XmlDocument();
docUnwrapped.LoadXml( DecodeFrom64( node.InnerText ) );
docUnwrapped.Save( "unwrapped.xml" );
Console.WriteLine( "Eureka" );
}
}
} catch( Exception e ) {
Console.WriteLine(e.Message);
}
}
}

XmlDocument throwing "An error occurred while parsing EntityName"

I have a function to which I pass a string parameter called filterXml, which contains '&' in one of its values.
I know that XML will not accept it and will throw an error. Here is my code:
public XmlDocument TestXMLDoc(string filterXml)
{
XmlDocument doc = new XmlDocument();
XmlNode root = doc.CreateElement("ResponseItems");
// put that root into our document (which is an empty placeholder now)
doc.AppendChild(root);
try
{
XmlDocument docFilter = new XmlDocument();
docFilter.PreserveWhitespace = true;
if (string.IsNullOrEmpty(filterXml) == false)
docFilter.LoadXml(filterXml); //ERROR THROWN HERE!!!
What should I change in my code to edit or parse filterXml? My filterXml looks like this:
<Testing>
<Test>CITY & COUNTY</Test>
</Testing>
I am changing my string value from & to &amp;. Here is my code for that:
string editXml = filterXml;
if (editXml.Contains("&"))
{
editXml.Replace('&', '&amp;');
}
But it's giving me an error inside the if statement: Too many literals.
The file shown above is not well-formed XML because the ampersand is not escaped.
You can try with:
<Testing>
<Test>CITY &amp; COUNTY</Test>
</Testing>
or:
<Testing>
<Test><![CDATA[CITY & COUNTY]]></Test>
</Testing>
About the second question: there are two signatures for String.Replace, one that takes characters and one that takes strings. Single quotes build character literals, but "&amp;" is really a string for C# (it has five characters).
Does it work with double quotes? Note that Replace returns a new string, so the result has to be assigned:
editXml = editXml.Replace("&", "&amp;");
If you would like to be a bit more conservative, you could also write code to ensure that the &s you are replacing are not followed by one of
amp; quot; apos; gt; lt; or #
(but this would still not be a perfect filtering)
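The conservative version could look roughly like this, using a negative lookahead so that ampersands which already start a known entity or a numeric character reference are left alone. AmpersandFixer and EscapeBareAmpersands are invented names. As said above, it is still not a perfect filter: it would also rewrite ampersands inside CDATA sections and does not know about custom entities.

```csharp
using System.Text.RegularExpressions;

public static class AmpersandFixer
{
    // Matches "&" unless it starts one of the five predefined entities
    // or a decimal/hexadecimal character reference.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|quot|apos|gt|lt);|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }
}
```

Running filterXml through EscapeBareAmpersands before LoadXml turns "CITY & COUNTY" into "CITY &amp; COUNTY" without touching text that is already escaped.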
To specify an ampersand in XML you should use &amp; since the ampersand sign ('&') has a special meaning in XML.

Using XDocument to write raw XML

I'm trying to create a spreadsheet in XML Spreadsheet 2003 format (so Excel can read it). I'm writing out the document using the XDocument class, and I need to get a newline in the body of one of the <Cell> tags. Excel, when it reads and writes, requires the files to have the literal string &#10; embedded in the string to correctly show the newline in the spreadsheet. It also writes it out as such.
The problem is that XDocument is writing CR-LF (\r\n) when I have newlines in my data, and it automatically escapes ampersands for me when I try to do a .Replace() on the input string, so I end up with &amp;#10; in my file, which Excel just happily writes out as a string literal.
Is there any way to make XDocument write out the literal &#10; as part of the XML stream? I know I can do it by deriving from XmlTextWriter, or literally just writing out the file with a TextWriter, but I'd prefer not to if possible.
I wonder if it might be better to use XmlWriter directly, and WriteRaw?
A quick check shows that XmlDocument makes a slightly better job of it, but xml and whitespace gets tricky very quickly...
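A rough sketch of the XmlWriter and WriteRaw idea: write each line of the cell text with WriteString (which escapes it) and put a raw &#10; between the lines. CellWriter and WriteData are hypothetical names, and the example writes only a bare <Data> element, without the namespaces and attributes a real spreadsheet needs.

```csharp
using System.IO;
using System.Xml;

public static class CellWriter
{
    // Writes <Data>...</Data>, emitting the literal character reference
    // &#10; wherever the value contains a line break.
    public static string WriteData(string value)
    {
        // Treat CRLF, CR and LF all as one line break.
        string[] lines = value.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');

        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Data");
            for (int i = 0; i < lines.Length; i++)
            {
                if (i > 0) writer.WriteRaw("&#10;"); // written as-is, not escaped
                writer.WriteString(lines[i]);        // escaped as usual
            }
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

WriteRaw does no checking at all, so it should only ever be given fixed, known-good markup like this, never user data.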
I battled with this problem for a couple of days and finally came up with this solution. I used the XmlDocument.Save(Stream) method, then got the formatted XML string from the stream. Then I replaced the &amp;#10; occurrences with &#10; and used a TextWriter to write the string to a file.
string xml = "<?xml version=\"1.0\"?><?mso-application progid='Excel.Sheet'?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\">";
xml += "<Styles><Style ss:ID=\"s1\"><Alignment ss:Vertical=\"Center\" ss:WrapText=\"1\"/></Style></Styles>";
xml += "<Worksheet ss:Name=\"Default\"><Table><Column ss:Index=\"1\" ss:AutoFitWidth=\"0\" ss:Width=\"75\" /><Row><Cell ss:StyleID=\"s1\"><Data ss:Type=\"String\">Hello&amp;#10;&amp;#10;World</Data></Cell></Row></Table></Worksheet></Workbook>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml); //load the xml string
System.IO.MemoryStream stream = new System.IO.MemoryStream();
doc.Save(stream); //save the xml as a formatted string
stream.Position = 0; //reset the stream position since it will be at the end from the Save method
System.IO.StreamReader reader = new System.IO.StreamReader(stream);
string formattedXML = reader.ReadToEnd(); //fetch the formatted XML into a string
formattedXML = formattedXML.Replace("&amp;#10;", "&#10;"); //Replace the unhelpful &amp;#10;'s with the wanted endline entity
System.IO.TextWriter writer = new System.IO.StreamWriter("C:\\Temp\\test1.xls"); //both backslashes escaped; "\t" would be a tab
writer.Write(formattedXML); //write the XML to a file
writer.Close();
