An error occurred while parsing EntityName with '&' - C#

I have the following program, which opens an .xml document in Visual C#. I can't open the XML file because it contains an '&', and I don't know how to open it.
private void button1_Click(object sender, EventArgs e)
{
    XmlDocument doc;
    doc = new XmlDocument();
    doc.Load("nuevo.xml");
    XmlNodeList menus;
    menus = doc.GetElementsByTagName("menu");
    foreach (XmlNode unMenu in menus)
    {
        if (unMenu.Attributes["precio"].Value == "50")
        {
            //Console.WriteLine(unMenu.Attributes["type"].Value);
            XPathNavigator navegador = doc.CreateNavigator();
            XPathNodeIterator nodos = navegador.Select("/restaurante");
            while (nodos.MoveNext())
            {
                Console.WriteLine(nodos.Current.OuterXml);
                Console.WriteLine();
                textBox1.Text = nodos.Current.OuterXml;
            }
        }
    }
}

If you get the error
an error occurred while parsing entityname with '&'
then there is an unescaped "&" somewhere in the XML content, for example in element text or in an attribute value. A bare "&" is not allowed in an XML document, because "&" starts an entity reference. You cannot open an XML file that is not well-formed with the XmlDocument (or XDocument) class.
There are several things you can do:
1) Make sure that the XML files are always valid before trying to read them. This depends on your scenario, however, and may not be possible.
2) Preprocess your XML file to fix the invalid content by replacing "&" with "&amp;". You can do this either manually or at run-time.
3) Use HtmlAgilityPack to parse the invalid file.
Personally, I would go with 1) if possible or 2) otherwise.
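As a rough sketch of the "check validity first" option, the load can be wrapped so that a malformed file is reported instead of crashing the click handler. The class and method names here (XmlLoadCheck, TryLoadXml) are made up for illustration; XmlException.LineNumber and LinePosition are the real properties that tell you where the parser gave up.

```csharp
using System;
using System.Xml;

static class XmlLoadCheck
{
    // Hypothetical helper: returns true when the text parses as well-formed XML.
    public static bool TryLoadXml(string xml, out XmlDocument doc, out string error)
    {
        doc = new XmlDocument();
        error = null;
        try
        {
            doc.LoadXml(xml);
            return true;
        }
        catch (XmlException ex)
        {
            // LineNumber and LinePosition point at the place where parsing failed.
            error = string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
            doc = null;
            return false;
        }
    }

    static void Main()
    {
        XmlDocument doc;
        string error;
        // The bare '&' in the attribute value makes this input malformed.
        if (!TryLoadXml("<menu nombre=\"Pan & vino\" precio=\"50\" />", out doc, out error))
            Console.WriteLine(error);
    }
}
```

This only detects the problem; fixing the file still has to happen at the source or in a preprocessing step.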

Replace all occurrences of & with &amp; in the XML.

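So after spending hours on this issue: it turns out that if you have an ampersand ("&") or any other XML special character within the text values of your XML string, it will always fail when you try to read the XML. To solve this, replace the special characters in the text values (not in the markup itself) with their escaped form. Replace "&" first, so that the ampersands introduced by the other replacements are not escaped a second time. Here YourXmlString holds a text value:
YourXmlString = YourXmlString.Replace("&", "&amp;").Replace("'", "&apos;").Replace("\"", "&quot;").Replace(">", "&gt;").Replace("<", "&lt;");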
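If the escaping is applied to a text value before it is placed into the markup, the framework can do it for you. The sketch below assumes the XML is being assembled by hand and uses System.Security.SecurityElement.Escape, which escapes the five special characters; the class and method names (EscapeTextDemo, WrapInElement) are hypothetical.

```csharp
using System;
using System.Security;
using System.Xml;

static class EscapeTextDemo
{
    // Hypothetical helper: escapes only the text value, never the tags around it.
    public static string WrapInElement(string name, string rawText)
    {
        return "<" + name + ">" + SecurityElement.Escape(rawText) + "</" + name + ">";
    }

    static void Main()
    {
        string xml = WrapInElement("Test", "CITY & COUNTY");
        Console.WriteLine(xml);

        // The escaped text parses, and InnerText gives back the original value.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        Console.WriteLine(doc.DocumentElement.InnerText);
    }
}
```

Escaping the value rather than the whole document avoids turning the angle brackets of the tags themselves into entities.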


How can I preserve entity characters using Xml Document?

My question is simple, but I just can't figure out why I have this problem, and I can't resolve it.
I need to read an XML file with values and use them in Unity. For now, I read my document from its path:
XmlDocument doc = new XmlDocument();
doc.Load(path);
XmlElement root = doc.DocumentElement;
I have a Namespace Manager already configured.
I read my data like this :
string text = node.SelectSingleNode("x:textRuns/x:DOMTextRun/x:characters", nsmgr).InnerText.Replace("&#xD;", Environment.NewLine);
My XML and the data I would like to extract:
<characters>Third occupant&#xD;folding seat</characters>
My objective is to replace this entity character, "&#xD;", with an Environment.NewLine.
I tried to :
Formalize the Xml in a file with a replace
Read with an InnerText, and an InnerXml
Make an entity char "detector"
Get the node with all its content (OuterXML)
It looks like this character, however I read it, is excluded and not readable; I just can't get it to show up on my console.
The entity has already been replaced by the time you read InnerText. The problem is that you have a CR (carriage return; 0x0D, \r) instead of an LF (line feed; 0x0A, \n). So replace "\r" with Environment.NewLine:
public static void Main() {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<characters>Third occupant&#xD;folding seat</characters>");
    string text = doc.SelectSingleNode("/characters").InnerText;
    text = text.Replace("\r", Environment.NewLine);
    Console.WriteLine(text);
}

Doing regex style compare while looping through a XML file in C#

I have an XML file that I loop through, and when a child node matches I read the value of an attribute. The problem is matching these values against patterns that use the * or ? character, regex-style. Can someone tell me how to do this? For example, if a request comes in for g.portal.com, it should match the second node. I am using .NET 2.0.
Below is my XML file
<Test>
<Test Text="portal.com" Sample="1" />
<Test Text="*.portal.com" Sample="201309" />
<Test Text="portal-0?.com" Sample="201309" />
</Test>
XmlDocument xDoc = new XmlDocument();
xDoc.Load(PathToXMLFile);
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
    if (node.Attributes["Sample"].InnerText == value)
    {
    }
}
What you need to do is first convert each Text attribute into a valid Regex pattern and then use it to match your input. Something like this:
string input = "g.portal.com";
XmlNode foundNode = null;
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
    string value = node.Attributes["Text"].Value;
    // Escape regex metacharacters, then turn the escaped wildcards back into regex syntax.
    string pattern = Regex.Escape(value)
        .Replace(@"\*", ".*")
        .Replace(@"\?", ".");
    if (Regex.IsMatch(input, "^" + pattern + "$"))
    {
        foundNode = node;
        break; //remove if you want to continue searching
    }
}
After executing the above code, foundNode should contain the second node from the xml file.
So you have an XML file that sets up patterns, right? You'll want to feed those patterns into Regexes and then stream a number of requests through them. Did I get that right?
Assuming the XML file doesn't change, it only needs to be processed into the corresponding Regexes once. For example, *.portal.com would translate to
new Regex("\\w+\\.portal\\.com");
You'll just have to escape the dots and replace * with \\w+ and ? with \\w, if I guessed the semantics of your match patterns correctly.
Look up the correct replacements at http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
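The \w-based translation described above could be sketched as follows. The class and method names (WildcardPatterns, ToRegex) are hypothetical, and the ^ and $ anchors plus the case-insensitive option are assumptions about how host names should be matched, not something the question specifies.

```csharp
using System;
using System.Text.RegularExpressions;

static class WildcardPatterns
{
    // Hypothetical helper: converts a pattern such as "*.portal.com" into an anchored Regex.
    public static Regex ToRegex(string wildcard)
    {
        // Regex.Escape turns "*" into "\*" and "?" into "\?"; swap those for word-character classes.
        string pattern = Regex.Escape(wildcard)
            .Replace(@"\*", @"\w+")
            .Replace(@"\?", @"\w");
        return new Regex("^" + pattern + "$", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        Regex regex = ToRegex("*.portal.com");
        Console.WriteLine(regex.IsMatch("g.portal.com"));   // True
        Console.WriteLine(regex.IsMatch("portal.com"));     // False
    }
}
```

Because \w does not match a dot, a.b.portal.com would not match *.portal.com with this translation, whereas it would with ".*"; which behaviour is wanted depends on the intended semantics of the patterns.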

How to correctly encode & in xml?

I'm requesting an XML document over the web. XDocument.Load(stream) throws an exception because the XML contains a bare &, and the parser therefore expects a terminating ;, as in &amp;.
I read the stream into a string and replaced & with &amp;, but that broke all the other correctly encoded special characters, such as the entity for ø.
Is there a simple way to encode all disallowed characters in the string before parsing it with XDocument?
Try CDATA Sections in xml
A CDATA section can only be used in places where you could have a text node.
<foo><![CDATA[Here is some data including < , > or & etc) ]]></foo>
This kind of method is not encouraged! The reason lies in your question:
(replacing & with &amp; turns &gt; into &amp;gt;)
A better suggestion, apart from using a regex, is to modify the source code that generates such unencoded XML.
I have come across (.NET) code that uses string concatenation to produce XML! (One should use the XML DOM instead.)
If you have access to the source code, then better go ahead and fix it there, because encoding such half-encoded XML after the fact cannot be guaranteed to work perfectly.
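To illustrate the point about generating XML through the DOM rather than by string concatenation, here is a minimal sketch. The class and method names (BuildWithDom, BuildXml) are hypothetical, and the element names are borrowed from the sample further down; the escaping itself is done by XmlDocument when the document is serialized.

```csharp
using System;
using System.Xml;

static class BuildWithDom
{
    // Hypothetical helper: builds <root><specialchild>text</specialchild></root> via the DOM.
    public static string BuildXml(string text)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("root");
        doc.AppendChild(root);

        XmlElement child = doc.CreateElement("specialchild");
        child.InnerText = text;   // raw text in; escaping happens on serialization
        root.AppendChild(child);

        return doc.OuterXml;
    }

    static void Main()
    {
        // prints <root><specialchild>data&amp;data</specialchild></root>
        Console.WriteLine(BuildXml("data&data"));
    }
}
```

A producer written this way never emits a bare &, so consumers do not need any repair step.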
@espvar,
This is an input XML:
<root><child>nospecialchars</child><specialchild>data&data</specialchild><specialchild2>You.. & I in this beautiful world</specialchild2>data&</root>
And the Main function:
string EncodedXML = encodeWithCDATA(XMLInput); //Calling our Custom function
XmlDocument xdDoc = new XmlDocument();
xdDoc.LoadXml(EncodedXML); //passed
The function encodeWithCDATA():
private string encodeWithCDATA(string stringXML)
{
    if (stringXML.IndexOf('&') != -1)
    {
        // Last '>' before the first '&': the end of the tag that precedes the text.
        int indexofClosingtag = stringXML.Substring(0, stringXML.IndexOf('&')).LastIndexOf('>');
        // Offset (relative to that '>') of the next '<': the start of the tag that follows the text.
        int indexofNextOpeningtag = stringXML.Substring(indexofClosingtag).IndexOf('<');
        // Wrap only the text between the two tags; the '>' itself stays outside the CDATA section.
        string CDATAsection = string.Concat("<![CDATA[", stringXML.Substring(indexofClosingtag + 1, indexofNextOpeningtag - 1), "]]>");
        string encodedLeftPart = string.Concat(stringXML.Substring(0, indexofClosingtag + 1), CDATAsection);
        string UncodedRightPart = stringXML.Substring(indexofClosingtag + indexofNextOpeningtag);
        return (string.Concat(encodedLeftPart, encodeWithCDATA(UncodedRightPart)));
    }
    else
    {
        return (stringXML);
    }
}
Encoded XML (i.e., xdDoc.OuterXml, indented here for readability):
<root>
<child>nospecialchars</child>
<specialchild>
<![CDATA[data&data]]>
</specialchild>
<specialchild2>
<![CDATA[You.. & I in this beautiful world]]>
</specialchild2>
<![CDATA[data&]]>
</root>
All I have used is Substring, IndexOf, string.Concat and a recursive function call. Let me know if you don't understand any part of the code.
The sample XML that I have provided also has data directly in the parent nodes, as HTML often does, e.g. <div>this is <b>bold</b> text</div>, and my code takes care of encoding data outside the <b> tag if it contains the special character &.
Please note that I have taken care of encoding '&' only; the data cannot contain characters like '<' or '>', a single quote or a double quote.

Error getting the exact value from an XML node when the string contains '\' and the XML is passed as a string instead of a file

I have XML in a string, as below:
String s = @"<user>abc.int\abhi</user>";
but when I write the following code
XmlDocument doc = new XmlDocument();
doc.InnerXml = s;
XmlElement root = doc.DocumentElement;
String User = root.InnerText;
User has the value abc.int\\abhi instead of abc.int\abhi; the '\' character appears twice in the string.
Thank you in advance.
Are you checking that value in the Visual Studio Watch window? If so, it is normal for it to display \\, because the Watch window shows the string as it would be written in code, not the actual string.
In code, if you want to put a \ into a regular string literal, you have to write string s = "\\"; this creates an actual string with a single \ in it.
Try outputting your string to the console or a message box, and you should see that it is correct.
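A quick way to convince yourself is to print the value and its length, as in the sketch below. The class and method names (BackslashDemo, ReadUser) are made up for the example; the input is the string from the question written as a verbatim literal.

```csharp
using System;
using System.Xml;

static class BackslashDemo
{
    // Hypothetical helper: loads the XML string and returns the text of the root element.
    public static string ReadUser(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    static void Main()
    {
        string user = ReadUser(@"<user>abc.int\abhi</user>");
        Console.WriteLine(user);                      // prints abc.int\abhi
        Console.WriteLine(user.Length);               // prints 12, so there is only one backslash
        Console.WriteLine(user == "abc.int\\abhi");   // prints True: "\\" in code is one character
    }
}
```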

XmlDocument throwing "An error occurred while parsing EntityName"

I have a function to which I pass a string parameter called filterXml, which contains '&' in one of its values.
I know that XML will not accept it and that it will throw an error. Here is my code:
public XmlDocument TestXMLDoc(string filterXml)
{
    XmlDocument doc = new XmlDocument();
    XmlNode root = doc.CreateElement("ResponseItems");
    // put that root into our document (which is an empty placeholder now)
    doc.AppendChild(root);
    try
    {
        XmlDocument docFilter = new XmlDocument();
        docFilter.PreserveWhitespace = true;
        if (string.IsNullOrEmpty(filterXml) == false)
            docFilter.LoadXml(filterXml); //ERROR THROWN HERE!!!
What should I change in my code to edit or parse filterXml? My filterXml looks like this:
<Testing>
<Test>CITY & COUNTY</Test>
</Testing>
I am changing my string value from & to &amp;. Here is my code for that:
string editXml = filterXml;
if (editXml.Contains("&"))
{
    editXml.Replace('&', '&amp;');
}
But it gives me an error inside the if statement: Too many literals.
The file shown above is not well-formed XML because the ampersand is not escaped.
You can try with:
<Testing>
<Test>CITY &amp; COUNTY</Test>
</Testing>
or:
<Testing>
<Test><![CDATA[CITY & COUNTY]]></Test>
</Testing>
About the second question: there are two signatures for String.Replace, one that takes characters and one that takes strings. Using single quotes attempts to build character literals, but "&amp;", for C#, is really a string (it has five characters).
Does it work with double quotes? Note that strings are immutable, so the result has to be assigned back:
editXml = editXml.Replace("&", "&amp;");
If you would like to be a bit more conservative, you could also write code to ensure that the &s you are replacing are not followed by one of
amp; quot; apos; gt; lt; or #
(but this would still not be a perfect filtering)
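The more conservative replacement could be sketched with a negative lookahead, as below. The class and method names (AmpersandFixer, EscapeBareAmpersands) are hypothetical, and the pattern assumes that only the five predefined entities and numeric character references should be left alone; anything else after an & (including named HTML entities, or an & inside a CDATA section) gets escaped, so this is still not a perfect filter.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class AmpersandFixer
{
    // Matches '&' unless it starts amp; lt; gt; quot; apos; or a numeric reference such as #248; or #xD;
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    // Hypothetical helper: escapes only the ampersands that are not already part of a reference.
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    static void Main()
    {
        string filterXml = "<Testing><Test>CITY & COUNTY</Test></Testing>";
        string fixedXml = EscapeBareAmpersands(filterXml);

        XmlDocument docFilter = new XmlDocument();
        docFilter.LoadXml(fixedXml);   // no longer throws
        Console.WriteLine(docFilter.DocumentElement.InnerText);   // prints CITY & COUNTY
    }
}
```

Running the function twice is harmless, because the &amp; produced by the first pass is skipped by the lookahead on the second.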
To specify an ampersand in XML you should use &amp;, since the ampersand sign ('&') has a special meaning in XML.
