Remove "&" from input stream when generating XML - c#

HttpContext.Current.Response.ContentType = "text/xml";
HttpContext.Current.Response.ContentEncoding = Encoding.UTF8;
HttpPostedFile file = HttpContext.Current.Request.Files[0];
// If file.InputStream contains an "&", an exception is thrown
XDocument doc = XDocument.Load(XmlReader.Create(file.InputStream));
HttpContext.Current.Response.Write(doc);
Is there any way to replace & with &amp; before generating the XML document? My current code crashes whenever the file contains an &.
Thanks

Your code will only crash if it's not valid XML. For example, this should be fine:
<foo>A &amp; B</foo>
If you've actually got
<foo>A & B</foo>
Then you haven't got an XML file. You may have something which looks a bit like XML, but it isn't really valid XML.
The best approach here isn't to transform the data on the fly - it's to fix the source of the data so that it's real XML. There's really no excuse for anything producing invalid XML in this day and age.
Additionally, there's no reason to use XmlReader.Create here - just use
XDocument doc = XDocument.Load(file.InputStream);

use HttpEncoder.HtmlEncode()
http://msdn.microsoft.com/en-us/library/system.web.util.httpencoder.aspx

You can use "&amp;" to escape "&".
In an XML document, some characters must be escaped:
& ---- &amp;
< ---- &lt;
> ---- &gt;
" ---- &quot;
' ---- &apos;
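When the XML is being produced by your own code, there is usually no need to apply these escapes by hand. As a minimal sketch (the class and method names are made up for illustration), LINQ to XML escapes text content when it serializes an element:

```csharp
using System;
using System.Xml.Linq;

public static class EscapeDemo
{
    // Hypothetical helper: wraps arbitrary text in a <foo> element.
    // XElement escapes characters such as & and < when it is serialized.
    public static string Wrap(string text)
    {
        return new XElement("foo", text).ToString();
    }
}
```

For example, EscapeDemo.Wrap("A & B") produces <foo>A &amp; B</foo>.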

Related

How to edit a UTF-16 XML file that has a line of text after the end of the main node

I have a special XML file with UTF-16 encoding. The file is used to store data, and I need to edit it from a C# Windows Forms application.
<?xml version="1.0" encoding="utf-16"?>
<cProgram xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ID="b0eb0c7e-f4de-4bc7-9e62-7a086a8c2fn8" Version="16.01" xmlns="cProgram">
<Serie>N </Serie>
<No>123456</No>
<type>101</type>
<Dataset4>larg data here 2 million char</Dataset4>
</cProgram>123456FF896631N 4873821012013-06-14
The problem is that it is not an ordinary XML file:
at the very end of the file there is an extra line of text, which causes this error
Data at the root level is invalid. Line x, position x
when I try to load it as an XML file.
I tried temporarily removing the last line and putting it back after changing the inner text. That works, but I lose the declaration line, and I haven't found a way to rewrite it while that text is at the end of the file.
So I need to change the InnerText of the (Serie) and (No) nodes,
but I don't want to lose the declaration line or the text at the end of the file.
Try this piece of code:
string line = "";
string[] stringsperate = new string[] { "</cProgram>" };
using (StreamReader sr = new StreamReader("C://blah.xml"))
{
line = sr.ReadToEnd();
Console.WriteLine(line);
}
string text = line.Split(stringsperate, StringSplitOptions.None)[0];
text += "</cProgram>";
XmlDocument xd = new XmlDocument();
xd.LoadXml(text);
Console.Read();
Hope this helps
XDocument.Save() should persist the XML declaration line if the declaration exists initially. I also checked with your XML, and the declaration line was saved as expected:
var xml = @"<?xml version=""1.0"" encoding=""utf-16""?>
<cProgram xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" ID=""b0eb0c7e-f4de-4bc7-9e62-7a086a8c2fn8"" Version=""16.01"" xmlns=""cProgram"">
<Serie>N </Serie>
<No>123456</No>
<type>101</type>
<Dataset4>larg data here 2 million char</Dataset4>
</cProgram>";
var doc = XDocument.Parse(xml);
doc.Save("test.xml");
So you can implement your idea of temporarily removing the last line and putting it back after changing the inner text.
FYI, XDocument's .ToString() method doesn't write the XML declaration line, but the .Save() method does. Related question: How to print <?xml version="1.0"?> using XDocument
Allow me to answer my own question.
When I used doc.Load(filepath); it always gave an error because of the extra last line,
and C# writes text files as UTF-8 by default, whereas this file is UTF-16.
So I found a very short way to do this and replace the inner text with the strings I want:
string text = File.ReadAllText(filepath);
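// The search strings must match the current content of the file exactly
// (here they follow the sample shown in the question).
text = text.Replace("<Serie>N", "<Serie>" + textBox1.Text);
text = text.Replace("<No>123456", "<No>" + textBox2.Text);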
//saving file with UTF-16
File.WriteAllText("new.xml", text , Encoding.Unicode);
Question related to this [blog]: How to save this string into XML file? "it is much more answer related than being Question related"
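A possible variation on the approaches above, offered as a sketch rather than something tested against the real file: split the text after the last </cProgram>, edit the XML part with LINQ to XML, and re-attach the trailing line. The class and method names are made up for illustration, and it assumes the root element is always cProgram in the namespace "cProgram" with Serie and No children, as in the sample.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class TrailingTextXml
{
    // Hypothetical helper: edits <Serie> and <No> in a document that has
    // extra text after the closing root tag, and returns the full new text.
    public static string Edit(string content, string serie, string no)
    {
        const string closingTag = "</cProgram>";
        int end = content.LastIndexOf(closingTag, StringComparison.Ordinal);
        if (end < 0)
            throw new FormatException("Closing </cProgram> tag not found.");
        end += closingTag.Length;

        string xmlPart = content.Substring(0, end);   // well-formed XML
        string trailing = content.Substring(end);     // the extra last line

        // PreserveWhitespace keeps the original line breaks between elements.
        XDocument doc = XDocument.Parse(xmlPart, LoadOptions.PreserveWhitespace);
        XNamespace ns = "cProgram";                   // default namespace of the sample
        doc.Root.Element(ns + "Serie").Value = serie;
        doc.Root.Element(ns + "No").Value = no;

        // Save (unlike ToString) writes the declaration; a StringWriter
        // reports UTF-16, so the declaration keeps encoding="utf-16".
        using (var writer = new StringWriter())
        {
            doc.Save(writer, SaveOptions.DisableFormatting);
            return writer.ToString() + trailing;
        }
    }
}
```

To write the result back as UTF-16, something like File.WriteAllText(filepath, TrailingTextXml.Edit(File.ReadAllText(filepath), textBox1.Text, textBox2.Text), Encoding.Unicode) should work.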

Problems with XSLT and Special Characters

On my web app (ASP.net 4,C#) I use FOR XML PATH('') to convert Data from SQL Server to XML,
and use the following lines to feed it to XSLT:
MemoryStream stream = new MemoryStream(UTF8Encoding.UTF8.GetBytes(xml));
XPathDocument document = new XPathDocument(stream);
StringWriter writer = new StringWriter();
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(xsltPath);
transform.Transform(document, null, writer);
return writer.ToString();
Now when I feed it messages from my forum, in sunny-day scenarios there is no problem at all.
When a user decides to use special characters like < > in their messages, though, we have the rainy day.
I get an error that differs from message to message, depending on what they write.
I have already tried disable-output-escaping="yes".
Needless to say, I want the users to be able to use some tags like
<a href... or <font ...>
Below is an example of one of the messages that causes the issue:
setting-->about phone----< software update
Any possible solutions?
You need to encode such special characters. As far as XML is concerned, there are 5 of them:
> - &gt;
< - &lt;
& - &amp;
" - &quot;
' - &apos;
You need to encode these in the user input.
An alternative is to place all user-generated content within <![CDATA[]]> sections, which effectively achieves the same.
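As an illustration of the CDATA route, here is a small sketch using LINQ to XML. The element name "message" and the helper name are assumptions for the example; the question does not show how the XML is built on the C# side.

```csharp
using System;
using System.Xml.Linq;

public static class CDataDemo
{
    // Hypothetical helper: wraps user text in a CDATA section inside <message>.
    public static string Wrap(string userText)
    {
        return new XElement("message", new XCData(userText)).ToString();
    }
}
```

With the message from the question, CDataDemo.Wrap("setting-->about phone----< software update") yields <message><![CDATA[setting-->about phone----< software update]]></message>, which parses back to the original text.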

Is it possible to escape & while loading xml?

I have an & character in one of the xml nodes as below.
<dependents>9 & 5</dependents>
When I try to load the file as below, it gives the error "An error occurred while parsing EntityName.". Is it possible to escape this character and load the file successfully? Thank you.
m_InputXMLDoc = new XmlDocument();
if (System.IO.File.Exists(InputFile))
{
m_InputXMLDoc.Load(InputFile);
}
Your XML is invalid.
You need to change the & to &amp;.
Use a CDATA section
<dependents><![CDATA[9 & 5]]></dependents>
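If the file really cannot be fixed at the source, a last-resort workaround is to escape bare ampersands in the raw text before parsing. The sketch below is an assumption-laden hack, not a general solution: the names are invented, it only recognizes the five predefined entities plus numeric references, and it would also rewrite ampersands inside existing CDATA sections or comments.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class AmpersandRepair
{
    // Matches an & that does not start a predefined entity or a numeric
    // character reference.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: replaces each bare & with &amp;.
    public static string Escape(string xmlText)
    {
        return BareAmpersand.Replace(xmlText, "&amp;");
    }

    // Loads the repaired text into an XmlDocument.
    public static XmlDocument Load(string xmlText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Escape(xmlText));
        return doc;
    }
}
```

For the file in the question, this could be called as AmpersandRepair.Load(System.IO.File.ReadAllText(InputFile)).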

How to handle xml that contains nested xml using c# xmlreader?

I'm using c# to interact with a database that has an exposed REST API. The table that I'm interested in contains forum posts, some of which themselves contain xml.
Whenever my result set contains a post that has xml, my application throws an error as follows:
Exception Details: System.Xml.XmlException: '>' is an unexpected token. The expected token is '"' or '''. Line 1, position 62.
And this is the line that fails:
Line 44: ds.ReadXml(xmlData);
And this is the code I'm using:
var webClient = new WebClient();
string searchString = searchValue.Text;
string requestUrl = "http://myserver/restapi.ashx/search.xml?pagesize=4&pageindex=0&query=";
requestUrl += searchString;
XmlReaderSettings settings = new XmlReaderSettings();
settings.ProhibitDtd = false;
XmlReader xmlData = XmlReader.Create(webClient.OpenRead(requestUrl),settings);
DataSet ds = new DataSet();
ds.ReadXml(xmlData);
Repeater1.DataSource = ds.Tables[1];
Repeater1.DataBind();
And this is the type of XML record that it's choking on (the stuff in the node is causing the problem):
<SearchResults PageSize="1" PageIndex="0" TotalCount="342">
<SearchResult>
<ContentId>994</ContentId>
<Title>Help Files: What are they written in?</Title>
<Url>http://myserver/linktest.aspx</Url>
<Date>2008-10-16T16:18:00+01:00</Date><ContentType>post</ContentType>
<Body><div class="ForumPostBodyArea"> <div class="ForumPostContentText"> <p>Can anyone see anything obviously wrong with this xml, when its fired to CRM Its creating 13 null records.</p> <p><?xml version="1.0" encoding="UTF-8"?><soap:Envelope xmlns:typens="http://tempuri.org/type" soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:wsdlns="http://tempuri.org/wsdl/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Header><SessionHeader><sessionId xsi:type="xsd:long">18208442035524</sessionId></SessionHeader></soap:Header><soap:Body><typens:add><entityname xsi:type="xsd:string">lead</entityname><records xsi:nil="true" xsi:type="typens:ewarebase" /><status xsi:type="xsd:string">PreRegistration</status><requester xsi:type="xsd:string">Mimnagh</requester><personfirstname xsi:type="xsd:string">Sean</personfirstname><personlastname xsi:type="xsd:string">Test2</personlastname><personsalutation xsi:type="xsd:string">Mr</personsalutation><details xsi:type="xsd:string">test project details</details><description xsi:type="xsd:string">test description details</description><comments xsi:type="xsd:string">test project comments</comments><personemail xsi:type="xsd:string">smimnagh#mac.com</personemail><personphonenumber xsi:type="xsd:string">12334566777</personphonenumber><type xsi:type="xsd:string">PreReg</type><companyname xsi:type="xsd:string">Site Client</companyname></typens:add></soap:Body></soap:Envelope></p> <p>Many thanks</p> </div> </div>
</Body>
<Tags>
<Tag>xml</Tag>
</Tags>
<IndexedAt>2010-07-08T11:53:46.848+01:00</IndexedAt>
</SearchResult>
</SearchResults>
Is there something that I can do with the xmlreader to make it ignore whatever's causing the problem?
Please note that I can't change the XML prior to consuming it - so if it's malformed then I wonder if there's a way to ignore or modify that particular record without generating an error?
Thanks!
It looks like some of your quotes need escaping in the contents of some of your elements. Try using
&quot;
for quote marks that aren't wrapping attribute values.
UPDATE:
Because the data you want to read isn't strictly XML (it's nearly XML), your best bet is one of the following:
1. Either you or your boss, if you have one, screams at the third party because they're not sending you well-formed XML.
2. Perform some horrible hack to try to convert whatever you might get to XML.
If you have to go with point 2, the simplest thing that pops into my head is to read the characters of the 'XML' counting in and out of angle brackets. If you find any " characters and you're not within any angle brackets, replace the " with
&quot;
But note that doing that is a complete last resort.
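For what it's worth, the bracket-counting idea could be sketched roughly as follows. This is only an illustration of the hack described above, with made-up names; it does not understand comments, CDATA sections, or stray < and > characters in text, so it can easily get confused by real-world input.

```csharp
using System;
using System.Text;

public static class QuoteRepair
{
    // Hypothetical helper: replaces " with &quot; wherever the character
    // is not between a '<' and its matching '>'.
    public static string EscapeQuotesOutsideTags(string input)
    {
        var output = new StringBuilder(input.Length);
        int depth = 0; // how many unclosed '<' have been seen so far

        foreach (char c in input)
        {
            if (c == '<')
                depth++;
            else if (c == '>' && depth > 0)
                depth--;

            if (c == '"' && depth == 0)
                output.Append("&quot;");
            else
                output.Append(c);
        }
        return output.ToString();
    }
}
```

The repaired string could then be handed to XmlReader.Create through a StringReader instead of reading the response stream directly.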
The Content of your <Body> tag is not well formed. XML is very strict with the syntax of data. Either embed a CDATA section into your XML or escape the string properly.

System.IO.File.ReadAllText(path) does not read the html file

I want to read an HTML file, and for that I use System.IO.File.ReadAllText(path). It reads all of my HTML files except one, which this function fails to read.
I have also used
using (StreamReader reader = File.OpenText(fileName)) {
text = reader.ReadToEnd();
}
but the problem is the same.
What could be the reason, and what is the solution? Is there any other way to read the file?
I'll take a wild guess:
The file contains Unicode sequences for extended chars, and the diagnosis is based on a (mismatched) length.
If I debug the code, it looks like
"<\0h\0t\0m\0l\0>\0<\0h\0e\0a\0d\0>\0\r\0\n\0<\0M\0E\0T\0A\0
\0h\0t\0t\0p\0-\0e\0q\0u\0i\0v\0=\0\"\0C\0o\0n\0t\0e\0n
Which is a valid beginning of an HTML file except for the very first char. The file is probably damaged by missing a Unicode marker at the start. This damage was probably caused when it was written and is not easily repairable now.
You could try setting the WebClient.Encoding to UTF8 (and try a few ASCII as well).
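The \0 after every character in the debugger output is what UTF-16 (little-endian) text looks like when it is decoded as a single-byte or UTF-8 encoding, so another thing to try is decoding the bytes as Encoding.Unicode. The following is a rough sketch under that assumption; the helper name and the 64-byte sample size are arbitrary choices for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlFileDecoder
{
    // Hypothetical helper: guesses UTF-16 LE when every odd-offset byte in
    // the first 64 bytes is zero (typical for ASCII markup stored as UTF-16),
    // otherwise falls back to UTF-8. A byte order mark, if present, wins.
    public static string Decode(byte[] bytes)
    {
        int sample = Math.Min(bytes.Length, 64);
        int pairs = sample / 2;
        int zeros = 0;
        for (int i = 1; i < sample; i += 2)
        {
            if (bytes[i] == 0) zeros++;
        }

        bool looksLikeUtf16Le = pairs > 0 && zeros == pairs;
        Encoding fallback = looksLikeUtf16Le ? Encoding.Unicode : Encoding.UTF8;

        using (var reader = new StreamReader(new MemoryStream(bytes), fallback, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

It could be used as HtmlFileDecoder.Decode(File.ReadAllBytes(path)); if the file is known to be UTF-16, File.ReadAllText(path, Encoding.Unicode) is simpler.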
Does the MessageBox show anything? Any error? What does varText.Length show?
string varText = File.ReadAllText(varFile, Encoding.Default);
MessageBox.Show(varFile + " Text: " + varText + " Length: " + varText.Length);
Verify in the MessageBox that the path to the file is correct, and verify that your application has the same access rights as you do when reading the file with Notepad.
Came across this on google recently. The correct way to do it is via WebClient...
WebClient client = new WebClient();
String guestMsg = client.DownloadString("C:\\temp\\TheBarGuestDetailsEmail.htm");
File.ReadAllText will mess up the html when it's doing a read, and characters like £ or ' will get messed up.
