Minor issue, but it's driving me nuts nonetheless.
I'm building a url for a <script> tag include to be rendered on an ASP.NET page, something like this:
<script src='<%= string.Format("http://example.com/page.aspx?a={0}&b={1}&c={2:0.00}", A, B, C)%>' type='text/javascript'></script>
Problem is when this is rendered, the & characters are replaced with &amp;:
<script src='http://example.com/page.aspx?a=xxx&amp;b=zzz&amp;c=123.45' type='text/javascript'></script>
I was expecting this, obviously:
<script src='http://example.com/page.aspx?a=xxx&b=zzz&c=123.45' type='text/javascript'></script>
However, if I render the url directly, outside the <script> tag, it looks ok! Just doing
<%= string.Format("http://example.com/page.aspx?a={0}&b={1}&c={2:0.00}", A, B, C) %>
Renders this:
http://example.com/page.aspx?a=xxx&b=zzz&c=123.45
What gives? And how do I stop this madness? My OCD can't take it!
As @Falkon and @AVD have said, ASP.NET is automatically doing the "right" thing in the <script> tag. See the W3C XHTML 1.0 recommendation, section C.12, "Using Ampersands in Attribute Values (and Elsewhere)":
In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;").
I'm not entirely sure why ASP.NET doesn't do the same thing in the rest of the page (could be any number of good reasons), but at least it's correcting the ampersand in the script tag. Conclusion: While you may be cursing ASP.NET for "scrambling" your url, you may want to thank it instead for helping your webpage be standards compliant.
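A quick way to convince yourself that the &amp; is harmless: encode the URL the way an attribute value gets encoded, then decode it the way a browser reads the attribute. This is a console sketch, not ASP.NET code. System.Net.WebUtility stands in for whatever encoder the page actually uses, and the values for A, B and C are made up.

```csharp
using System;
using System.Globalization;
using System.Net;

// The URL as built in the question (A, B, C replaced by sample values).
// InvariantCulture keeps {2:0.00} from producing "123,45" on some machines.
string url = string.Format(CultureInfo.InvariantCulture,
    "http://example.com/page.aspx?a={0}&b={1}&c={2:0.00}", "xxx", "zzz", 123.45);

// What ends up in the src attribute: each '&' becomes the entity "&amp;".
string inAttribute = WebUtility.HtmlEncode(url);

// What the browser actually requests: the entity is decoded back to '&'.
string requested = WebUtility.HtmlDecode(inAttribute);

Console.WriteLine(inAttribute); // http://example.com/page.aspx?a=xxx&amp;b=zzz&amp;c=123.45
Console.WriteLine(requested);   // http://example.com/page.aspx?a=xxx&b=zzz&c=123.45
```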
Maybe MvcHtmlString.Create() or Html.Raw()?
<script src='<%= MvcHtmlString.Create(string.Format("http://example.com/page.aspx?a={0}&b={1}&c={2:0.00}", A, B, C))%>' type='text/javascript'></script>
or
<script src='<%= Html.Raw(string.Format("http://example.com/page.aspx?a={0}&b={1}&c={2:0.00}", A, B, C))%>' type='text/javascript'></script>
I can work it out. I just make a method:
public void BuildUrl(String baseUrl = "", String data = "")
{
Response.Write(baseUrl + data);
}
and use it in my html page like this:
<button type="button" class="btn_new" ref="<% this.BuildUrl(this.BaseUrl + "Master/Tanker_Edit.aspx?", "type=new&unique_id=" + Session.SessionID); %>">New</button>
The result:
<button type="button" class="btn_new" ref="http://kideco.local/Master/Tanker_Edit.aspx?type=new&unique_id=emxw1pkpnwcxpn1cl2cf04zv">
Wrap the content in Server.HtmlEncode(yourString).
It will automatically escape double and single quotes for you, as well as ampersands (&), less-than and greater-than signs, etc.
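Server.HtmlEncode needs a running page, so this console sketch uses System.Net.WebUtility.HtmlEncode, which I'm assuming treats these characters the same way. It shows that an encoded value can be dropped into an attribute without closing it early or opening a new tag. The title text is made up.

```csharp
using System;
using System.Net;

// A value containing every character that could break out of an attribute.
string title = "Tom's \"best\" <b>deals</b> & more";
string encoded = WebUtility.HtmlEncode(title);

// No raw quotes or angle brackets remain, so this is safe in a
// double-quoted (or single-quoted) attribute value.
string link = "<a title=\"" + encoded + "\" href=\"/deals\">Deals</a>";
Console.WriteLine(link);
```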
Instead of <%=, which can sometimes automatically HTML-encode things it writes to the response stream, try using <%:. This is a new(ish) code expression nugget syntax added with ASP.NET 4.0. This new syntax will still HTML-encode most things by default, but there is a special IHtmlString interface you can use to explicitly tell this new nugget that you do not want to HTML-encode this data, and thus avoid double encoding. You should pretty much always use the newer <%: and pretty much never use the older <%=... though of course there will be exceptions to this.
More details are available here:
http://weblogs.asp.net/scottgu/archive/2010/04/06/new-lt-gt-syntax-for-html-encoding-output-in-asp-net-4-and-asp-net-mvc-2.aspx
"&" is a reserved character in HTML and XML, and consequently in ASP.NET. ASP.NET converts "&" to &amp; because that is the entity reference for that character on the web.
You might find your answer in the answers on this question : ASP.Net URLEncode Ampersand for use in Query String
Hope that helps, good luck!
Related
I have some data from a lookup like this: =winz\ach'dull.
How can I replace single quotes (') with ("").
This is my code =>
<input type="button" id="btnSelect" onclick="Select('<%#Eval("LoginName").ToString().Replace("'", "\'")%>');" value="Select"/>
I'm trying to create code like this:
Select('<%#Eval("LoginName").ToString().Replace("'", "\'")%>');
but it does not work.
Please correct and help me. Thanks.
In pure javascript we could do :
var a="winz\ach'dull.";
alert(a.replace("'",'"'));
And that would replace your single quote.
Note: Your code is C# not javascript.
You can escape quotes with the "\" character and it works perfectly with HTML. So the answer to exactly what you wrote would be: (this is just to humour you in the future)
"Select('<%#Eval(\"LoginName\").ToString().Replace(\"'\", \"\'\")%>');"
But you have syntax errors in what you are writing and that Eval stuff is not javascript so I don't know why ToString and Replace are attached to it. I've changed it a little based on guessing what you're trying to do:
<input onclick="Select('<%#Eval("LoginName")%>').ToString().Replace(\"'\", \"'\");">
Note that if you're using C# or something on the server side, it doesn't need to be escaped, because by the time the HTML is parsed into the DOM (typically by a browser), the source no longer contains your server-side code, only its output!
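The question's Replace("'", "\'") also has a plain C# problem: "\'" is just an escaped single quote, so the call replaces ' with ' and changes nothing. A minimal console sketch, assuming the goal is a value that is safe inside a single-quoted JavaScript string (the value is the one from the question; the escaping rules are my assumption, not the answer's):

```csharp
using System;

// The value from the question: winz\ach'dull.
string loginName = @"winz\ach'dull.";

// "\'" is the same string as "'", so this Replace is a no-op.
string unchanged = loginName.Replace("'", "\'");

// A real backslash needs "\\" in a C# literal. Existing backslashes are doubled
// first so JavaScript does not read them as escape sequences.
string jsSafe = loginName.Replace("\\", "\\\\").Replace("'", "\\'");

Console.WriteLine(unchanged); // winz\ach'dull.
Console.WriteLine(jsSafe);    // winz\\ach\'dull.
```

Inside the data-binding expression, that would read Eval("LoginName").ToString().Replace("\\", "\\\\").Replace("'", "\\'"). On .NET 4.0 and later, HttpUtility.JavaScriptStringEncode is a more thorough alternative (it also escapes double quotes and control characters). If the value can contain double quotes, it additionally needs HTML encoding to stay inside onclick="...".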
Why doesn't this work?
<input type="button" id="btnAccept" value="Accept" onclick='<%# String.Format("accept('{0}','{1}','{2}','{3}-{4}');", Container.DataItem("PositionID"), Container.DataItem("ApplicantID"), Container.DataItem("FullName"), Container.DataItem("DepartmentName"), Container.DataItem("PositionTitle"))%>' />
The onclick doesn't do anything.
Your best bet is to look at the generated HTML. I think it's a really good habit to check the generated HTML in text format and how it renders on-screen, all the time. Besides errors such as this (which can easily be spotted in the generated HTML), it will help you catch other possible invalid uses of HTML which may render as intended in one browser while rendering terribly in another. HTML rendering engines employ many tricks to try and make invalid HTML look okay.
Anyway, all things aside (such as, assuming accept(...) exists, and all other calls in the tag are correct) I think the issue you are having is as follows:
onclick='<%# String.Format("accept('{0}','{1}','{2}','{3}-{4}');", ... )%>'
This line is probably going to evaluate to look something like this:
onclick='accept('{0}','{1}','{2}','{3}-{4}');'
With all single quotes, all the onclick attribute will see is onclick='accept(' which is not a valid JavaScript method call. You're going to want to use double-quoted strings in the call, which you can embed in the format string by escaping them. Since Container.DataItem("...") is VB.NET syntax, that means doubling each quote (in C# you would write \" instead):
String.Format("accept(""{0}"",""{1}"",""{2}"",""{3}-{4}"");", ... )
Then, you should be able to get the correct combination of ' and " within the attribute:
onclick='accept("{0}","{1}","{2}","{3}-{4}");'
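The same idea as a console sketch, written in C# like the other examples here: the format string wraps each argument in double quotes, so nothing inside the call can close the single-quoted onclick attribute. The field values are made-up stand-ins for the Container.DataItem(...) lookups.

```csharp
using System;

// Hypothetical values standing in for the data-bound fields.
string positionId = "12", applicantId = "34", fullName = "Jane Doe";
string department = "Sales", title = "Clerk";

// \" puts a literal double quote into the C# string.
string script = string.Format("accept(\"{0}\",\"{1}\",\"{2}\",\"{3}-{4}\");",
    positionId, applicantId, fullName, department, title);

// Single quotes around the attribute, double quotes inside the JavaScript.
string attribute = "onclick='" + script + "'";
Console.WriteLine(attribute); // onclick='accept("12","34","Jane Doe","Sales-Clerk");'
```

A value that itself contains a quote or apostrophe would still break the markup, so real data should be encoded as well.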
I am having trouble removing all javascript from an HTML page with C#. I have three regex expressions that remove a lot but miss a lot too. Parsing the javascript with the MSHTML DOM parser causes the javascript to actually run, which is what I am trying to avoid by using the regex.
"<script.*/>"
"<script[^>]*>.*</script>"
"<script.*?>[\\s\\S]*?</.*?script>"
Does anyone know what I am missing that is causing these three regex expressions to miss blocks of JavaScript?
An example of what I am trying to remove:
<script src="do_files/page.js" type="text/javascript"></script>
<script src="do_files/page.js" type="text/javascript" />
<script type="text/javascript">
<!--
var Time=new Application('Time')
//-->
</script>
<script type="text/javascript">
if(window['com.actions']) {
window['com.actions'].approvalStatement = "",
window['com.actions'].hasApprovalStatement = false
}
</script>
I assume you are trying to simply sanitize the input of JavaScript. Frankly I'm worried that this is too simple of a solution, 'cuz it seems so incredibly simple. See below for reasoning, after the expression (in a C# string):
@"(?s)<script.*?(/>|</script>)"
That's it - I hope! (It certainly works for your examples!)
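Running this expression against the question's examples is a cheap sanity check. The console sketch below joins the examples into one fragment (the surrounding <p> lines are made up) and shows every script block removed. Note that the pattern is case-sensitive, so <SCRIPT> would slip through unless RegexOptions.IgnoreCase is added.

```csharp
using System;
using System.Text.RegularExpressions;

// The examples from the question, one string per line.
string html = string.Join("\n",
    "<p>before</p>",
    "<script src=\"do_files/page.js\" type=\"text/javascript\"></script>",
    "<script src=\"do_files/page.js\" type=\"text/javascript\" />",
    "<script type=\"text/javascript\">",
    "<!--",
    "var Time=new Application('Time')",
    "//-->",
    "</script>",
    "<script type=\"text/javascript\">",
    "if(window['com.actions']) {",
    "window['com.actions'].approvalStatement = \"\",",
    "window['com.actions'].hasApprovalStatement = false",
    "}",
    "</script>",
    "<p>after</p>");

// (?s) lets . match newlines; the lazy .*? stops at the first "/>" or "</script>".
string stripped = Regex.Replace(html, @"(?s)<script.*?(/>|</script>)", "");
Console.WriteLine(stripped);
```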
My reasoning for the simplicity is that the primary issue with trying to parse HTML with regex is the potential for nested tags - it's not so much the nesting of DIFFERENT tags, but the nesting of SYNONYMOUS tags.
For example,
<b> bold <i> AND italic </i></b>
...is not so bad, but
<span class='BoldText'> bold <span class='ItalicText'> AND italic </span></span>
would be much harder to parse, because the ending tags are IDENTICAL.
However, since it is invalid to nest script tags, the next instance of />(<-is this valid?) or </script> is the end of this script block.
There's always the possibility of HTML comments or CDATA sections inside the script tag, but those should be fine if they don't contain </script>. HOWEVER: if they do, it would definitely be possible to get some 'code' through. I don't think the page would render, but some HTML parsers are amazingly flexible, so ya never know. To handle a little extra possible whitespace, you could use:
@"(?s)<\s?script.*?(/\s?>|<\s?/\s?script\s?>)"
Please let me know if you can figure out a way to break it that will let through VALID HTML code with run-able JavaScript (I know there are a few ways to get some stuff through, but it should be broken in one of many different ways if it does get through, and should not be run-able JavaScript code.)
It is generally agreed upon that trying to parse HTML with regex is a bad idea and will yield bad results. Instead, you should use a DOM parser. jQuery wraps nicely around the browser's DOM and would allow you to very easily remove all <script> tags.
OK, I have faced a similar case, when I needed to clean "rich text" (text with HTML formatting) of any possible javascript-ing.
There are several ways to add javascript to HTML:
by using the <script> tag, with javascript inside it or by loading a javascript file using the "src" attribute.
ex: <script>maliciousCode();</script>
by using an event on an HTML element, such as "onload" or "onmouseover"
ex: <img src="a.jpg" onload="maliciousCode()">
by creating a hyperlink that calls javascript code
ex: <a href="javascript:maliciousCode()">...
This is all I can think of for now.
So the submitted HTML code needs to be cleaned of these 3 cases. A simple solution would be to look for these patterns using Regex and replace them with "" or do whatever else you want.
This is a simple code to do this:
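using System.Text.RegularExpressions;

public static string CleanHTMLFromScript(string str)
{
    // Remove opening <script ...> tags (the script text and </script> are left behind as inert text).
    Regex re = new Regex("<script[^>]*>", RegexOptions.IgnoreCase);
    str = re.Replace(str, "");
    // Remove any tag carrying an on* event-handler attribute (onload, onmouseover, ...).
    re = new Regex("<[a-z][^>]*on[a-z]+=\"?[^\"]*\"?[^>]*>", RegexOptions.IgnoreCase);
    str = re.Replace(str, "");
    // Remove <a href="javascript:..."> opening tags.
    re = new Regex("<a\\s+href\\s*=\\s*\"?\\s*javascript:[^\"]*\"[^>]*>", RegexOptions.IgnoreCase);
    str = re.Replace(str, "");
    return str;
}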
This code takes care of any spaces and quotes that may or may not be added. It seems to be working fine, not perfect but it does the trick. Any improvements are welcome.
Creating your own HTML parser or script detector is a particularly bad idea if this is being done to prevent cross-site scripting. Doing this by hand is a Very Bad Idea, because there are any number of corner cases and tricks that can be used to defeat such an attempt. This is termed "black listing", as it attempts to remove the unsafe items from HTML, and it's pretty much doomed to failure.
Much safer to use a white list processor (such as AntiSamy), which only allows approved items through by automatically escaping everything else.
Of course, if this isn't what you're doing then you should probably edit your question to give some more context...
Edit:
Now that we know you're using C#, try the HTMLAgilityPack as suggested here.
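To make the black-listing point concrete, here is a console sketch that reuses the three patterns from CleanHTMLFromScript above (same patterns and options, called through the static Regex.Replace) on a few made-up inputs. The event-handler case is caught, but a javascript: link survives as soon as href is not the first attribute or its value is single-quoted.

```csharp
using System;
using System.Text.RegularExpressions;

// The three black-list patterns from CleanHTMLFromScript, unchanged.
static string CleanHTMLFromScript(string str)
{
    str = Regex.Replace(str, "<script[^>]*>", "", RegexOptions.IgnoreCase);
    str = Regex.Replace(str, "<[a-z][^>]*on[a-z]+=\"?[^\"]*\"?[^>]*>", "", RegexOptions.IgnoreCase);
    str = Regex.Replace(str, "<a\\s+href\\s*=\\s*\"?\\s*javascript:[^\"]*\"[^>]*>", "", RegexOptions.IgnoreCase);
    return str;
}

// Caught: the whole <img> tag is removed.
Console.WriteLine(CleanHTMLFromScript("<img src=\"a.jpg\" onload=\"maliciousCode()\">"));

// Not caught: href is not the first attribute.
Console.WriteLine(CleanHTMLFromScript("<a title=\"x\" href=\"javascript:maliciousCode()\">click</a>"));

// Not caught: the URL is single-quoted.
Console.WriteLine(CleanHTMLFromScript("<a href='javascript:maliciousCode()'>click</a>"));
```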
Which language are you using? As a general statement, Regular Expressions are not suitable for parsing HTML.
If you are on the .net Platform, the HTML Agility Pack offers a much better parser.
You should use a real HTML parser for the job. That being said, for simple stripping of script blocks you could use a rudimentary regex like the one below.
The idea is that you will need a callback to determine whether capture group 1 matched. If it did, the callback should pass things that hide HTML (like comments) back through unchanged; script blocks are passed back as an empty string.
This won't substitute for an HTML processor though. Good luck!
Search Regex: (modifiers - expanded, global, include newlines in dot, callback func)
(?:
<script (?:\s+(?:".*?"|\'.*?\'|[^>]*?)+)? \s*> .*? </script\s*>
| </?script (?:\s+(?:".*?"|\'.*?\'|[^>]*?)+)? \s*/?>
)
|
( # Capture group 1
<!(?:DOCTYPE.*?|--.*?--)> # things that hide html, add more constructs here ...
)
Replacement func pseudo code:
string callback () {
if capture buffer 1 matched
return capt buffer 1
else return ''
}
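In C#, that callback is a MatchEvaluator passed to Regex.Replace. The sketch below uses a simplified expression, not a translation of the one above: it removes script blocks, self-closing and stray script tags, and treats only HTML comments as "things that hide html" (capture group 1). The sample page is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Alternatives 1-3: self-closing script tag, script block, stray <script>/</script>.
// Capture group 1: an HTML comment, passed through untouched.
var scriptOrComment = new Regex(
    @"<script\b[^>]*?/>|<script\b[^>]*>.*?</script\s*>|</?script\b[^>]*>|(<!--.*?-->)",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

// The callback: keep group 1 if it matched, otherwise replace with "".
string StripScripts(string html) =>
    scriptOrComment.Replace(html, m => m.Groups[1].Success ? m.Groups[1].Value : "");

string page = "<p>a</p><script>alert(1)</script><!-- <script>x()</script> --><script src=\"x.js\" /><b>b</b>";
Console.WriteLine(StripScripts(page)); // <p>a</p><!-- <script>x()</script> --><b>b</b>
```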
I'm looking for a regex that will allow me to get all JavaScript and CSS link tags in a string so that I can strip certain tags from a DotNetNuke (Yeah I know.... ouch!) page on an overridden render event.
I know about the HTML Agility Pack, and I've even read Jeff Atwood's blog entry, but unfortunately I don't have the luxury of a 3rd-party library.
Any help would be appreciated.
Edit: I gave this a try to get a JavaScript entry but it didn't work. Regexes are a dark art to me.
updatedPageSource = Regex.Replace(
pageSource,
String.Format("<script type=\"text/javascript\" src=\".*?{0}\"></script>",
name), "", RegexOptions.IgnoreCase);
I have a few comments on this. Your regex is close; the following has been tested to work:
<script type="text/javascript" src=".*myfile.js"></script>
I used the following test inputs
<script type="text/javascript" src="myfile.js"></script>
<script type="text/javascript" src="/test/myfile.js"></script>
<script type="text/javascript" src="/test/Looky/myfile.js"></script>
However, I would caution against this approach: it takes time to parse, can be error-prone, etc.
DISCLAIMER: Regex + HTML = ouch!
Your problem may be that you are not escaping the Regex metacharacters from name (e.g. the dot metacharacter '.'). You may want to try this:
updatedPageSource = Regex.Replace(
pageSource,
String.Format("<script\\s+type=\"text/javascript\"\\s+src=\".*?{0}\"\\s*>\\s*</script>", Regex.Escape(name)),
"",
RegexOptions.IgnoreCase);
// Just one of the many reasons why you don't mix Regex with HTML:
updatedPageSource = Regex.Replace(
updatedPageSource,
String.Format("<script\\s+src=\".*?{0}\"\\s+type=\"text/javascript\"\\s*>\\s*</script>", Regex.Escape(name)),
"",
RegexOptions.IgnoreCase);
I also added optional whitespace here and there.
Don't forget to account for things like whitespace, other attributes, different orders of attributes (e.g. src="foo" type="bar" vs type="bar" src="foo"), and " vs ' quoting. Maybe this?
@"<\s*script\b.*?\bsrc=(""|').*?{0}\1.*?(/>|>\s*</\s*script\s*>)"
I went ahead and took out the type attribute. If you have the filename, you know what type of script it is anyway; plus, this accounts for tags where the src attribute comes first, or they used the deprecated language attribute, or they omitted type altogether (it's supposed to be there, but it isn't always). Note that I'm using the lazy .*? so that it doesn't match all the way to the last </script> in the page.
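A console sketch of this approach, with two adjustments of my own (not part of the answer): [^>]*? and [^"']*? replace .*? so a match cannot run past the current tag or attribute value and swallow a neighbouring <script> tag, and there is no \b right after the closing quote (a quote followed by a space or > is not a word boundary, so \b there would reject typical tags). The sample page is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Removes <script> tags whose src ends with `name`, in either attribute order
// and with either quote style.
static string RemoveScriptTag(string pageSource, string name)
{
    string pattern = string.Format(
        @"<\s*script\b[^>]*?\bsrc=(""|')[^""']*?{0}\1[^>]*?(/>|>\s*</\s*script\s*>)",
        Regex.Escape(name)); // escape '.' and other metacharacters in the file name
    return Regex.Replace(pageSource, pattern, "", RegexOptions.IgnoreCase);
}

string page = "<script type=\"text/javascript\" src=\"/test/myfile.js\"></script>"
    + "<script src='other.js'></script>"
    + "<script src=\"myfile.js\" type=\"text/javascript\" />";

Console.WriteLine(RemoveScriptTag(page, "myfile.js")); // <script src='other.js'></script>
```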
I'm building an Ajax.ActionLink in C# which starts:
<%= Ajax.ActionLink("f lastname", ...more stuff
and I'd like there to be a new line character between the words "f" and "lastname". How can I accomplish this? I thought the special character was \n but that doesn't work, and <br> doesn't work either.
You might have to revert to doing something like:
f<br />last
And then wire in the Ajax bits manually.
Try this:
<%= Ajax.ActionLink("f<br />lastname", ...more stuff
You can't use <br /> because the ActionLink method (and indeed I believe all the html and ajax extension methods) encode the string. Thus, the output would be something like
f&lt;br /&gt;lastname
What you could try instead would be a formatting:
<%= string.Format(Ajax.ActionLink("f{0}lastname", ...more stuff), "<br />") %>
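The formatting trick works because HTML encoding leaves { and } alone, so the {0} placeholder survives and the raw <br /> is substituted afterwards. A console sketch, where EncodeLinkText is a hypothetical stand-in for the encoding ActionLink applies to its link text:

```csharp
using System;
using System.Net;

// Hypothetical stand-in for ActionLink's handling of the link text.
static string EncodeLinkText(string text) => WebUtility.HtmlEncode(text);

// Passing markup directly: the tag is encoded and shows up as literal text.
string direct = EncodeLinkText("f<br />lastname");

// Encode text that only contains the placeholder, then insert the raw markup.
string formatted = string.Format(EncodeLinkText("f{0}lastname"), "<br />");

Console.WriteLine(direct);    // f&lt;br /&gt;lastname
Console.WriteLine(formatted); // f<br />lastname
```

One caveat: string.Format will throw if the rest of the generated link contains a literal { or }, for example in a route value.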
Did you try the \r\n combination?
How about:
<%= Server.UrlDecode(Ajax.ActionLink(Server.UrlEncode("f<br/>lastname"), ...more stuff
This works for me -
<%= HttpUtility.HtmlDecode(Html.ActionLink("AOT <br/> Claim #", "actionName" ))%>
The \n used to work for me, but now it seems to be deprecated. Alternatively, you may use the Environment.NewLine property, for example:
string jay = "This is a" + Environment.NewLine + "multiline" + Environment.NewLine + "statement";
I think Andrew Hare's answer is correct. If you have slightly more complicated requirement, you do have the option to create your own AjaxHelper or HtmlHelper. This will involve creating custom extension methods that work on AjaxHelper and HtmlHelpers, by doing something like:
public static class CustomHtmlHelperExtensions
{
public static MvcHtmlString FormattedActionLink(this HtmlHelper html, ...)
{
var tagBuilder = new TagBuilder("a");
// TODO : Implementation here
// this syntax might not be exact but you get the gist of it!
return MvcHtmlString.Create(tagBuilder.ToString());
}
}
You can use dotPeek or your favorite .NET reflection tool to examine the standard extensions that come with ASP.NET MVC (e.g., ActionLink) to see how Microsoft has implemented most of those extension methods. They have some pretty good patterns for writing those. In the past, I have taken this approach to simplify outputting HTML in a readable manner, such as for Google Maps or Bing Maps integration, for creating helpers like ActionImage, e.g. @Html.ActionImage(...), or to integrate outputting Textile-formatted HTML by enabling syntax such as @Html.Textile("textile formatted string").
If you define this in a separate assembly (like I do), then remember to include this into your project references and then add it to the project's Web.config as well.
Obviously, this approach is overkill for your specific purposes, and for this reason, my vote is for Andrew Hare's approach for your specific case.
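The core of such a custom helper is the encoding logic: encode the user text, then insert the line-break markup yourself. The standalone sketch below leaves out TagBuilder and MvcHtmlString so it runs without MVC. FormattedLink is a hypothetical name; in a real helper the result would be wrapped in MvcHtmlString.Create as in the outline above.

```csharp
using System;
using System.Linq;
using System.Net;

// Hypothetical helper (not part of ASP.NET MVC): each line of the link text is
// HTML-encoded, and the lines are joined with <br /> markup that is NOT encoded.
static string FormattedLink(string href, string text)
{
    var lines = text.Split('\n').Select(line => WebUtility.HtmlEncode(line.TrimEnd('\r')));
    return "<a href=\"" + WebUtility.HtmlEncode(href) + "\">" + string.Join("<br />", lines) + "</a>";
}

Console.WriteLine(FormattedLink("/claims/aot", "AOT\nClaim #"));
// <a href="/claims/aot">AOT<br />Claim #</a>
```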
It's been several years since the question was asked, but I had trouble with it. I found the answer to be (in MVC):
Text in your ActionLink: ...ActionLink("TextLine1" + Environment.NewLine + "TextLine2", ...
In the ActionLink, have a class that points to a css with this line:
white-space: pre;
That's it. I've seen answers where they put the entire ActionLink in <pre></pre> tags, but that caused more problems than it solved.