Is there a nicer syntax when creating elements with hyphenated attributes instead of using:
<%= Html.TextBox ("name", value, new Dictionary<string, object> { {"data-foo", "bar"} }) %>
Looking at the specs for the proposed HTML 5 and WAI-ARIA standards, it seems hyphens in HTML attribute names are set to become more common as a simple form of namespacing.
E.g. HTML 5 proposes that custom attributes are prefixed with data-, and WAI-ARIA uses the aria- prefix for all of its attributes.
When using HTML helpers in ASP.NET MVC such as <%= Html.TextBox("name", value, new { attribute = attributeValue }) %> the anonymous object is converted to a dictionary.
Unfortunately, C# does not support hyphens in names, so the only alternative is to create a dictionary, and that syntax is very verbose. Has anyone seen a nicer alternative, or a simple way of altering the behaviour of ASP.NET MVC's HTML extensions without having to rewrite the entire extension?
Use an underscore in the data attribute name, and it will be handled for you: the helper converts the underscore to a hyphen when the attribute is rendered. The conversion assumes you want a hyphen rather than an underscore, since underscores are rarely used in HTML attribute names.
<%= Html.TextBox("name", value, new { @data_foo = "bar"}) %>
The answer provided at ActionLink htmlAttributes suggests using underscores instead of hyphens. ASP.NET MVC is supposed to emit hyphens in place of the underscores when sending the page to the browser.
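To make the underscore convention concrete, here is a minimal, framework-free sketch of that kind of conversion. The `ToHtmlAttributes` helper and the `AttributeDemo` class are hypothetical names written for this illustration; they are not MVC's own code. In MVC 3 and later the framework method that performs this step is, as far as I know, `HtmlHelper.AnonymousObjectToHtmlAttributes`.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeDemo
{
    // Hypothetical helper (not part of ASP.NET MVC): copies the properties of an
    // anonymous object into a dictionary, replacing '_' with '-' in each name.
    public static IDictionary<string, object> ToHtmlAttributes(object attributes)
    {
        var result = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
        if (attributes == null)
        {
            return result;
        }

        foreach (var property in attributes.GetType().GetProperties())
        {
            // data_foo becomes data-foo, aria_label becomes aria-label.
            string name = property.Name.Replace('_', '-');
            result[name] = property.GetValue(attributes, null);
        }

        return result;
    }

    public static void Main()
    {
        var attrs = ToHtmlAttributes(new { data_foo = "bar", aria_label = "Name" });
        foreach (var pair in attrs)
        {
            Console.WriteLine("{0}=\"{1}\"", pair.Key, pair.Value);
        }
    }
}
```

One consequence of this convention is that an attribute name that genuinely contains an underscore cannot be written as an anonymous-object property; for that case the dictionary overload is still needed.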
Related
I am using the shorthand for HttpUtility.HtmlEncode to encode the data going into my text boxes.
<asp:TextBox ID="txtProperty" runat="server" Text='<%#: Bind("Property")%>'></asp:TextBox>
My understanding of how encoded characters behave is that when your web browser renders them, they should display as the characters they represent and not as the actual encoded characters, as this example code on the MSDN website suggests.
However, my encoded characters do not behave this way.
For example, a '£' character retrieved from a database is displayed in the text box in its encoded form (the literal entity text) and not as '£'.
I don't think it has anything to do with how my website is configured to handle encoding, because if I manually set the text as the encoded characters in the HTML:
<asp:TextBox ID="txtProperty" runat="server" Text="&#163;"></asp:TextBox>
It renders the encoded characters correctly as '£'.
This indicates to me that it is a problem with the way I am using HtmlEncode.
Still, I tried explicitly setting the encoding to UTF-8 in my web.config, and it made no difference.
Could someone explain this behavior, or what might be the problem here?
When you use <%#: Bind("Property")%>, ASP.NET already takes care of HTML-encoding the string. If you pre-encode it as well, you end up in the double-encoding scenario.
See ScottGu's New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2):
ASP.NET 4 introduces a new IHtmlString interface (along with a concrete implementation: HtmlString) that you can implement on types to indicate that its value is already properly encoded (or otherwise examined) for displaying as HTML, and that therefore the value should not be HTML-encoded again.
The <%: %> code-nugget syntax checks for the presence of the IHtmlString interface and will not HTML encode the output of the code expression if its value implements this interface.
This allows developers to avoid having to decide on a per-case basis whether to use <%= %> or <%: %> code-nuggets.
Instead you can always use <%: %> code nuggets, and then have any properties or data-types that are already HTML encoded implement the IHtmlString interface.
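The double-encoding effect described in the answer can be reproduced outside of a page. The sketch below uses `WebUtility.HtmlEncode` from `System.Net` as a stand-in for the encoding that `<%: %>` applies; the class name and sample string are made up for the illustration.

```csharp
using System;
using System.Net;

public static class DoubleEncodingDemo
{
    public static void Main()
    {
        string raw = "<b>5 > 3</b>";

        // Encoding once gives markup that a browser shows as the original text.
        string once = WebUtility.HtmlEncode(raw);

        // Encoding the already-encoded string escapes the '&' of every entity,
        // so the browser shows the entity text itself instead of the character.
        string twice = WebUtility.HtmlEncode(once);

        Console.WriteLine(once);   // &lt;b&gt;5 &gt; 3&lt;/b&gt;
        Console.WriteLine(twice);  // &amp;lt;b&amp;gt;5 &amp;gt; 3&amp;lt;/b&amp;gt;

        // Decoding the double-encoded value once only gets back to the
        // single-encoded form, which is what the user ends up seeing.
        Console.WriteLine(WebUtility.HtmlDecode(twice) == once);  // True
    }
}
```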
I'm working with an ASP.NET MVC 5 application. I have to output HTML that a user has entered with some formatting. I store the HTML in the database as it is, i.e. without encoding, as advised here, and before showing it I encode it using the MS AntiXSS library. However, I have to output some tags as HTML, e.g. to make text bold or italic. What's the best approach to do that while keeping the application safe from XSS? The idea I have is to first encode the text using AntiXssEncoder and then replace the encoded forms of the allowed tags with the usual characters using a regular expression. I know that some tools exist for this, like HTML Purifier, but I haven't found anything for ASP.NET.
Update:
I've decided to use something like
private static readonly Dictionary<string, string> allowedTags = new Dictionary<string, string>()
{
    {"&lt;p&gt;", "<p>"},
    {"&lt;/p&gt;", "</p>"},
    {"&lt;strong&gt;", "<strong>"},
    {"&lt;/strong&gt;", "</strong>"},
    {"&lt;em&gt;", "<em>"},
    {"&lt;/em&gt;", "</em>"},
    {"&nbsp;", " "},
    {"&#13;&#10;", "<br>"}
};
and then
StringBuilder text = new StringBuilder(AntiXssEncoder.HtmlEncode(item.Text, true));
foreach (var tag in allowedTags)
{
text.Replace(tag.Key, tag.Value);
}
Though I strongly dislike this solution since it lacks flexibility and I would have to manually insert each tag into the dictionary. Also, it doesn't support attributes, e.g. < p align="center" > would have to be a separate value. I guess I can replace the first part of the tag like
text.Replace("&lt;p", "<p");
However, if some tag is called, for example, padding (I don't know all the HTML tags that exist or might appear), then this would match it too: its beginning would be replaced, turning it into a valid tag (which might not be closed, though).
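One way to avoid both the hand-written dictionary and the "padding" false match is to restore only complete encoded tags whose names are on an allow-list. The following is a minimal sketch under stated assumptions, not a vetted sanitizer: it swaps in `WebUtility.HtmlEncode` for `AntiXssEncoder.HtmlEncode` so that it runs without System.Web, the class and method names are hypothetical, and it deliberately does not support attributes.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TagWhitelistDemo
{
    // Matches an entire encoded tag such as &lt;p&gt; or &lt;/strong&gt;.
    // Because the closing &gt; must follow the name directly, "&lt;padding&gt;"
    // does not match and therefore stays encoded.
    private static readonly Regex AllowedTag =
        new Regex(@"&lt;(/?)(p|strong|em)&gt;", RegexOptions.IgnoreCase);

    public static string EncodeWithAllowedTags(string input)
    {
        // Encode everything first, then turn only the allow-listed tags back into markup.
        string encoded = WebUtility.HtmlEncode(input);
        return AllowedTag.Replace(encoded, "<$1$2>");
    }

    public static void Main()
    {
        string userInput =
            "<p>Hi <strong>there</strong></p><padding><script>alert(1)</script>";

        Console.WriteLine(EncodeWithAllowedTags(userInput));
        // <p>Hi <strong>there</strong></p>&lt;padding&gt;&lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

Adding a tag then means extending the alternation in one place. Supporting attributes safely (for example `align="center"`) is a harder problem, and a dedicated HTML sanitizer library is probably the better tool once that is needed.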
I have purposefully (for testing) assigned the following variable in WebMatrix C#:
string val = "<script type='text/javascript'>alert('XSS Vector')</script>";
Later in the page I have used razor to write that value directly to the page.
<p>
@val
</p>
It writes the text, but in a safe manner (i.e., no alert scripts run).
In addition, if 'val' contains an HTML entity (e.g., &lt;), it writes exactly "&lt;" and not "<", as I would expect the page to render.
Is this because C# runs first, then html is rendered?
More importantly, is using Razor in this fashion a suitable replacement for HTML encoding?
The @Variable syntax will HTML-encode any text you pass to it, which is why you see literally what you set the string value to. You are correct that this is for XSS protection. It is the @Variable syntax itself, as part of Razor, that does this.
So basically, using the @Variable syntax is not so much a 'replacement' for HTML encoding; it is HTML encoding.
Related: If you ever want some string to render the HTML, you would use this syntax in Razor:
@Html.Raw(Variable)
That causes the Html Encoding not to be done. Obviously, this is dangerous to do with user-supplied input.
I am using the following regex to get the src value of the first img tag in an HTML document.
string match = "src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:\"|\')?";
Right now it captures the whole src attribute, which I don't need. I just need the URL inside the src attribute. How can I do that?
Parse your HTML with something else. HTML is not regular and thus regular expressions aren't at all suited to parsing it.
Use an HTML parser, or an XML parser if the HTML is strict. It's a lot easier to get the src attribute's value using XPath:
//img/@src
XML parsing is built into the System.Xml namespace. It's incredibly powerful. HTML parsing is a bit more difficult if the HTML isn't strict, but there are lots of libraries around that will do it for you.
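As a minimal sketch of the XPath approach, assuming the markup is well-formed XHTML without a default namespace (otherwise the query needs a namespace manager, or an HTML parsing library should be used instead), `XmlDocument` from `System.Xml` can select the attribute directly. The class name, method name, and sample markup are invented for the example.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Returns the src value of the first img element, or null if there is none.
    public static string FirstImageSrc(string xhtml)
    {
        var document = new XmlDocument();
        document.LoadXml(xhtml);  // throws XmlException if the markup is not well-formed

        XmlNode src = document.SelectSingleNode("//img/@src");
        return src == null ? null : src.Value;
    }

    public static void Main()
    {
        string page =
            "<div><p>Logo</p><img src=\"images/logo.png\" alt=\"Logo\" />" +
            "<img src=\"images/other.jpg\" alt=\"Other\" /></div>";

        Console.WriteLine(FirstImageSrc(page));  // images/logo.png
    }
}
```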
see When not to use Regex in C# (or Java, C++ etc) and Looking for C# HTML parser
PS, how can I put a link to a StackOverflow question in a comment?
Your regex should (in English) match any character after a quote that is not itself a quote, inside the src attribute of an img tag.
In perl regex, it would be like this:
/src=[\"\']([^\"\']+)/
The URL will be in $1 after running this.
Of course, this assumes that the urls in your src attributes are quoted. You can modify the values in the [] brackets accordingly if they are not.
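The same pattern carries over to C#, where the captured URL is read from group 1 instead of `$1`. This is a sketch that keeps the answer's assumption that the src value is quoted; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRegexDemo
{
    // Returns the first quoted src value, or null when nothing matches.
    public static string FirstQuotedSrc(string html)
    {
        // [\"'] matches the opening quote; the group captures everything
        // up to (but not including) the next quote character.
        Match match = Regex.Match(html, "src=[\"']([^\"']+)");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html = "<p>Intro</p><img src=\"photos/cat.jpg\" alt=\"A cat\"><img src='b.png'>";
        Console.WriteLine(FirstQuotedSrc(html));  // photos/cat.jpg
    }
}
```

Note that this matches the first src attribute of any element (a script tag, for instance), not only img, which is one of the reasons a parser is the more robust choice.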
How do I escape text for HTML use in C#? I want to do
sample="<span>blah<span>"
and have
<span>blah<span>
show up as plain text, tags included, instead of showing only "blah" with the tags treated as part of the HTML :(.
Using C# not ASP
using System.Web;
var encoded = HttpUtility.HtmlEncode(unencoded);
Also, you can use this if you don't want to use the System.Web assembly:
var encoded = System.Security.SecurityElement.Escape(unencoded);
Per this article, the difference between System.Security.SecurityElement.Escape() and System.Web.HttpUtility.HtmlEncode() is that the former also encodes apostrophe (') characters.
If you're using .NET 4 or above and you don't want to reference System.Web, you can use WebUtility.HtmlEncode from the System.Net namespace:
var encoded = WebUtility.HtmlEncode(unencoded);
This has the same effect as HttpUtility.HtmlEncode and should be preferred over System.Security.SecurityElement.Escape.
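A short console sketch of the two System.Web-free options, applied to the sample string from the question. The class name is made up for the example; the visible difference between the two calls is how the apostrophe is written, as a numeric entity by one and as the XML-specific `&apos;` by the other.

```csharp
using System;
using System.Net;
using System.Security;

public static class HtmlEscapeDemo
{
    public static void Main()
    {
        string sample = "<span>blah<span>";

        // Angle brackets become entities, so a browser shows the tags as text.
        Console.WriteLine(WebUtility.HtmlEncode(sample));   // &lt;span&gt;blah&lt;span&gt;

        // Apostrophe handling differs between the two methods.
        Console.WriteLine(WebUtility.HtmlEncode("It's"));   // It&#39;s
        Console.WriteLine(SecurityElement.Escape("It's"));  // It&apos;s
    }
}
```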
In ASP.NET 4.0 there's new syntax to do this. Instead of
<%= HttpUtility.HtmlEncode(unencoded) %>
you can simply do
<%: unencoded %>
Read more here:
New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2)
.NET 4.5 and above:
using System.Web.Security.AntiXss;
//...
var encoded = AntiXssEncoder.HtmlEncode("input", useNamedEntities: true);
You can use the actual HTML tags <xmp> and </xmp> to output the string as is, showing all of the tags between them (note that xmp is obsolete in HTML5).
Or, on the server, you can use Server.UrlEncode or HttpUtility.HtmlEncode.
For a simple way to do this in Razor pages, use the following:
In .cshtml:
@Html.Raw(Html.Encode("<span>blah<span>"))
In .cshtml.cs:
string encodedHtml = System.Net.WebUtility.HtmlEncode("<span>blah<span>");
You can use:
System.Web.HttpUtility.JavaScriptStringEncode("Hello, this is Satan's Site")
It was the only thing that worked (ASP.NET 4.0+) when dealing with HTML like this. The ' gets rendered as &#39; (using HtmlDecode) in the HTML content, causing it to fail:
It's Allstars
There are some special quote characters, such as ” and “, that are not handled by HtmlEncode and will not be displayed correctly in Edge or Internet Explorer. You can deal with them by replacing these characters first, with something like the function below.
private string RemoveJunkChars(string input)
{
return HttpUtility.HtmlEncode(input.Replace("”", "\"").Replace("“", "\""));
}