PHP has a great function called htmlspecialcharacters() where you pass it a string and it replaces all of HTML's special characters with their safe equivalents, it's almost a one stop shop for sanitizing input. Very nice right?
Well is there an equivalent in any of the .NET libraries?
If not, can anyone link to any code samples or libraries that do this well?
Try this.
var encodedHtml = HttpContext.Current.Server.HtmlEncode(...);
System.Web.HttpUtility.HtmlEncode(string)
Don't know if there's an exact replacement, but there is a method HtmlUtility.HtmlEncode that replaces special characters with their HTML equivalents. A close cousin is HtmlUtility.UrlEncode for rendering URL's. You could also use validator controls like RegularExpressionValidator, RangeValidator, and System.Text.RegularExpression.Regex to make sure you're getting what you want.
Actually, you might want to try this method:
HttpUtility.HtmlAttributeEncode()
Why? Citing the HtmlAttributeEncode page at MSDN docs:
The HtmlAttributeEncode method converts only quotation marks ("), ampersands (&), and left angle brackets (<) to equivalent character entities. It is considerably faster than the HtmlEncode method.
In an addition to the given answers:
When using Razor view engine (which is the default view engine in ASP.NET), using the '#' character to display values will automatically encode the displayed value. This means that you don't have to use encoding.
On the other hand, when you don't want the text being encoded, you have to specify that explicitly (by using #Html.Raw). Which is, in my opinion, a good thing from a security point of view.
Related
How can I change razor syntax in RazorEngine?
I need to use specific keyword instead of"#" symbol.
For example: $$Model.someField instead of #Model.someField. ("$$" instead of "#").
You can't. Razor is not really designed in a way to do it. Basically (Microsoft.AspNet.)Razor has some specially written parsers which handle "#" in a special manner (by switching parsers). This means the languages (C#, Html in this case) itself need to be compatible with this procedure as well!
If you want to replace "#" with something else you need to rewrite the Razor Parsers. This is possible, but at this point you already implemented the hardest part of Razor yourself...
The real question you should ask yourself (or answer here) is: Why you want to do it? It is not as trivial as one would think, I was at this point before.
As freedomn-m suggested you should use #Html.Raw("#") or ## if you need to output a "#".
matthid
- a RazorEngine contributor
Is it possible to use regular expressions to define syntax highlighting in Scintilla? And if so, how to do it?
I have a custom language to process, which cannot be described in simple terms of keywords and delimiters. The meaning of particular structures in this language is dependent only on their position relative to keywords. I have regular expression based parser for this format, all I need is to apply regular expression defined rules as text styles.
I mean if something matches regex1, it should have style1. Is it possible? How?
If not - can I set styles for manually selected ranges? I mean to assign style number to a specified character range in editor. How to do it?
Is it possible to define Scintilla styles in code, not in xml file?
EDIT:
OK, I've found a way.
foreach (Match m in Patterns.Keyword0.Matches(Encoding.ASCII.GetString(e.RawText)))
e.GetRange(m.Index, m.Index + m.Length).SetStyle(1);
The problem is RawText property. It's byte buffer of UTF-8 encoded text. The text property contains nice UTF-16 text, but the GetRage method accepts byte offset not character offset. If I use conversion on each TextChanged event I loose almost all speed advantage from using Scintilla.
Of course the easiest way would be to change internal encoding to UTF-16, but when I do it, I get exception saying this encoding is not supported. The only one supported seems to be UTF-8 which is ridiculously hard (and slow) to process.
I'm hitting a wall here.
The key to this is to set the lexer to SCLEX_CONTAINER and then handle the SCN_STYLENEEDED notification. This means you only ever have to process the text that actually needs styling.
There are several guides linked at the top of the Scintilla Documentation that detail various aspects of implementing customs lexers, so I won't bother repeating any of that here.
As for performance: I've written custom scintilla lexers is python that decode to utf-8 when styling and have never noticed any significant issues, so I'd be amazed if you couldn't at least match that using C#.
Basically a dup of this question using php, but I need it for C#.
I need to be able to replace any & that is not currently not any HTML entity (e.g. &) before outputting to screen. I was thinking a regex, but I'm not sure if .Net has something built in that will do this.
You can use HttpUtility.HtmlEncode.
Whithing the context of a page or UserControl, you can use Server.HtmlEncode.
Better AntiXss.HtmlEncode, prevents XSS.
You could always HTML Decode the string (which would turn any HTML symbols into their display equivalents), replace any &'s, and then Encode the string again (which turns the symbols back into what they were originally). You might need to watch for side effects though.
I need some functionality to make the following string in a url-friendly format:
"knæ som gør" should be "kna-som-gor"
That is, replacing culture specific characters to characters that can be used in urls.
Using .Net and C#
Please help me :)
/Andreas
Don't complicate things. :)
Either use a regexp, or simply use String.Replace.
You can find a solution that removes diacritics here: How do I remove diacritics (accents) from a string in .NET?. This solution does not help you with æ or ø, though.
Maybe that removes enough of your special characters that the rest can be translated using simple replacing?
If "url-friendly" does not mean pretty, you could also use HttpUtility.UrlEncode, which produces
"kn%c3%a6+som+g%c3%b8r".
Edit: Added possible solution (end of post).
I had a very similar problem, albeit for file names rather than URLs. The main problem seems to be that there is no standard way to ask for the "best ASCII replacement for ø", so even if you can locate all the unwanted characters it is hard to automate which replacement to insert.
I posted quite a bit of code that might be helpful. See this StackOverflow question for details.
Edit: I think the solution to this problem lies with StringInfo, which allows you to iterate through the sub-characters (Unicode surrogates or combining characters) in a string. This should make it possible to detect and convert something like å (which can be encoded in Unicode as either A-WITH-RING or RINGED-A; filter out the decorator and keep the part that is a normal character).
In my web app, my parameters can contain all sorts of crazy characters (russian chars, slashes, spaces etc) and can therefor not always be represented as-is in a URL.
Sending them on their merry way will work in about 50% of the cases. Some things like spaces are already encoded somewhere (I'm guessing in the Html.BuildUrlFromExpression does). Other things though (like "/" and "*") are not.
Now I don't know what to do anymore because if I encode them myself, my encoding will get partially encoded again and end up wrong. If I don't encode them, some characters will not get through.
What I did is manually .replace() the characters I had problems with.
This is off course not a good idea.
Ideas?
--Edit--
I know there are a multitude of encoding/decoding libraries at my disposal.
It just looks like the mvc framework is already trying to do it for me, but not completely.
<a href="<%=Html.BuildUrlFromExpression<SearchController>(c=>c.Search("", 1, "a \v/&irdStr*ng"))%>" title="my hat's awesome!">
will render me
<a href="/Search.mvc/en/Search/1/a%20%5Cv/&irdStr*ng" title="my hat's awesome!">
Notice how the forward slash, asterisk and ampersand are not escaped.
Why are some escaped and others not? How can I now escape this properly?
Am I doing something wrong or is it the framework?
Parameters should be escaped using Uri.EscapeDataString:
string url = string.Format("http://www.foo.bar/page?name={0}&address={1}",
Uri.EscapeDataString("adlknad /?? lkm#"),
Uri.EscapeDataString(" qeio103 8182"));
Console.WriteLine(url);
Uri uri = new Uri(url);
string[] options = uri.Query.Split('?','&');
foreach (string option in options)
{
string[] parts = option.Split('=');
if (parts.Length == 2)
{
Console.WriteLine("{0} = {1}",parts[0],
Uri.UnescapeDataString(parts[1]));
}
}
AS others have mentioned, if you encode your string first you aviod the issue.
The MVC Framework is encoding characters that it knows it needs to encode, but leaving those that are valid URL characters (e.g. & % ? * /). This is because these are valid URL characters, although they are special chracters in a URL that might not acheive the result you are after.
Try using the Microsoft Anti-Cross Site Scripting library. It contains several Encode methods, which encode all the characters (including #, and characters in other languages). As for decoding, the browser should handle the encoded Url just fine, however if you need to manually decode the Url, use Uri.UnescapeDataString
Hope that helps.
Escaping of forward slahes and dots in path part of url is prohibited by security reason (althrough, it works in mono).
Html.BuildUrlFromExpression needs to be fixed then, would submit this upstream to the MVC project... alternatively do the encoding to the string before passing to BuildUrlFromExpression, and decode it when it comes back out on the other side.
It may not be readily fixable, as IIS may be handling the decoding of the url string beforehand... may need to do some more advanced encoding/decoding for alternative path characters in the utility methods, and decode on your behalf coming out.
I've seen similar posts on this. Too me, it looks like a flaw in MVC. The function would be more appropriately named "BuildUrlFromEncodedExpression". Whats worse, is that the called function needs to decode its input parameters. Yuk.
If there is any overlap between the characters encoded BuildUrlFromExpression() and the characters encoded by the caller (who, I think might fairly just encode any non-alphanumeric for simplicities sake) then you have potential for nasty bugs.
Server.URLEncode or HttpServerUtility.UrlEncode
I see what you're saying now - I didn't realize the question was specific to MVC. Looks like a limitation of that part of the MVC framework - particularly BuildUrlFromExpression is doing some URL encoding, but it knows that also needs some of those punctation as part of the framework URLs.
And also unfortunately, URLEncoding doesn't produce an invariant, i.e.
URLEncode(x) != URLEncode(URLEncode(x))
Wouldn't that be nice. Then you could pre-encode your variables and they wouldn't be double encoded.
There's probably an ASP.NET MVC framework best practice for this. I guess another thing you could do is encode into base64 or something that is URLEncode-invariant.
Have you tried using the Server.UrlEncode() method to do the encoding, and the Server.UrlDecode() method to decode?
I have not had any issues with using it for passing items.