Standard token for encoding parentheses for HTML Ids? - c#

I have to encode strings to remove parentheses for Ids for HTML elements.
Parentheses (these ones (,)) aren't valid in HTML Ids, are there standard strings (like those used in URLs) to use?
Is there an existing method that can be used in ASP.NET MVC?
N.B. System.Web.Mvc.HttpUtility.HtmlEncode(string), does not encode parentheses.

As per the HTML specification (and this question about id's) parentheses aren't allowed in the HTML id attribute. If you need them, you could use string replace, e.g.:
// ( = 'op--' Opening Parenthesis
// ) = 'cp--' Closing Parenthesis
string id = "collectionName.get_Item(index)";
// encode
string encodedId = id.Replace("(", "op--").Replace(")", "cp--");
// decode
string decodedId = encodedId.Replace("op--", "(").Replace("cp--", ")");

I don't think I understand the question, cos it feels like the answer is to substitute [ and ]. Or even %28 and %29 from the Wikipedia link you gave.
Have I got hold of the wrong end of the stick?
EDIT: From what has been said in the comments, it seems that %28 and %29 are not okay as the % character is also invalid, in which case you could select a substitute that won't appear elsewhere.
EG Something like ( becomes ---28--- (or even ---openbracket---) or something else you can guarantee won't appear elsewhere in the ID (which should be possible).

If the elements are dynamically created then why not just do a .Replace on the id changing parentheses for, say, underscores?
If the elements are not dynamically created then why do they have parentheses in the ids?!

Related

Regex in C# - remove quotes and escaped quotes from a value after another value

I am using HighCharts and am generating script from C# and there's an unfortunate thing where they use inline functions for formatters and events. Unfortunately, I can't output JSON like that from any serializer I know of. In other words, they want something like this:
"labels":{"formatter": function() { return Highcharts.numberFormat(this.value, 0); }}
And with my serializers available to me, I can only get here:
"labels":{"formatter":"function() { return Highcharts.numberFormat(this.value, 0); }"}
These are used for click events as well as formatters, and I absolutely need them.
So I'm thinking regex, but it's been years and years and also I was never a regex wizard.
What kind of Regex replace can I use on the final serialized string to replace any quoted value that starts with function() with the unquoted version of itself? Also, the function itself may have " in it, in which case the quoted string might have \" in it, which would need to also be replaced back down to ".
I'm assuming I can use a variant of the first answer here:
Finding quoted strings with escaped quotes in C# using a regular expression
but I can't seem to make it happen. Please help me for the love of god.
I've put more sweat into this, and I've come up with
serialized = Regex.Replace(serialized, #"""function\(\)[^""\\]*(?:\\.[^""\\]*)*""", "function()$1");
However, my end result is always:
formatter:function()$1
This tells me I'm matching the proper stuff, but my capture isn't working right. Now I feel like I'm probably being an idiot with some C# specific regex situation.
Update: Yes, I was being an idiot. I didn't have a capture around what I really wanted.
`enter code here` serialized = Regex.Replace(serialized, #"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""", "function()$1");
that gets my match, but in a case like this:
"formatter":"function() { alert(\"hi!\"); return Highcharts.numberFormat(this.value, 0); }"
it returns:
"formatter":function() { alert(\"hi!\"); return Highcharts.numberFormat(this.value, 0); }
and I need to get those nasty backslashes out of there. Now I think I'm truly stuck.
Regexp for match
"function\(\) (?<code>.*)"
Replace expression
function() ${code}
Try this : http://regexr.com?30jpf
What it does :
Finds double quotes JUST before a function declaration and immediately after it.
Regex :
(")(?=function()).+(?<=\})(")
Replace groups 1 & 3 with nothing :
3 capturing groups:
group 1: (")
group 2: ()
group 3: (")
string serialized = JsonSerializer.Serialize(chartDefinition);
serialized = Regex.Replace(serialized, #"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""", "function()$1").Replace("\\\"", "\"");

Format/Safe string for "title" attribute in anchor

I have a function that builds an anchor tag. The function recieves the URL, Title as parameters. The problem is that sometime the text includes quotation marks and this results in a anchor tag generated with syntax errors.
Which is the best way to solve this problems? Is there any function that parses the text into a safe string, in this case, for the title attribute.
Otherwise I can check the string and strip all quotation marks, but I would like know if there is a better way to do this, e.g there might be some other characters that can crash my function as well.
Actually you want to use HttpUtility.HtmlAttributeEncode to encode your title attribute. The other encoders will do more work (and have different uses) whereas this one only escapes ", &, and < to generate a valid text for an attribute.
Example:
This is a <"test"> & something else. becomes This is a <"Test"> & something else.

Dash (-) in anonymous class member

is it possible to use dash (-) in a member name of an anonymous class? I'm mainly interested in this to use with asp.net mvc to pass custom attributes to html-helpers, since I want my html to pass html5-validation, this starting with data-.
Exemple that doesn't work:
<%=Html.TextBoxFor(x => x.Something, new {data-animal = "pony"})%>
Putting a # in front of the member name doesn't do the trick either.
Update: If this isn't possible, is there a recommended way todo what I want? My current temporary solution is to add a replace to the whole thing like this:
<%=Html.TextBoxFor(x => x.Something, new {data___animal = "pony"}).Replace("___", "-")%>
But that sucks, because it's ugly and will break when Model.Something contains three underscores. Buhu.
Just found this post while searchching for the same problem.
I found this link:
http://blogs.planetcloud.co.uk/mygreatdiscovery/post/Using-custom-data-attributes-in-ASPNET-MVC.aspx
It resolves the problem. It mentions the following:
[...] or better yet, just use code from ASP.NET MVC source:
public static RouteValueDictionary AnonymousObjectToHtmlAttributes(object htmlAttributes)
{
RouteValueDictionary result = new RouteValueDictionary();
if (htmlAttributes != null)
{
foreach (System.ComponentModel.PropertyDescriptor property in System.ComponentModel.TypeDescriptor.GetProperties(htmlAttributes))
{
result.Add(property.Name.Replace('_', '-'), property.GetValue(htmlAttributes));
}
}
return result;
}
Collecting the Asp-Mvc version specific ways to do data- here:
MVC 3+ : Use an underscore _ and it will be automatically replaced by mvc
MVC 1?,2: see #Jean-Francois answer, which points to this
No, because the dash is a C# operator (minus), and white space isn't significant.
Right now the compiler thinks you are trying to subtract animal from data, which doesn't work unless the - operator is specified for the types in question.
It is not possible to use - as part of any identifier.
http://msdn.microsoft.com/en-us/library/aa664670(VS.71).aspx
No, you can't use the hyphen character. You need to use alphanumeric characters, underscores, or the other characters described here. '-' is treated as a minus. data-animal would be treated the same as data - animal, so that won't even compile unless you have separately defined data and animal (and it could present subtle bugs if you have!).
Edit: With C#'s capability to have identifiers with Unicode escape sequences, you can get the effect of a dash in an identifier name. The escape sequence for "&mdash" (the longer dash character) is "U+2014". So you would express data-animal as data\u2014animal. But from a coding style point of view, I'm not sure why you wouldn't choose a more convenient naming convention.
Also, another point to highlight from "2.4.2 Identifiers (C#)": You can't have two of these escape sequences back to back in an identifier (e.g. data\u2014\u2014animal).

C# MVC: Trailing equal sign in URL doesn't hit route

I have an MVC route like this www.example.com/Find?Key= with the Key being a Base64 string. The problem is that the Base64 string sometimes has a trailing equal sign (=) such as:
huhsdfjbsdf2394=
When that happens, for some reason my route doesn't get hit anymore.
What should I do to resolve this?
My route:
routes.MapRoute(
"FindByKeyRoute",
"Find",
new { controller = "Search", action = "FindByKey" }
);
If I have http://www.example.com/Find?Key=bla then it works.
If I have http://www.example.com/Find?Key=bla= then it doesn't work anymore.
Important Addition:
I'm writing against an IIS7 instance that doesn't allow % or similar encoding. That's why I didn't use UrlEncode to begin with.
EDIT: Original suggestion which apparently doesn't work
I'm sure the reason is that it thinks it's a query parameter called Key. Could you make it a parameter, with that part being the value, e.g.
www.example.com/Find?Route=Key=
I expect that would work (as the parser would be looking for an & to start the next parameter) but it's possible it'll confuse things still.
Suggestion which I believe will work
Alternatively, replace "=" in the base64 encoded value with something else on the way out, and re-replace it on the way back in, if you see what I mean. Basically use a different base64 decodabet.
Alternative suggestion which should work
Before adding base64 to the URL:
private static readonly char[] Base64Padding = new char[] { '=' };
...
base64 = base64.TrimEnd(Base64Padding);
Then before calling Convert.FromBase64String() (which is what I assume you're doing) on the inbound request:
// Round up to a multiple of 4 characters.
int paddingLength = (4 - (base64.Length % 4)) % 4;
base64 = base64.PadRight(base64.Length + paddingLength, '=');
IF you're passing data in the URL you should probably URL Encode it which would take care of the trailing =.
http://www.albionresearch.com/misc/urlencode.php
UrlEncode the encrypted (it is encrypted, right?) parameter.
If it is an encrypted string, beware that spaces and the + character will also get in your way.
Ok, so IIS 7 won't allow some special characters as part of your path. However, it would allow them if they were part of the querystring.
It is apparently, possible, to change this with a reg hack, but I wouldn't recommend that.
What I would suggest, then, is to use an alternate token, as suggested by Mr Skeet, or simply do not use it in your path, use it as querystring, where you CAN url encode it.
If it is an encrypted string, you haven't verified that it is or is not, you may in some cases get other 'illegal' characters. Querystring really would be the way to go.
Except your sample shows it as querystring... So what gives? Where did you find an IIS that won't allow standard uri encoding as part of the querystring??
Ok then. Thanks for the update.
RequestFiltering?
I see. Still that mentions double-encoded values that it blocks. Someone created a URL Sequence to deny any request with the '%' characters? At that point you might want to not use the encrypted string at all, but generate a GUID or something else that is guaranteed to not contain special characters, yet is not trivial to guess.

Convert > to HTML entity equivalent within HTML string

I'm trying to convert all instances of the > character to its HTML entity equivalent, >, within a string of HTML that contains HTML tags. The furthest I've been able to get with a solution for this is using a regex.
Here's what I have so far:
public static readonly Regex HtmlAngleBracketNotPartOfTag = new Regex("(?:<[^>]*(?:>|$))(>)", RegexOptions.Compiled | RegexOptions.Singleline);
The main issue I'm having is isolating the single > characters that are not part of an HTML tag. I don't want to convert any existing tags, because I need to preserve the HTML for rendering. If I don't convert the > characters, I get malformed HTML, which causes rendering issues in the browser.
This is an example of a test string to parse:
"Ok, now I've got the correct setting.<br/><br/>On 12/22/2008 3:45 PM, jproot#somedomain.com wrote:<br/><div class"quotedReply">> Ok, got it, hope the angle bracket quotes are there.<br/>><br/>> On 12/22/2008 3:45 PM, > sbartfast#somedomain.com wrote:<br/>>> Please someone, reply to this.<br/>>><br/>><br/></div>"
In the above string, none of the > characters that are part of HTML tags should be converted to >. So, this:
<div class"quotedReply">>
should become this:
<div class"quotedReply">>
Another issue is that the expression above uses a non-capturing group, which is fine except for the fact that the match is in group 1. I'm not quite sure how to do a replace only on group 1 and preserve the rest of the match. It appears that a MatchEvaluator doesn't really do the trick, or perhaps I just can't envision it right now.
I suspect my regex could do with some lovin'.
Anyone have any bright ideas?
Why do you want to do this? What harm are the > doing? Most parsers I've come across are quite happy with a > on its own without it needing to be escaped to an entity.
Additionally, it would be more appropriate to properly encode the content strings with HtmlUtilty.HtmlEncode before concatenating them with strings containing HTML markup, hence if this is under your control you should consider dealing with it there.
The trick is to capture everything that isn't the target, then plug it back in along with the changed text, like this:
Regex.Replace(str, #"\G((?>[^<>]+|<[^>]*>)*)>", "$1>");
But Anthony's right: right angle brackets in text nodes shouldn't cause any problems. And matching HTML with regexes is tricky; for example, comments and CDATA can contain practically anything, so a robust regex would have to match them specifically.
Maybe read your HTML into an XML parser which should take care of the conversions for you.
Are you talking about the > chars inside of an HTML tag, (Like in Java's innerText), or in the arguements list of an HTML tag?
If you want to just sanitize the text between the opening and closing tag, that should be rather simple. Just locate any > char, and replace it with the &gt ;. (I'd also do it with the &lt tag), but the HTML render engine SHOULD take care of this for you...
Give an example of what you are trying to sanitize, and maybe we an find the best solution for it.
Larry
Could you read the string into an XML document and look at the values and replace the > with > in the values. This would require recursively going into each node in the document but that shouldn't be too hard to do.
Steve_C, you may try this RegEx. This will give capture any HTML tags in reference 1, and the text between the tags is stored in capture 2. I didn't fully test this, just throwing it out there in case it might help.
<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>

Categories