Why isn't font rendered in HTML when surrounded by &quot; - c#

I have some HTML stored in a database and I am generating a static HTML file from it. However, when I open the file in the browser, the font doesn't render as I expect.
I have tracked down the problem and I see it is because of &quot;
<p><span style="font-family: &quot;Roboto Regular&quot;;">Some text</span></p>
Now if I replace the &quot; with double quotes, it works fine.
This is also generated through C#. What is the best approach to fix this?
Should I just use the Replace function to convert them to quotes, or is there a library I can use to do it more efficiently? Or is there an even simpler fix?
Thanks for your thoughts.

You can use System.Web.HttpUtility.HtmlDecode (and HtmlEncode) to handle this sort of thing.
However, you should ask yourself why your font string includes HTML-encoded characters in the first place.

HTML is emitted as is and is not parsed until it reaches the browser. This is a security measure to ensure that no malicious code can be run in the browser. I recommend using the Replace function you suggest. If you want to take security a step further, I suggest also encoding the opening and closing angle brackets of HTML tags and handling those in the same Replace step.

I will explain why...
<p><span style="font-family: &quot;Roboto Regular&quot;;">Some text</span></p>
Since you mention C#: your output shows that HtmlDecode did not decode the quotes around the font-family value. This was caused by pasting the HTML into the database through HtmlEncode, as below.
HtmlEncode("<p><span style="font-family: "Roboto Regular";">Some text</span></p>");
As you can see above, using a quote inside another quoted value is illegal in HTML unless it is escaped. That is why HtmlDecode treats this as an escaped quote and leaves it as it is.
SOLUTION: You can replace the font-family quotes with single quotes before HtmlEncode, but this creates a SQL issue to address, namely replacing each single quote with two single quotes. Of course, you need to reverse that after HtmlDecode. HTML-encoding again or replacing with double quotes would not fix the issue, since you would again be creating illegal quotes inside quotes. That is why you simply need to replace [&quot;] with a single quote ['], and you can do this in the frontend.
('<p><span style="font-family: &quot;Roboto Regular&quot;;">Some text</span></p>').replace(/&quot;/g, "'");
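For the front-end route described above, here is a minimal JavaScript sketch. The helper names and the small entity table are made up for illustration and are not a complete HTML decoder. Note that a string pattern passed to replace only touches the first match, so a global regex is used.

```javascript
// Hypothetical helper: swap every &quot; for a single quote so the value is
// safe inside a double-quoted style attribute.
function quotEntityToSingleQuote(html) {
  return html.replace(/&quot;/g, "'");
}

// Hypothetical helper: decode a small, illustrative subset of entities.
const ENTITIES = { "&quot;": '"', "&#39;": "'", "&lt;": "<", "&gt;": ">", "&amp;": "&" };

function decodeBasicEntities(text) {
  // One pass over the input, so "&amp;quot;" becomes "&quot;" and is not
  // decoded a second time.
  return text.replace(/&(?:quot|#39|lt|gt|amp);/g, (match) => ENTITIES[match]);
}

const stored = '<p><span style="font-family: &quot;Roboto Regular&quot;;">Some text</span></p>';
console.log(quotEntityToSingleQuote(stored));
```

If the stored text turns out to be double-encoded (for example &amp;quot;), one decoding pass only peels off one layer, which is usually a sign the encoding should be fixed where the data is written.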

Related

escape sequence while assigning style class to div in a label

I came across a text assignment to a label in C# code. The code is
lbltext.Text = string.Format("<div class=\"test\">{0}</div>", "successfully updated");
I'm using a Label control and the documentation says:
The Text property can include HTML. If it does, the HTML will be passed unchanged to the browser, where it might be interpreted as markup and not as text.
Even after reading it, I don't understand the purpose of the escape after "class=".
Is it like assigning the CSS class "test" to that div element?
If yes, then why can't we write it as 'class="test"'?
I was unable to find any answers by googling. Can someone please clarify?
So I'm not 100% sure what this question is asking, but if it really is as simple as "why not just use class="test"", then that is because double quotes delimit string literals. So what you would end up with would be
"<div class=" test ">{0}</div>"
Which would cause a build error for starters. The escaping allows double quotes to be used inside a string without terminating the string early.
Worth noting that for classes, you can also use single quotes to get around this.
"<div class='test'>{0}</div>"
For Example :)
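The same rule applies in other C-family languages. A minimal sketch in JavaScript (not the C# from the question; the variable names are made up for illustration) showing that both forms yield usable markup:

```javascript
// Escaped double quotes: the backslashes are consumed by the string literal,
// so the resulting string contains plain " characters and no backslashes.
const escaped = "<div class=\"test\">successfully updated</div>";

// Single-quoted attribute: nothing to escape inside a double-quoted literal.
const singleQuoted = "<div class='test'>successfully updated</div>";

console.log(escaped);
console.log(singleQuoted);
```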

how to replace single quote with string in javascript

I have some data from a lookup like this: =winz\ach'dull.
How can I replace single quotes (') with ("").
This is my code =>
<input type="button" id="btnSelect" onclick="Select('<%#Eval("LoginName").ToString().Replace("'", "\'")%>');" value="Select"/>
I'm trying to create code like this:
Select('<%#Eval("LoginName").ToString().Replace("'", "\'")%>');
but it does not work.
Please correct and help me. Thanks.
In pure JavaScript we could do:
var a="winz\ach'dull.";
alert(a.replace("'",'"'));
And that would replace your single quote (only the first one, since a string pattern matches once).
Note: Your code is C#, not JavaScript.
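One caveat with the snippet above: replace with a string pattern stops after the first match. The sketch below contrasts that with a global regex and adds a hypothetical helper (the name is made up here) that escapes a value for use inside a single-quoted JavaScript string, which seems to be what the question is after.

```javascript
// Sample value with two single quotes and a literal backslash.
const loginName = "winz\\ach'dull's";

// String pattern: only the first ' is replaced.
const firstOnly = loginName.replace("'", '"');

// Regex with the g flag: every ' is replaced.
const allQuotes = loginName.replace(/'/g, '"');

// Hypothetical helper: escape backslashes first, then single quotes, so the
// result can sit between '...' in generated JavaScript.
function escapeForSingleQuotedJs(value) {
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

console.log(firstOnly);
console.log(allQuotes);
console.log(escapeForSingleQuotedJs(loginName));
```

Escaping the backslash before the quote matters; doing it the other way round would double the backslashes that were just added.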
You can escape quotes with the "\" character and it works perfectly with HTML. So the answer to exactly what you wrote would be: (this is just to humour you in the future)
"Select('<%#Eval(\"LoginName\").ToString().Replace(\"'\", \"\'\")%>');"
But you have syntax errors in what you are writing and that Eval stuff is not javascript so I don't know why ToString and Replace are attached to it. I've changed it a little based on guessing what you're trying to do:
<input onclick="Select('<%#Eval("LoginName")%>').ToString().Replace(\"'\", \"'\");">
Note that if you're using C# or something similar on the server side, it doesn't need to be escaped, because by the time the HTML is parsed into the DOM (typically by a browser), the source no longer contains your server-side code, only its output!

Html encode problem for server side strings

I am trying to HTML-encode the string below, which has quotes, but it's not working.
The server returns the string with quotes:
string serverString = "“Test hello,”"; // this is returned from the database
serverString = HttpUtility.HtmlEncode(serverString);
I am getting this result:
�Test helloI,�
The quotes are still not replaced, and I am getting some diamond symbols on the ASP.NET page.
Can anybody tell me what I am doing wrong?
The quote characters you're seeing are perfectly legitimate characters from an HTML standpoint, so they don't need to be encoded by HtmlEncode. What you're most likely seeing is an issue with your browser's encoding not supporting those characters. See http://www.htmlbasictutor.ca/character-encoding.htm for more information.
Are you sure it's not a rendering issue? You might try a font like "Arial Unicode MS" to make sure the browser is rendering the characters properly.
You should also verify the string returned from the database is correct.
Lastly, it could help to share how you're writing serverString to your response stream. Some ASP.NET controls expect text and HTML-encode for you while others expect HTML and do not.
This is because the server is returning fancy double quotes (that's not the technical name for them) instead of regular double quotes. You could do something like this:
string serverString = "“Test hello,”";
serverString = HttpUtility.HtmlEncode(serverString)
// Replaces fancy left double quote (U+201C) with a regular one
.Replace("\u201C", "\"")
// Replaces fancy right double quote (U+201D) with a regular one
.Replace("\u201D", "\"");
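The same normalization idea, sketched in JavaScript for the client side. The function name is hypothetical. U+201C and U+201D are the curly double quotes, U+2018 and U+2019 the curly single quotes.

```javascript
// Hypothetical helper: map typographic quotes to their plain ASCII forms.
function normalizeQuotes(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")   // curly single quotes
    .replace(/[\u201C\u201D]/g, '"');  // curly double quotes
}

console.log(normalizeQuotes("\u201CTest hello,\u201D"));
```

This only hides the symptom. If the diamonds come from a charset mismatch between the page and the response, declaring the correct encoding is the more durable fix.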

When's an Apostrophe not an Apostrophe - validation .Net / Javascript

I have a regular expression validator for emails in .NET 2.0 which uses client-side validation (JavaScript).
The current expression is "\w+([-+.']\w+)@\w+([-.]\w+).\w+([-.]\w+)" which works for my needs (or so I thought).
However, I ran into a problem with apostrophes when I copy/pasted an email address from Outlook into the form's text field:
Chris.O’Brian@somerandomdomain.com
You can see the apostrophe is a different character from what I get if I just type into a text box:
' vs ’ - but both are apostrophes
Okay, I thought, let's just add this character to the validation string, so I get
"\w+([-+.'’]\w+)@\w+([-.]\w+).\w+([-.]\w+)"
I copy/paste the "special" apostrophe into the validation expression, then I type the email and use the same clipboard item to paste the apostrophe, but the validation still fails.
The apostrophe doesn't look the same in the .NET code-behind file as in the .NET form, and because the validation is still failing, I presume it's being treated as a different character because of some sort of encoding of the .cs source file.
Does this sound plausible? Has someone else encountered the same problem?
Thanks
You should add a '+' after ([-+.'`]\w+), to allow for multiple groups of 'words'. The expression you gave only allows for two words, and you have three: Chris, O, Brian.
Hope this makes things clearer.
There will be a tendency in something like Outlook to use 'Smart Quotes'
Here's some background information
If you just pasted the ’ (U+2019 RIGHT SINGLE QUOTATION MARK) into your document and it didn't work, it means that your document does not use Unicode.
When you encode and send the file as UTF-8 (for example), it works just fine without further modifications. Otherwise you have to escape it via \u2019, which also works in JavaScript's regular expressions:
"\w+([-+.'\u2019]\w+)@\w+([-.]\w+).\w+([-.]\w+)"
In XML you could test the value of an apostrophe character by evaluating it against its character entity reference:
&apos;
That entity does not exist in the SGML form of HTML, however. Note also that JavaScript compares strings by code unit, so a single quote ('), a double quote ("), and a typographic apostrophe (’) are three distinct characters that never compare equal to each other. If you want them treated as equivalent, normalize them to one character before comparing.
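A quick JavaScript check of the \u2019 escape discussed in this thread. The full pattern is an assumption: it uses the usual shape of this validator expression, with a * after each group, an escaped dot, @ as the separator, and ^...$ anchors, none of which are confirmed by the posts.

```javascript
// Assumed pattern without the typographic apostrophe.
const plainOnly = /^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;

// Same pattern with U+2019 added to the first character class via an escape,
// so the source file's encoding does not matter.
const withCurly = /^\w+([-+.'\u2019]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;

const typed = "Chris.O'Brian@somerandomdomain.com";        // ASCII apostrophe
const pasted = "Chris.O\u2019Brian@somerandomdomain.com";  // pasted from Outlook

console.log(plainOnly.test(typed), plainOnly.test(pasted));
console.log(withCurly.test(typed), withCurly.test(pasted));

// The two apostrophes are different characters as far as JavaScript is concerned.
console.log("'" === "\u2019");
```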

Convert > to HTML entity equivalent within HTML string

I'm trying to convert all instances of the > character to its HTML entity equivalent, &gt;, within a string of HTML that contains HTML tags. The furthest I've been able to get with a solution for this is using a regex.
Here's what I have so far:
public static readonly Regex HtmlAngleBracketNotPartOfTag = new Regex("(?:<[^>]*(?:>|$))(>)", RegexOptions.Compiled | RegexOptions.Singleline);
The main issue I'm having is isolating the single > characters that are not part of an HTML tag. I don't want to convert any existing tags, because I need to preserve the HTML for rendering. If I don't convert the > characters, I get malformed HTML, which causes rendering issues in the browser.
This is an example of a test string to parse:
"Ok, now I've got the correct setting.<br/><br/>On 12/22/2008 3:45 PM, jproot@somedomain.com wrote:<br/><div class"quotedReply">> Ok, got it, hope the angle bracket quotes are there.<br/>><br/>> On 12/22/2008 3:45 PM, > sbartfast@somedomain.com wrote:<br/>>> Please someone, reply to this.<br/>>><br/>><br/></div>"
In the above string, none of the > characters that are part of HTML tags should be converted to &gt;. So, this:
<div class"quotedReply">>
should become this:
<div class"quotedReply">&gt;
Another issue is that the expression above uses a non-capturing group, which is fine except for the fact that the match is in group 1. I'm not quite sure how to do a replace only on group 1 and preserve the rest of the match. It appears that a MatchEvaluator doesn't really do the trick, or perhaps I just can't envision it right now.
I suspect my regex could do with some lovin'.
Anyone have any bright ideas?
Why do you want to do this? What harm are the > doing? Most parsers I've come across are quite happy with a > on its own without it needing to be escaped to an entity.
Additionally, it would be more appropriate to properly encode the content strings with HttpUtility.HtmlEncode before concatenating them with strings containing HTML markup, so if this is under your control you should consider dealing with it there.
The trick is to capture everything that isn't the target, then plug it back in along with the changed text, like this:
Regex.Replace(str, @"\G((?>[^<>]+|<[^>]*>)*)>", "$1&gt;");
But Anthony's right: right angle brackets in text nodes shouldn't cause any problems. And matching HTML with regexes is tricky; for example, comments and CDATA can contain practically anything, so a robust regex would have to match them specifically.
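For comparison, here is a JavaScript sketch of the same goal. It does not port the \G / atomic-group regex (JavaScript has neither); instead it swaps in a different technique: split the string on anything that looks like a tag, then escape > only in the text pieces. The function name is made up, and the tag pattern is deliberately naive.

```javascript
// Hypothetical helper: escape ">" in text segments, leave tag-like segments alone.
function escapeGtOutsideTags(html) {
  return html
    // The capturing group keeps the tags in the result array, at odd indexes.
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/>/g, "&gt;")))
    .join("");
}

const sample = '<div class"quotedReply">> Ok, got it.<br/>><br/>>> Please someone, reply.';
console.log(escapeGtOutsideTags(sample));
```

The same caveats apply as for the regex: comments, CDATA, and a stray < in text (as in "1 < 2 > 1") will be misread as tags, so this is only suitable for input whose shape you control.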
Maybe read your HTML into an XML parser which should take care of the conversions for you.
Are you talking about the > chars inside of an HTML tag (like in JavaScript's innerText), or in the attribute list of an HTML tag?
If you just want to sanitize the text between the opening and closing tag, that should be rather simple. Just locate any > char and replace it with &gt;. (I'd do the same for < with &lt;.) But the HTML render engine SHOULD take care of this for you...
Give an example of what you are trying to sanitize, and maybe we can find the best solution for it.
Larry
Could you read the string into an XML document, look at the values, and replace the > with &gt; in the values? This would require recursively going into each node in the document, but that shouldn't be too hard to do.
Steve_C, you may try this regex. It captures the tag name in group 1 (which the closing tag must repeat via \1) and the text between the tags in group 2. I didn't fully test this, just throwing it out there in case it might help.
<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>
