Replace all & that are not a HTML entity using C# - c#

Basically a dup of this question using php, but I need it for C#.
I need to be able to replace any & that is not already part of an HTML entity (e.g. &amp;) before outputting to the screen. I was thinking of a regex, but I'm not sure whether .NET has something built in that will do this.

You can use HttpUtility.HtmlEncode.
Within the context of a Page or UserControl, you can use Server.HtmlEncode.

Better: use AntiXss.HtmlEncode, which is designed to prevent XSS.

You could always HTML Decode the string (which would turn any HTML symbols into their display equivalents), replace any &'s, and then Encode the string again (which turns the symbols back into what they were originally). You might need to watch for side effects though.
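A minimal sketch of two of the ideas above: the regex the question had in mind, and the decode-then-encode round trip. The class and method names are hypothetical, and System.Net.WebUtility is used here as a stand-in for HttpUtility so the sample does not need a System.Web reference. The regex only checks entity syntax, so any well-formed &name; is left alone even if it is not a real entity.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from any library.
public static class AmpersandFixer
{
    // Matches "&" that does not start a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) entity.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Escapes only the ampersands that are not already part of an entity.
    public static string EscapeBareAmpersands(string input)
    {
        return BareAmpersand.Replace(input, "&amp;");
    }

    // Decodes everything, then encodes everything again.
    // Side effect: characters such as "<" and quotes get encoded too.
    public static string DecodeThenEncode(string input)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlDecode(input));
    }
}
```

For example, EscapeBareAmpersands("Tom & Jerry &amp; co") leaves the existing &amp; alone and escapes only the bare one, while DecodeThenEncode also rewrites any "<" in the input, which is the kind of side effect the answer warns about.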

Related

Is there a standard way to check for HTML content with Fluent Validation

I have some text entry fields on a form and I want to prevent the user from submitting any HTML content, thus reducing chances of XSS attacks or just breaking the layout.
Is there any standard way to do this check with Fluent Validation, or do I need to roll my own using a Regex? I'd prefer to use a tried and tested method rather than write my own and risk missing something subtle.
I'm using it with .NET 6 and ASP.NET for Web APIs. We intend to update to .NET 7 in the next few months, so anything that brings could be useful.
My source for all of this is this page.
First of all you would need to replace the & character with &amp;
Then replace < with &lt;
Finally replace > with &gt;.
You could also surround your html with <pre> tags, so that it preserves line returns and spaces.
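The three replacement steps above could be sketched in C# as follows. The class and method names are hypothetical. The order matters: the ampersand has to be replaced first, otherwise the & in the freshly inserted &lt; and &gt; would itself be escaped.

```csharp
using System;

// Hypothetical helper illustrating the replacement order described above.
public static class BasicHtmlEscaper
{
    public static string EncodeBasic(string text)
    {
        return text
            .Replace("&", "&amp;")  // must run first
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");
    }
}
```

This only covers the three characters named in the answer; it does not escape quotes, so it is not sufficient on its own for text placed inside attribute values.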

Change razor syntax

How can I change razor syntax in RazorEngine?
I need to use a specific keyword instead of the "@" symbol.
For example: $$Model.someField instead of @Model.someField ("$$" instead of "@").
You can't. Razor is not really designed in a way that allows it. Basically, (Microsoft.AspNet.)Razor has some specially written parsers which handle "@" in a special manner (by switching parsers). This means the languages themselves (C# and HTML in this case) need to be compatible with this procedure as well!
If you want to replace "@" with something else you need to rewrite the Razor parsers. This is possible, but at that point you have already implemented the hardest part of Razor yourself...
The real question you should ask yourself (or answer here) is: why do you want to do it? It is not as trivial as one would think; I was at this point before.
As freedomn-m suggested, you should use @Html.Raw("@") or @@ if you need to output a "@".
matthid
- a RazorEngine contributor

why does MS anti xss library (v4) remove html 5 data attributes

AntiXss library seems to strip out html 5 data attributes, does anyone know why?
I need to retain this input:
<label class='ui-templatefield' data-field-name='P_Address3' data-field-type='special' contenteditable='false'>[P_Address3]</label>
The main reason for using the anti xss library (v4.0) is to ensure unrecognized style attributes are not parsed, is this even possible?
code:
var result = Sanitizer.GetSafeHtml(html);
EDIT:
The input below would result in the entire style attributes removed
Input:
var input = "<p style=\"width:50px;height:10px;alert('evilman')\"/> Not sure why is is null for some wierd reason!<br><p></p>";
Output:
var input = "<p style=\"\"/> Not sure why is is null for some wierd reason!<br><p></p>";
Which is fine, if anyone messes around with my code on client side, but I also need the data attribute tags to work!
I assume you mean the sanitizer, rather than the encoder. It's doing what it's supposed to - it simply doesn't understand HTML5 or recognise the attributes, so it strips them. There are ways to XSS via styles.
It's not possible to customise the safe list either I'm afraid, the code base simply doesn't allow for this - I know a large number of people want those, but it would take a complete rewrite to support it.

How to strip all tags from wikipedia pages or make page more readable

I want to strip all tags, remove the [show][Hide] stuffs from wikipedia, or is there some website that makes pages in more readable format.
I am aware of the Wikipedia printable version, but I don't want any tags in it, as I have another use for the text. So please answer only the original question: any website, web service, or code snippet in PHP/C# that removes the tags from a web page.
Also, when I copy a list from Firefox it replaces <li> with *. Is it possible to set something in Firefox so that it uses some other non-readable character instead, like some kind of dot?
You can start by taking a look at the strip_tags function.
You could use an HTML parser, BeautifulSoup (Python) or Simple HTML DOM for example. Or you could try using an XML parser.
I want to strip all tags, remove the
[show][Hide] stuffs from wikipedia, or
is there some website that makes pages
in more readable format.
You should take a look at DBpedia, Wikipedia, but just the data.
http://dbpedia.org/About
What about HtmlAgilityPack?
Similar thread available in stackoverflow
Is there a Wikipedia API?
Try this function.
Dim pattern As String = "<(.|\n)*?>"
Return System.Text.RegularExpressions.Regex.Replace(strHtmlString, pattern, String.Empty).Trim()
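Since the question also asks for C#, here is a rough C# equivalent of the VB snippet above, using the same pattern. The class and method names are hypothetical. A regex like this is a blunt tool: it does not understand comments, scripts, or malformed markup, so an HTML parser is the safer choice for real pages.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the VB snippet above.
public static class TagStripper
{
    public static string StripTags(string html)
    {
        // Lazily match everything from "<" to the next ">", including newlines.
        const string pattern = "<(.|\n)*?>";
        return Regex.Replace(html, pattern, string.Empty).Trim();
    }
}
```

Entities such as &amp; are left untouched by this sketch and would still need decoding if plain text is the goal.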

PHPs htmlspecialcharacters equivalent in .NET?

PHP has a great function called htmlspecialchars() where you pass it a string and it replaces all of HTML's special characters with their safe equivalents. It's almost a one-stop shop for sanitizing input. Very nice, right?
Well is there an equivalent in any of the .NET libraries?
If not, can anyone link to any code samples or libraries that do this well?
Try this.
var encodedHtml = HttpContext.Current.Server.HtmlEncode(...);
System.Web.HttpUtility.HtmlEncode(string)
Don't know if there's an exact replacement, but there is a method HttpUtility.HtmlEncode that replaces special characters with their HTML equivalents. A close cousin is HttpUtility.UrlEncode for rendering URLs. You could also use validator controls like RegularExpressionValidator and RangeValidator, or System.Text.RegularExpressions.Regex, to make sure you're getting what you want.
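As a small illustration of what HTML encoding does to a string, here is a sketch using System.Net.WebUtility.HtmlEncode. That class is swapped in as an assumption so the sample runs without a System.Web reference; the wrapper class and method name are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical wrapper; WebUtility stands in for HttpUtility here.
public static class EncodeDemo
{
    public static string Encode(string raw)
    {
        // Replaces characters such as <, >, & and " with character entities.
        return WebUtility.HtmlEncode(raw);
    }
}
```

Calling EncodeDemo.Encode("<a href=\"x\">Tom & Jerry</a>") yields text that a browser displays literally instead of interpreting as markup.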
Actually, you might want to try this method:
HttpUtility.HtmlAttributeEncode()
Why? Citing the HtmlAttributeEncode page at MSDN docs:
The HtmlAttributeEncode method converts only quotation marks ("), ampersands (&), and left angle brackets (<) to equivalent character entities. It is considerably faster than the HtmlEncode method.
In addition to the given answers:
When using the Razor view engine (which is the default view engine in ASP.NET), using the '@' character to display values will automatically encode the displayed value. This means that you don't have to encode it yourself.
On the other hand, when you don't want the text to be encoded, you have to specify that explicitly (by using @Html.Raw), which is, in my opinion, a good thing from a security point of view.
