Fortify and AntiXSS

My company requires our ASP.NET code to pass a Fortify 360 scan before releasing the code. We use AntiXSS everywhere to sanitize HTML output. We also validate input. Unfortunately, they recently changed the "template" Fortify was using and now it's flagging all our AntiXSS calls as "Poor Validation". These calls are doing things like AntiXSS.HTMLEncode(sEmailAddress).
Anyone know exactly what would satisfy Fortify? A lot of what it's flagging is output where the value comes from a database and never from a user at all, so if HTMLEncode isn't safe enough, we have no idea what is!

I'm a member of Fortify's Security Research Group and I'm sorry for the confusion this issue has been causing you. We haven't done a very good job of presenting this type of issue. I think part of the problem is the category name -- we're not trying to say that there is anything wrong with the validation mechanism, just that we cannot tell if it is the appropriate validation for this situation.
In other words, we don't know what the right validation is for your particular context. For this reason, we do not recognize any HTML encoding functions as validating against XSS out of the box, even the ones in Microsoft's AntiXSS library.
As for what the right solution is, if you are using HtmlEncode to output a username to the body of an HTML page, your original code is fine. If the encoded username is being used in a URL, it could be vulnerable to XSS. At Fortify, when we find similar issues in our own code, if the validation matches the context, we mark it "Not an Issue".
We are aware of the problems around these issues and keep tweaking our rules to make them more precise and understandable. We release new rules every three months and expect to make a couple of changes in upcoming releases. For Q4, we plan to split the issues into Inadequate Validation (for blacklisting encoding and other weak validation schemes) and Context Sensitive Validation (the type of issue you're seeing). Please let us know if we can help more.
(A link to an OWASP explanation of why HTML encoding doesn't solve all problems:
http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#Why_Can.27t_I_Just_HTML_Entity_Encode_Untrusted_Data.3F)
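To make the context point concrete, here is a minimal C# sketch. It uses the framework's WebUtility.HtmlEncode and Uri.EscapeDataString as stand-ins rather than the AntiXSS library, and the class and method names (ContextEncoding, ForHtmlBody, ForUrlQueryValue, IsSafeLinkTarget) are made up for illustration, so treat it as one possible shape rather than what Fortify expects.

```csharp
using System;
using System.Net;

public static class ContextEncoding
{
    // HTML body context: encode the characters that are significant in markup.
    public static string ForHtmlBody(string value)
    {
        return WebUtility.HtmlEncode(value);
    }

    // URL query-value context: percent-encode so the value cannot
    // break out of the parameter it is placed in.
    public static string ForUrlQueryValue(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Whole-URL context (e.g. an href): encoding is not enough, so check
    // that the value is an absolute http or https URL before using it.
    public static bool IsSafeLinkTarget(string value)
    {
        Uri uri;
        return Uri.TryCreate(value, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

Note that ForHtmlBody("javascript:alert(1)") returns the string unchanged, because it contains no markup characters. That is the kind of case where HTML encoding is correct for a page body but does nothing useful when the same value ends up in a link.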

fd_dev, I would add that you shouldn't focus on squeezing your code to fit through static analysis hoops. If you are qualified and confident that the finding doesn't apply, use the Fortify GUI tools to record a comment and suppress the issue.
If you are not sure, take a little screenshot and email it to Fortify Technical Support. They are well qualified to advise you on how to interpret your Fortify security findings.
blowdart is spot on. See http://www.schneier.com/blog/archives/2008/05/random_number_b.html for the worst case of what can happen if you chase static analysis results without understanding the purpose of the code and the reason/mechanics behind the finding. (In a word, you could make the code worse instead of better)-:

We've found a solution. Believe it or not, this causes Fortify360 to accept the code.
string sSafeVal = Regex.Replace(sValue, @"[\x00-\x1F\x7F]+", "");
Response.Write(AntiXSS.HTMLEncode(sSafeVal));
So where AntiXSS.HTMLEncode alone fails, stripping non-printable characters first works. Never mind the fact that HTMLEncode would handle those characters anyway. I'm guessing Fortify simply triggers off the Regex.Replace, and I imagine any pattern would work.

Related

URL display with proper output using System.Uri c#

I have an application in which I have stored a lot of website URLs without validating them. I am now validating each URL as it is entered, but the already stored URLs are still there as they were.
I want strict display code that also corrects user typos and gives me a proper URL to deal with.
The data that is already in the system has a lot of typos such as ...http://example.com or htp://example.com or ttp://example.com. I want the code to tackle that and come up with the proper URL, either by regexing out the invalid part or by correcting it.
What is the best approach to accomplish this?

You can obviously pick out the correct ones with a regex.
However, you will need to write your own logic to fix those that are 'broken'. You could pull these and with another regex and then simply search and replace the broken element. There are going to be limitations to this as you can only really check the protocol prefix and not the domain part itself.

Here is my try:
http(s)?://(www.)?[a-zA-Z0-9\-\.\\/]+
where [a-zA-Z0-9-.\/] includes all characters that you want to allow users to use.
P.S. Please be aware that if you put this regex in a regular C# string literal, you need to double each backslash (\\) or use a verbatim string (@"..."), as otherwise your expression might not work properly.
Hope it gets you started.
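Combining the two suggestions above (a regex for the broken prefix, then a check on the result), a rough C# sketch could look like the following. The UrlRepair class and TryNormalize method are hypothetical names, and the repair rule is an assumption: whatever precedes the first "://" is treated as a mistyped http or https scheme, so a genuine ftp:// URL would be rewritten too. As noted above, typos in the domain part are not detected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRepair
{
    // Everything up to and including the first "://" at the start of the
    // string, e.g. "htp://", "ttp://" or "...http://".
    private static readonly Regex SchemePrefix = new Regex(@"^[^:/]*://");

    public static bool TryNormalize(string raw, out Uri result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(raw)) return false;

        string candidate = raw.Trim();
        Match m = SchemePrefix.Match(candidate);

        // Assumption: a typed prefix ending in "s" was meant to be https,
        // anything else (including no prefix at all) was meant to be http.
        string typed = m.Success ? m.Value.Substring(0, m.Length - 3) : "";
        string scheme = typed.EndsWith("s", StringComparison.OrdinalIgnoreCase) ? "https" : "http";
        string rest = m.Success ? candidate.Substring(m.Length) : candidate;

        // Let System.Uri decide whether the repaired string is a usable absolute URL.
        return Uri.TryCreate(scheme + "://" + rest, UriKind.Absolute, out result)
            && (result.Scheme == Uri.UriSchemeHttp || result.Scheme == Uri.UriSchemeHttps);
    }
}
```

With this sketch, "htp://example.com", "ttp://example.com" and "...http://example.com" all come back as a Uri whose AbsoluteUri is "http://example.com/". Strings that still fail Uri.TryCreate after the repair are reported as invalid so they can be reviewed by hand.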

What is the best practice to handle dangerous characters in asp.net?

What is the best practice to handle dangerous characters in asp.net?
see example: asp.net sign up form
Should you:
use a JavaScript to prevent them from entering it into the textbox in the 1st place?
have a general function that does a find and replace on the server side?
The problem with #1, is it will increase page load time.

ASP.NET handles potentially dangerous characters for you by default, and has done so since ASP.NET 2.0. From Request Validation in ASP.NET:
Request validation is a feature in ASP.NET that examines an HTTP
request and determines whether it contains potentially dangerous
content. In this context, potentially dangerous content is any HTML
markup or JavaScript code in the body, header, query string, or
cookies of the request. ASP.NET performs this check because markup or
code in the URL query string, cookies, or posted form values might
have been added for malicious purposes.
Request validation helps prevent this kind of attack. If ASP.NET
detects any markup or code in a request, it throws a "potentially
dangerous value was detected" error and stops page processing.
Perhaps the most important bit of this is that it happens on the server; regardless of the client accessing your application, they cannot just turn off JavaScript to work around it.

Solution number 1 won't increment load time by much.
You should ALWAYS use solution number 2 along with solution number one, because users can turn off javascript in their browsers.

You accept them like regular characters on the write-side. When rendering you encode your output. You have to encode it anyway regardless of security so that you can display special characters.

What is the best practice to handle dangerous characters in asp.net?
I did not watch the screencast you link to (questions should be self-contained anyway), but there are no dangerous characters. It all depends on the context. Take Stack Overflow for example, it lets me input the characters Dangerous!'); DROP TABLE Questions--. Nothing dangerous there.
ASP.NET itself will do its best to prevent malicious input at the HTTP level: it won't let any user access files like web.config or files outside your web root.
As soon as you start doing something with user input, it's up to you. There's no silver bullet, no one rule that fits them all. If you're going to display the user input as HTML, you'll have to make sure you only allow harmless markup tags without any scriptable attributes. If you're allowing users to upload images, make sure only images get uploaded. If you're going to send input to an RDBMS, be sure to escape characters that have meaning for the database manipulation language.
And so on.

ALWAYS validate input on the server, this should not even be a discussion, just do it!
Client-side validation is just eye candy for the user, but the server is where it counts!

Thinking that
ASP.NET handles potentially dangerous characters for you, by default since ASP.NET 2.0. From Request Validation in ASP.NET:
is like thinking that a solid door will keep a thief out. It won't. It will only slow him down. You have to know the most common attack vectors and the possible solutions. You must understand that every EVERY EVERY variable (field/property) you write into HTML/CSS/JavaScript is a potential attack vector that must be sanitized (through the use of appropriate libraries, like some methods included in newer ASP.NET MVC, or at least the <%: %> syntax of ASP.NET 4.0), no exceptions. Every EVERY EVERY query you execute is a potential attack vector that must be sanitized through the exclusive use of an ORM and parameterized queries, no exceptions. No passwords must be saved in the db. And tons of other similar things. It isn't very difficult, but laziness, complacency and ignorance will make it harder (if not nearly impossible). If it isn't you who introduces the hole, then it's the programmer on your left, or the programmer on your right. There is no hope.

Strategy for auto encoding text inputs?

To prevent my application from crashing with the error "A potentially dangerous Request.Form value was detected...", I just turned page validation off. I want to revisit this and solve it correctly.
Is there a good strategy for this? If people are entering '<' and '>', I think the only way to save their data is to encode it via JavaScript. I have tried catching it in the code-behind, but by then it is too late. I am thinking of inheriting from the textbox and auto encoding/decoding the input with client scripts. I also have to think of all the angle brackets that are already saved in my database.
Any suggestions or experience with this?

I gather from your question that you don't want your client to send you "dangerous" content, so it's desirable to leave the page validation turned on as a last line of defense, instead of turning it off and using Server.HtmlEncode on each user input value (you might miss one, and it is a lot of work).
I would go for a javascript solution, for example you could use a library such as jQuery, and hook into the submit events of the forms, and tidy the input before submitting. Much cleaner than creating your own derived textbox.
For the users without javascript, or that try to "hack" your little script, sc#!w them, they will reach your last line of defense, and get an error.

It's best to think of the built-in page validation as a safety device that isn't applicable to all cases. There are more than a few times when it is completely impossible to do something with it turned on. In these cases we turn it off, and deal with the validation ourselves.
The most obvious case is that sometimes we actually do want to send big chunks of HTML to the server. Of course, doing so still has to be made secure, but "oh, that looks like a big chunk of HTML! throw a security exception!" obviously isn't the correct way to do that.
So, in these cases it's perfectly sensible to turn off page validation and add your own on the server side. It does mean that you have to think about just how this input will be used with a bit more scrutiny than before. Follow through the path of every datum input (not just those where you expect to see characters like <), and ensure that either it will never be sent back to the client unescaped, or that it is thoroughly inspected to guarantee safety.

You can escape dangerous chars before posting the data. Like this:
string = escape(string);
and then on the server side:
var stringVal = Server.UrlDecode(Request["string"]);
Something like that.

Have you considered using
Server.HtmlEncode(input)
There is no real need to do it in the client end using javascript. You can easily do it in the server side using the above technique.
And possibly be a duplicate of this question
/BB

BBCode to HTML transformation rules

Background
I have written a very simple BBCode parser in C# which transforms BBCode to HTML. Currently it supports only the [b], [i] and [u] tags. I know that BBCode is always considered valid regardless of what the user has typed. I cannot find a strict specification of how to transform BBCode to HTML.
Question
Does a standard "BBCode to HTML" specification exist?
How should I handle "[b][b][/b][/b]"? For now parser yields "<b>[b][/b]</b>".
How should I handle "[b][i][u]zzz[/b][/i][/u]" input? Currently my parser is smart enough to produce "<b><i><u>zzz</u></i></b>" output for such a case, but I wonder whether this approach is "too smart".
More details
I have found some ready-to-use BBCode parser implementations, but they are too heavy/complex for me and, what is worse, use tons of regular expressions and do not produce the markup I expect. Ideally, I want to get XHTML as output. To infer the "BBCode to HTML" transformation rules I am using this online parser: http://www.bbcode.org/playground.php. It produces HTML that is intuitively correct in my opinion. The only thing I dislike is that it does not produce XHTML. For example "[b][i]zzz[/b][/i]" is transformed to "<b><i>zzz</b></i>" (note the order of the closing tags). Firebug of course shows this as "<b><i>zzz</i></b><i></i>". As I understand it, browsers fix such wrongly ordered closing tags, but I am in doubt:
Should I rely on this browser feature and not try to produce XHTML?
Maybe "[b][i]zzz[/b]ccc[/i]" must be understood as "<b>[i]zzz</b>ccc[/i]". That looks logical for such improper formatting, but it conflicts with the output of popular forums, which render zzz in bold italic and ccc in italic rather than showing the [i] tags literally.
Thanks.

On your first question, I don't think that relying on browsers to correct any kind of mistakes is a good idea regardless the scope of your project (well, maybe except when you're actually doing bug tests on the browser itself). Some browsers might do an awesome job on that while others might fail miserably. The best way to make sure the output syntax is correct (or at least as correct as possible) is to send it with a correct syntax to the browser in the first place.
Regarding your second question, since you're trying to have correct BBCode converted to correct HTML, if your input is [b][i]zzz[/b]ccc[/i], its correct HTML equivalent would be <i><b>zzz</b>ccc</i> and not <b>[i]zzz</b>ccc[/i]. And this is where things get complicated as you would not be writing just a converter anymore, but also a syntax checker/correcter. I have written a similar script in PHP for a rather weird game engine scripting language but the logic could be easily applied to your case. Basically, I had a flag set for each opening tag and checked if the closing tag was in the right position. Of course, this gives limited functionality but for what I needed it did the trick. If you need more advanced search patterns, I think you're stuck with regex.
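One way to get well-formed output without reordering the user's tags is to keep a stack of open tags, sketched below in C#. This is a different technique from the per-tag flags described above: when a closing tag arrives out of order, the sketch closes the tags opened after it and then reopens them. The names BbStack and ToXhtml are invented for this example, only [b], [i] and [u] are handled, and a regex is used just to find tags, not to match nesting.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class BbStack
{
    // Finds a single opening or closing [b], [i] or [u] tag.
    private static readonly Regex Tag = new Regex(@"\[(/?)([biu])\]", RegexOptions.IgnoreCase);

    public static string ToXhtml(string input)
    {
        var open = new Stack<string>();   // currently open tags, innermost on top
        var sb = new StringBuilder();
        int pos = 0;

        foreach (Match m in Tag.Matches(input))
        {
            // Literal text between tags is HTML-encoded.
            sb.Append(WebUtility.HtmlEncode(input.Substring(pos, m.Index - pos)));
            pos = m.Index + m.Length;
            string name = m.Groups[2].Value.ToLowerInvariant();

            if (m.Groups[1].Length == 0)
            {
                open.Push(name);
                sb.Append('<').Append(name).Append('>');
            }
            else if (!open.Contains(name))
            {
                sb.Append(m.Value);       // closing tag with no opener: keep it as text
            }
            else
            {
                // Close everything opened after this tag, close the tag itself,
                // then reopen the others in their original order.
                var reopen = new Stack<string>();
                while (open.Peek() != name)
                {
                    string inner = open.Pop();
                    sb.Append("</").Append(inner).Append('>');
                    reopen.Push(inner);
                }
                open.Pop();
                sb.Append("</").Append(name).Append('>');
                while (reopen.Count > 0)
                {
                    string inner = reopen.Pop();
                    open.Push(inner);
                    sb.Append('<').Append(inner).Append('>');
                }
            }
        }

        sb.Append(WebUtility.HtmlEncode(input.Substring(pos)));
        while (open.Count > 0) sb.Append("</").Append(open.Pop()).Append('>');
        return sb.ToString();
    }
}
```

For the input [b][i]zzz[/b]ccc[/i] this produces <b><i>zzz</i></b><i>ccc</i>, which keeps zzz bold and italic and ccc italic while staying properly nested.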

If you're only going to implement B, I and U, which aren't terribly important tags, why not simply have a counter for each of those tags: +1 each time it is opened, and -1 each time it's closed.
At the end of a forum post (or whatever) if there are still-open tags, simply close them. If the user puts in invalid bbcode, it may look strange for the duration of their post, but it won't be disastrous.
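A minimal C# sketch of the counter idea follows. The BbCounter class and its ToHtml method are hypothetical names, and two details are my own assumptions rather than part of the suggestion: a closing tag with no matching opener is left as literal text, and the text between tags is HTML-encoded with WebUtility.HtmlEncode.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class BbCounter
{
    // Finds a single opening or closing [b], [i] or [u] tag.
    private static readonly Regex Tag = new Regex(@"\[(/?)([biu])\]", RegexOptions.IgnoreCase);

    public static string ToHtml(string input)
    {
        // One counter per tag: +1 when opened, -1 when closed.
        var open = new Dictionary<string, int> { { "b", 0 }, { "i", 0 }, { "u", 0 } };
        var sb = new StringBuilder();
        int pos = 0;

        foreach (Match m in Tag.Matches(input))
        {
            sb.Append(WebUtility.HtmlEncode(input.Substring(pos, m.Index - pos)));
            pos = m.Index + m.Length;
            string name = m.Groups[2].Value.ToLowerInvariant();
            bool closing = m.Groups[1].Length > 0;

            if (!closing)
            {
                open[name]++;
                sb.Append("<" + name + ">");
            }
            else if (open[name] > 0)
            {
                open[name]--;
                sb.Append("</" + name + ">");
            }
            else
            {
                sb.Append(m.Value);   // nothing to close: keep the tag as text
            }
        }

        sb.Append(WebUtility.HtmlEncode(input.Substring(pos)));

        // At the end of the post, close whatever is still open.
        foreach (string name in new[] { "u", "i", "b" })
        {
            for (int n = open[name]; n > 0; n--) sb.Append("</" + name + ">");
        }
        return sb.ToString();
    }
}
```

Because counters do not track order, badly nested input such as [b][i]x[/b][/i] still comes out as <b><i>x</b></i>. The output is balanced but not necessarily well nested, which matches the "may look strange, but won't be disastrous" trade-off.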

Regarding invalid user-submitted markup, you have at least three options:
Strip it out
Print it literally, i.e. don't convert it to HTML
Attempt to fix it.
I don't recommend 3. It gets really tricky really fast. 1 and 2 are both reasonable options.
As for how to parse BBCode, I strongly recommend against using regex. BBCode is actually a fairly complex language. Most significantly, it supports nesting of tags. Regex can't handle arbitrary nesting. That's one of the fundamental limitations of regex. That makes it a bad choice for parsing languages like HTML and BBCode.
For my own project, rbbcode, I use a parsing expression grammar (PEG). I recommend using something similar. In general, these types of tools are called "compiler compilers," "compiler generators," or "parser generators." Using one of these is probably the sanest approach, as it allows you to specify the grammar of BBCode in a clean, readable format. You'll have fewer bugs this way than if you use regex or attempt to build your own state machine.

ReSharper Code Cleanup/Reformat Code feature vs Versioning Control Systems

ReSharper Code cleanup feature (with "reorder members" and "reformat code" enabled) is really great. You define a layout template using XML, then a simple key combination reorganizes your whole source file (or folder/project/solution) according to the rules you set in the template.
Anyway, do you think that could be a problem with VCSs like Subversion, CVS, Git, etc.? Is there a chance that it causes many undesired conflicts?
Thank you.

Yes, it will definitely cause problems. In addition to creating conflicts that have to be manually resolved, when you check in a file that has been reformatted, the VCS will note almost every line as having been changed. This will make it hard for you or a teammate to look back at the history and see what changed when.
That said, if everyone autoformats their code the same way (i.e., you distribute that XML template to the team), then it might work well. The problems really only come in when not everyone is doing the same thing.

I'm waiting for an IDE or an editor that always saves source code using some baseline formatting rules, but allows each individual developer to display and edit the code in their own preferred format. That way I can put my open curly brace at the beginning of the next line and not at the end of the current line where all you heathens seem to think it goes.
My guess is I'll be waiting for a long time.

Just reformat the whole solution once
AND make sure that every developer is using ReSharper
AND make sure that formatting options are shared and versioned (code style sharing options)

You can use StyleCop to enforce a comprehensive set of standards which pretty much forces everyone to use the same layout styles. Then all you need to do is develop a ReSharper code style specification that matches this, and distribute it to the team.
I'm still waiting for someone else to do this, and for JetBrains to clear up all the niggling details which aren't fully supported, in order to allow ReSharper to basically guarantee full StyleCop compliance.

It can definitely cause conflicts, so I would make sure you don't reformat entire files if there are people working on them in parallel.

It definitely could cause conflicts.
If you want to use this in a multi-user environment, then ReSharper needs to be configured to format your code to a set of standards which are enforced in your organization regardless of whether users make use of ReSharper or not.
That way you are using the tool to ensure your own code meets the standards, not blanket applying your preferences to the whole codebase.

I agree with the previous answers that state that conflicts are possible and even likely.
If you are planning to reformat code then at least make sure that you don't mix reformat checkins with those that change the function of the actual code. This way people can skip past check-ins that are simple reformattings. It's also a good idea to make sure that everyone knows a reformat is coming up so that they can object if they have ongoing work in that area.

We're working on something to work with refactors at the source code level. We call it Xmerge, and it's now part of Plastic. It's just a first approach, since we're working on more advanced solutions. Check it here.

It might be a good idea to write a script to check out every version in your source control history, apply the code cleaning, then check it into a new repository. Then use that repository for all your work in future.
