How do I escape POST body correctly? - c#

I need to send some JSON values to the server using POST requests; sending them without any escaping works fine, but this is not a proper solution as these values might contain special symbols like ? and &.
I've tried answers from other questions (Uri.EscapeDataString, Uri.EscapeUriString, System.Net.WebUtility.UrlEncode, System.Web.HttpUtility.UrlEncode), but they all make the server return a "Bad request" error.
How can I escape POST values properly?

After our little chat in comments, I think I know enough about what you're asking to put together an answer. So here goes.
I see three marginally-distinct questions being asked.
How can one escape an HTTP POST request's body?
I'll quote from my comment on this, since it's pretty complete.
An HTTP request body (POST or otherwise) is just bytes, the length of which is determined by the Content-Length header (or something to that effect, if chunked encoding is in use). You don't have to worry about anything being escaped for that to work, escaping only comes into play at the next level up, the content type.
Essentially, there's no native requirement for any sort of escaping for a vanilla POST to go through.
How do application/json and application/x-www-form-urlencoded play together?
In most cases, they just don't. They can, and I think I've seen it done before, but I can't really think of why you'd want to.
Just for background, and to make sure we're on the same page, here's the same object, serialized to JSON and form data respectively.
Plain object (for comparison):
PropertyName: The value is "1 & 2"
OtherProperty: 2
JSON:
{
"PropertyName" : "The value is \"1 & 2\"",
"OtherProperty" : 2
}
Form data:
PropertyName=The%20value%20is%20%221%20%26%202%22&OtherProperty=2
That value got pretty messy, as you can see. But you get the point.
So, yes, you could quite reasonably nest one in the other, but there aren't many use-cases where that would make any sense.
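To make that concrete in C#, here's a minimal sketch - assuming Json.NET is referenced, and reusing the property names from the example above - of producing each content type while letting the framework do the escaping:

using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using Newtonsoft.Json;

var payload = new { PropertyName = "The value is \"1 & 2\"", OtherProperty = 2 };

// JSON body: the serializer escapes the embedded double quotes for us.
var jsonContent = new StringContent(JsonConvert.SerializeObject(payload),
                                    Encoding.UTF8, "application/json");

// Form body: FormUrlEncodedContent percent-encodes each value for us.
var formContent = new FormUrlEncodedContent(new Dictionary<string, string>
{
    ["PropertyName"] = "The value is \"1 & 2\"",
    ["OtherProperty"] = "2"
});

Either of these can be handed straight to HttpClient.PostAsync; the point is that the escaping belongs to the content type, not to you.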
How do I escape JSON?
If you're using a library, that library should (if it's worth anything) do this for you. JSON.net will, for example. I believe the only thing you'd have to worry about escaping is double quotes in JSON, but I'm no expert, and it wouldn't surprise me to learn there were edge cases.
I'll quote again from my comment, since it seems to have helped you.
In most modern APIs ("most" and "modern" are up for interpretation there), JSON is a pretty reasonable expectation, yeah. So then we go back to my first comment, and say that you'd only really have to worry about escaping within the JSON elements themselves, and if you're using a library, that should be handled for you. You decidedly don't want to double-encode, for obvious reasons.
The quick way of testing your entire setup, of course, is just to try and submit regular data (the control) and make sure it works, then submit something with a double-quote, and if you want, a question mark.
If you run that test and the second fails, there's a good chance something's not being properly escaped. That would be a good time to use debugging tools, or even just an application like Fiddler, to inspect the request payload for anything unreasonable - for example, JSON that looks like { "name":"val"ue" }.
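If you'd rather run that test from code than from a browser, here's a hedged sketch - HttpClient plus Json.NET, with https://example.com/api/values standing in for your real endpoint - that posts a control value and then one containing a double quote and a question mark:

using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

static async Task<HttpStatusCode> PostNameAsync(string name)
{
    using (var client = new HttpClient())
    {
        // The serializer escapes the quote inside the JSON string; nothing is double-encoded.
        var body = JsonConvert.SerializeObject(new { name });
        var content = new StringContent(body, Encoding.UTF8, "application/json");
        var response = await client.PostAsync("https://example.com/api/values", content);
        return response.StatusCode;
    }
}

Console.WriteLine(await PostNameAsync("plain value"));             // the control
Console.WriteLine(await PostNameAsync("value with a \" and a ?")); // the awkward case

If the control succeeds and the second comes back 400, something in between is mangling (or double-encoding) the payload.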

Related

No luck with HTML decoding based on safe HTML tags (vb.net or c#)

I've spent quite a bit of time trying to figure out the best way to handle this. I'm HTML encoding rich text from untrusted user input prior to storing it in the database.
I've bounced back and forth between multiple discussions, and it seems the safest method is to:
HTML encode absolutely everything, and only decode based on a white/safe list prior to sending it back to the client.
However, I'm also seeing strong suggestions for using http://htmlagilitypack.codeplex.com/
This compares user input against your safe/white list.
I've read:
C# HtmlDecode Specific tags only
https://eksith.wordpress.com/2011/06/14/whitelist-santize-htmlagilitypack/
And really, about 10 other posts and have become frustrated because now I can't figure out the best way to handle this.
I've tried using regular expressions to use regex replace methods:
For Each tag In AcceptableTags.Split(CChar("|")).ToList()
    ' Match an opening or closing tag of this name (needs Imports System.Text.RegularExpressions)
    Dim pattern = "<\s*/?\s*" & tag & ".*?>"
    ' Regex.Replace needs a replacement argument; here the match is stripped out
    input = Regex.Replace(input, pattern, String.Empty, RegexOptions.IgnoreCase)
Next
This doesn't seem to work well at all.
Is there someone out there who has a tried and true method with an example implementation they wouldn't mind sharing? I'll take c# or vb.net.
Depends on your data. Whitelist on the initial validation is fine if, for example, you're trying to avoid HTML in a phone number. On the other hand, if you can't be specific about what's in and what's out then just leave it "raw".
It's highly unlikely that storing encoded data in a database is the correct thing to do.
Any system of even marginal complexity will have non-HTML clients it will have to serve data to. When you do have an HTML client, you need to escape the output appropriate to HTML. Same for XML. Similarly, if you decide today you like JSON better, you'll encode to that. CSV? No problem - put quotes around your values (and escape any quotes) in case they have commas. Use parameters when doing SQL. Get the idea?
TL;DR;
Whitelist input if you can
Saving specifically encoded data is probably wrong
Always, always, always escape appropriate to your output
Never try and do your own escaping - always use a trusted library. You will never do a good enough job.
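A hedged C# sketch of the TL;DR (the table and column names and the connection string are placeholders): store the text once, raw, then encode appropriately at each output.

using System.Data.SqlClient;
using System.Net;

string rawComment = "O'Grady says 2 < 3 & \"hello\"";

// Database: store it raw; the parameter (not our code) deals with the apostrophe.
using (var connection = new SqlConnection("Server=.;Database=AppDb;Integrated Security=true")) // placeholder
using (var cmd = new SqlCommand("INSERT INTO Comments (Body) VALUES (@body)", connection))
{
    connection.Open();
    cmd.Parameters.AddWithValue("@body", rawComment);
    cmd.ExecuteNonQuery();
}

// HTML output: encode at the point where the text becomes HTML.
string html = WebUtility.HtmlEncode(rawComment);

// CSV output: quote the field and double any embedded quotes.
string csv = "\"" + rawComment.Replace("\"", "\"\"") + "\"";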

Validating your site

My question is this: what else needs to be validated, apart from what I have below?
It is important that any input to a site is properly validated:
Textboxes, etc – use .NET validators (or custom code if the validators aren’t appropriate)
Querystring or Form values – use manual validation (casting to specific types, boundary checking, etc)
This ties into the problems which XSS can reveal.
Basically you have to validate any input that someone could potentially tamper with:
Form Postbacks (mainly .NET Controls – these can be validated with .NET validation controls. Also, if you have Request Validation turned on on all pages, this reduces the risk)
QueryString Values
Cookie values
HTTP Headers
Viewstate (automatically done for you as long as you have ViewState MAC enabled)
JavaScript (all JS can be viewed and changed, so you need to ensure no crucial functionality is handled by JavaScript - i.e. always enable server-side validation)
There is a lot that can go wrong with a web application. Your list is pretty comprehensive, although there is some duplication. The HTTP spec only defines GET, POST, Cookie and Header data. There are many different types of POST, but it's all in the same part of the request.
For your list I would also add everything having to do with file upload, which is a type of POST. For instance, file name, mime type and the contents of the file. I would fire up a network monitoring application like Wireshark and everything in the request should be considered potentially harmful.
There will never be a one-size-fits-all validation function. If you are merging SQL injection and XSS sanitization functions then you may be in trouble. I recommend testing your site using automation. A free service like Sitewatch or an open source tool like skipfish will detect methods of attack that you have missed.
Also, on a side note: passing the view state around with a MAC and/or encryption is a gross misuse of cryptography. Cryptography is a tool used when there is no other solution. By using a MAC or encryption you are opening the door for an attacker to brute-force this value or use something like a padding oracle attack to take advantage of you. View state should be kept track of by the server, period, end of story.
I would suggest a different way of looking at the problem that is orthogonal to what you have here (and hence not incompatible, there's no reason why you can't examine it both ways in case you catch with one what you miss with another).
The two things that are important in any validation are:
Things you pay attention to.
Things you pass to another layer untouched.
Now, most of the things you've mentioned so far fit into the first category. Cookies that you ignore fit into the second, as would query & post information if you passed it to another handler with Server.Execute or similar.
The second category is the most debatable.
On the one hand, if a given handler (.aspx page, IHttpHandler, etc.) ignores a cookie that may be used by another handler at some point in the future, it's mostly up to that other handler to validate it.
On the other hand, it's always good to have an approach that assumes other layers have security holes and you shouldn't trust them to be correct, even if you wrote them yourself (especially if you wrote them yourself!)
A middle-ground position is that if there are perhaps 5 different states some persistent data could validly be in, but only 3 make sense when a particular piece of code is hit, it might verify that it is in one of those 3 states, even if that doesn't pose a risk to that particular code.
That done, we'll concentrate on the first category.
Querystrings, form-data, post-backs, headers and cookies all fall under the same category of stuff that came from the user (whether they know it or not). Indeed, they are sometimes different ways of looking at the same thing.
Of this, there is a subset that we will actually work upon in any way.
Of that there is a range of legal values for each such item.
Of that, there is a range of legal combinations of values for the items as a whole.
Validation therefore becomes a matter of:
Identify what input we will act upon.
Make sure that each component of that input is valid in its own right.
Make sure that the combinations are valid (e.g. it may be valid to not send a credit card number, but invalid to not send one while setting payment type to "credit card"); a sketch follows.
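A tiny C# sketch of that combination check (the OrderInput type and its members are invented for illustration):

class OrderInput
{
    public string PaymentType { get; set; }
    public string CardNumber { get; set; }

    // Each field may be fine on its own; this checks the combination as a whole.
    public bool HasValidCombination()
    {
        bool cardChosen = PaymentType == "credit card";
        bool cardSupplied = !string.IsNullOrWhiteSpace(CardNumber);
        return !cardChosen || cardSupplied;
    }
}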
Now, when we come to this, it's generally best not to try to catch certain attacks. For example, it's not so good to avoid ' in values that will be passed to SQL. Rather, we have three possibilities:
It's invalid to have ' in the value because it doesn't belong there (e.g. a value that can only be "true" or "false", or from a set list of values in which none of them contain '). Here we catch the fact that it isn't in the set of legal values, and ignore the precise nature of the attack (thus being protected also from other attacks we don't even know about!).
It's valid as human input, but not as what we will use. An example here is a large number (in some cultures ' is used to separate thousands). Here we canonicalise both "123,456,789" and "123'456'789" to 123456789 and don't care what it was like before that, as long as we can meaningfully do so (the input wasn't "fish" or a number that is out of the range of legal values for the case in hand).
It's valid input. If your application blocks apostrophes in name fields in an attempt to block SQL injection, then it's buggy, because there are real names with apostrophes out there. In this case we consider "d'Eath" and "O'Grady" to be valid input and deal with the fact that ' is significant in SQL by escaping properly (ideally by using an API for data access that will do this for us); a short sketch of cases 2 and 3 follows.
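A short C# sketch of the second and third cases (the separators stripped, the range limit, the table name and the placeholder connection string are all assumptions for illustration):

using System;
using System.Data.SqlClient;
using System.Globalization;

// Case 2: canonicalise, then validate the canonical form.
string rawAmount = "123'456'789";
string digits = rawAmount.Replace("'", "").Replace(",", "");
if (!long.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out long amount) || amount > 999999999)
    throw new ArgumentException("Not a number in the legal range.");

// Case 3: "O'Grady" is perfectly valid input; a parameter handles the apostrophe.
using (var connection = new SqlConnection("Server=.;Database=AppDb;Integrated Security=true")) // placeholder
using (var cmd = new SqlCommand("SELECT Id FROM Customers WHERE Surname = @surname", connection))
{
    connection.Open();
    cmd.Parameters.AddWithValue("@surname", "O'Grady");
    var id = cmd.ExecuteScalar();
}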
A classic example of the third point with ASP.NET is code that blocks "suspicious" input with < and > - something that makes a great number of ASP.NET pages buggy. Granted, it's better to be buggy in blocking that inappropriately than buggy by accepting it inappropriately, but the defaults are for people who haven't thought about validation and trying to stop them from hurting themselves too badly. Since you are thinking about validation, you should consider whether it's appropriate to turn that automatic validation off and then treat < and > in a manner appropriate for your given use.
Note also that I haven't said anything about javascript. I don't validate javascript (unless perhaps I was actually receiving it), I ignore it. I pretend it doesn't exist, and then I won't miss a case where its validation could be tampered with. Pretend yours doesn't exist at this layer too. Ultimately, client-side validation is there to save time for the good guys making honest mistakes, not to thwart the bad guys.
For similar reasons, this is best not tested through a browser. Use Fiddler to construct requests that hit the validation points you want to examine. This way all client-side validation is by-passed, and you're looking at the server the same way an attacker will.
Finally, remember that a page with 100% perfect validation is not necessarily secure. E.g. if your validation is perfect but your authentication poor, then someone can send "valid" code to it that will be just as nasty - perhaps more so - as the more classic SQL injection or XSS code. That touches on other topics that belong to other questions; suffice it to say that validation as discussed here is only part of the puzzle.

Strategy for auto encoding text inputs?

To prevent my application from crashing with the error "A potentially dangerous Request.Form value was detected...", I just turned page validation off. I want to revisit this and solve it correctly.
Is there a good strategy for this? If people are entering '<' and '>', I think the only way to save their data is to encode it via JavaScript. I have tried catching it in the code-behind, but by then it is too late. I am thinking of inheriting the textbox and auto-encoding/decoding the input with client scripts. I also have to think of all the angle brackets that are already saved in my database.
Any suggestions or experience with this?
I get from your question that you don't want your client to send you "dangerous" content, so it's desirable to leave the page validation turned on, as a last line of defense, instead of turning it off and using Server.HtmlEncode on each user input value (you might miss one, and it is a lot of work).
I would go for a javascript solution, for example you could use a library such as jQuery, and hook into the submit events of the forms, and tidy the input before submitting. Much cleaner than creating your own derived textbox.
For the users without javascript, or those who try to "hack" your little script - sc#!w them; they will reach your last line of defense and get an error.
It's best to think of the built-in page validation as a safety device that isn't applicable to all cases. There are more than a few times when it is completely impossible to do something with it turned on. In these cases we turn it off, and deal with the validation ourselves.
The most obvious case is that sometimes we actually do want to send big chunks of HTML to the server. Of course, doing so still has to be made secure, but "oh, that looks like a big chunk of HTML! throw a security exception!" obviously isn't the correct way to do that.
So, in these cases it's perfectly sensible to turn off page validation and add your own server-side. It does mean that you have to think about just how this input will be used with a bit more scrutiny than before. Follow through the path of every datum input (not just those where you expect to see characters like <), and ensure that either it will never be sent back to the client unescaped, or that it is thoroughly inspected to guarantee safety.
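As a minimal Web Forms sketch of that approach (the page, control names and the commented-out helper are invented; it assumes ValidateRequest="false" in the page's @ Page directive, plus requestValidationMode="2.0" in web.config on ASP.NET 4+):

using System;
using System.Web;
using System.Web.UI.WebControls;

public partial class EditArticle : System.Web.UI.Page
{
    // In a real project these come from the .aspx designer file.
    protected TextBox txtBody;
    protected Literal litPreview;

    protected void btnSave_Click(object sender, EventArgs e)
    {
        // Request validation is off for this page, so treat the value as hostile ourselves.
        string raw = txtBody.Text;                        // may legitimately contain < and >
        // SaveToDatabase(raw);                           // hypothetical helper; store it raw
        litPreview.Text = HttpUtility.HtmlEncode(raw);    // encode whenever it goes back to the page
    }
}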
You can escape dangerous chars before posting the data. Like this:
string = escape(string);
and then on the server side:
var stringVal = Server.UrlDecode(Request["string"]);
Something like that.
Have you considered using:
Server.HtmlEncode(input)
There is no real need to do it in the client end using javascript. You can easily do it in the server side using the above technique.
This may also be a duplicate of this question.
/BB

Fortify and AntiXSS

My company requires our ASP.NET code to pass a Fortify 360 scan before releasing the code. We use AntiXSS everywhere to sanitize HTML output. We also validate input. Unfortunately, they recently changed the "template" Fortify was using and now it's flagging all our AntiXSS calls as "Poor Validation". These calls are doing things like AntiXSS.HTMLEncode(sEmailAddress).
Anyone know exactly what would satisfy Fortify? A lot of what it's flagging is output where the value comes from a database and never from a user at all, so if HTMLEncode isn't safe enough, we have no idea what is!
I'm a member of Fortify's Security Research Group and I'm sorry for the confusion this issue has been causing you. We haven't done a very good job of presenting this type of issue. I think part of the problem is the category name -- we're not trying to say that there is anything wrong with the validation mechanism, just that we cannot tell if it is the appropriate validation for this situation.
In other words, we don't know what the right validation is for your particular context. For this reason, we do not recognize any HTML encoding functions as validating against XSS out of the box, even the ones in Microsoft's AntiXSS library.
As for what the right solution is: if you are using HtmlEncode to output a username to the body of an HTML page, your original code is fine. If the encoded username is being used in a URL, it could be vulnerable to XSS. At Fortify, when we find similar issues in our own code, if the validation matches the context, we mark it "Not an Issue".
We are aware of the problems around these issues and keep tweaking our rules to make them more precise and understandable. We release new rules every three months and expect to make a couple of changes in upcoming releases. For Q4, we plan to split the issues into Inadequate Validation (for blacklisting, encoding and other weak validation schemes) and Context Sensitive Validation (the type of issue you're seeing). Please let us know if we can help more.
(A link to an OWASP explanation of why HTML encoding doesn't solve all problems:
http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#Why_Can.27t_I_Just_HTML_Entity_Encode_Untrusted_Data.3F)
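In code, the context-sensitive point looks roughly like this (a sketch using the System.Web encoders; the AntiXSS library exposes analogous context-specific methods, and the sample value is invented):

using System.Web;

string userName = "bob\" onmouseover=\"alert(1)";

// HTML body context:
string inHtml = HttpUtility.HtmlEncode(userName);

// URL / query-string context:
string inUrl = "/profile?user=" + HttpUtility.UrlEncode(userName);

// JavaScript string context (.NET 4+):
string inScript = "var user = '" + HttpUtility.JavaScriptStringEncode(userName) + "';";

The same value needs a different encoder in each place; HtmlEncode alone is only the right answer in the first one.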
fd_dev, I would add that you shouldn't focus on squeezing your code to fit through static analysis hoops. If you are qualified and confident that the finding doesn't apply, use the Fortify GUI tools to record a comment and suppress the issue.
If you are not sure, take a little screenshot and email it to Fortify Technical Support. They are well qualified to advise you on how to interpret your Fortify security findings.
blowdart is spot on. See http://www.schneier.com/blog/archives/2008/05/random_number_b.html for the worst case of what can happen if you chase static analysis results without understanding the purpose of the code and the reason/mechanics behind the finding. (In a word, you could make the code worse instead of better)-:
We've found a solution. Believe it or not, this causes Fortify360 to accept the code.
// Requires using System.Text.RegularExpressions; strips non-printable control characters
string sSafeVal = Regex.Replace(sValue, @"[\x00-\x1F\x7F]+", "");
Response.Write(AntiXSS.HTMLEncode(sSafeVal));
So where AntiXSS.HTMLEncode alone fails, replacing non-printable characters first works. Never mind the fact that HTMLEncode would do that anyway. I'm guessing they simply trigger off the Regex.Replace, and I imagine any pattern would work.

Compressing parameters in the URL

The URLs on my site can become very long, and it's my understanding that URLs are transmitted with the HTTP requests. So the idea came up to compress the string in the URL.
From my searching on the internet, I found suggestions on using short URLs and then linking them to the long URL. I'd prefer not to use this solution because I would have to do an extra database check to convert between long and short URLs.
That leaves in my head 3 options:
Hashing - I don't think this is an option. If you want a safe hashing algorithm, it's going to be long.
Compressing the URL string - basically having the server decompress the string when it gets the URL parameters.
Changing the URL so it's not descriptive - this is bad because it would make development harder for me (this is a 1-man project).
Considering the vast number of possible OS/browser combinations out there, I figured I'd ask if anyone else has tried this or has some clever suggestions.
If it matters, the URL parameters can reach 100+ chars.
Example:
mysite.com/Reports/Ability.aspx?PlayerID=7737&GuildID=132&AbilityID=1140&EventID=1609&EncounterID=-1&ServerID=17&IsPlayer=True
EDIT:
Let me clarify: at the moment this is NOT breaking the site. It's more about me learning to find a good solution (I'm well aware this is micro-optimization; my site is very fast at the moment) and making my site even faster (to challenge myself, and to become a better coder).
There is also a cosmetic issue; I personally think that a URL longer than the address bar looks bad.
You have some conflicting requirements as you want to shorten/compress the url without making it less descriptive. By the very nature of shortening the URL, you will, to a certain extent, make it less descriptive.
As I understand it, your goal is to optimise by sending less over the request. You mention 100+ characters, rather than 1000+, which I assume means they don't get that big? In which case, I'd see this as an unnecessary micro-optimisation.
To add to the previous suggestions of using POST, a simple thing would be to just shorten the keys instead of using full names, if you don't want to do full URL shortening, e.g.:
mysite.com/Reports/Ability.aspx?pid=7737&GID=132&AID=1140&EID=1609&EnID=-1&SID=17&IsP=True
These are obviously less descriptive.
But like I said, are you having a real problem with having long URLs?
I'm not sure I understand what your problem with long URLs is. Generally I'd try to avoid them, but if that's necessary then you won't depend on the user remembering it anyway, so why go through all the compressing trouble? Even with a URL of 1000 chars (~2KB) the page request won't be slow.
I would, however, consider using POST instead of GET if possible, to prettify the URL, but that of course depends on your implementation/environment.
It is recommended a few times here to use POST instead of GET. I would strongly recommend AGAINST picking your HTTP action by what the URL looks like. There is more to this choice than how it is displayed in the browser.
A quick overview:
http://www.w3.org/2001/tag/doc/whenToUseGet.html#checklist
A few options to add to the other answers:
Using a subclassed LinkButton for your navigation. This holds the extra data (PlayerId for example) inside its viewstate as properties. This won't be much help though if you're giving URLs to people via emails.
Use the MVC routing engine to produce slightly improved URLs - no keys for the querystring. e.g. mysite.com/Reports/Ability/7737/132/1140/1609/-1/17/True
Create your own URL shortener like tinyurl.com. Store the URL in the database along with each of the querystring values to look up.
Simply setup some friendly URLs for the most popular reports, for example mysite.com/Reports/JanuaryReport. You can also do this using the MVC routing engine.
The MVC routing engine is stand-alone and can work without your site being an MVC site; a registration sketch follows.
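For the Web Forms case, a hedged sketch of that stand-alone registration (the route name and parameter names just mirror the example URL; this goes in Global.asax.cs and needs .NET 4's System.Web.Routing):

using System;
using System.Web;
using System.Web.Routing;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Maps mysite.com/Reports/Ability/7737/132/1140/... onto the existing page.
        RouteTable.Routes.MapPageRoute(
            "AbilityReport",
            "Reports/Ability/{PlayerID}/{GuildID}/{AbilityID}/{EventID}/{EncounterID}/{ServerID}/{IsPlayer}",
            "~/Reports/Ability.aspx");
    }
}

// In Ability.aspx.cs the values come back from the route, e.g.:
// string playerId = (string)RouteData.Values["PlayerID"];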
With my scheme, one could encode the params section of a URL as a base64 string which is ~50% shorter than a direct base64 representation. So for your case you get:
~50% shorter params section
a base 64 string which hides a lot of the detail
see http://blog.alivate.com.au/packed-url/
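For completeness, option 2 from the question (compressing the parameter string itself) might look like the sketch below; the helper name is invented, gzip adds roughly 20 bytes of header so a ~100-character query string may not actually shrink, and the receiving side has to reverse both steps.

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static string PackParams(string query)
{
    byte[] raw = Encoding.UTF8.GetBytes(query);
    using (var buffer = new MemoryStream())
    {
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            gzip.Write(raw, 0, raw.Length);

        // Base64, made URL-safe so it can sit in a single query-string value.
        return Convert.ToBase64String(buffer.ToArray())
                      .Replace('+', '-').Replace('/', '_').TrimEnd('=');
    }
}

Console.WriteLine("mysite.com/Reports/Ability.aspx?p=" +
    PackParams("PlayerID=7737&GuildID=132&AbilityID=1140&EventID=1609&EncounterID=-1&ServerID=17&IsPlayer=True"));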
Most browsers can handle up to 2048 characters in a URL; if you don't feel like using a long parameter list, you can always pass parameters through POST requests.
There are theoretical problems with extended URLs. The exact limit varies across browsers (roughly 2k in some versions of IE) and servers (4-8k in Apache, varying with version and configuration), and isn't officially specified in any RFC that I am aware of.
I would agree with synhershko, and replace the URL with form POST parameters instead if you are concerned that your URLs are growing too long.
I've encountered similar situations in the past, although my reasons for optimisation were SEO. To me it depends on what you're doing with the page URL variables: are they being appended on all/most pages? If they are, then to me there is almost always a much better way, although if you're far down the development path it's probably too late now.
I like being able to 'read' a URL, especially when I drop into an unknown site 2 or more layers deep in the navigation and the site is designed poorly; it's often the easiest and fastest way for an advanced user to find where they are on the site.
If you're interested in it from an SEO point of view, it's normally best to have a hierarchy which only contains: / - _
Search engines will try and read URLs; see this video by Matt Cutts (can't remember how far into the video he mentions it, but it's a good watch anyway...)
Any form of compression of the URL (hashing, compressing, non-descriptive) is going to:
make the urls harder to read, remember and type in correctly
have a performance impact as you will have to decrypt/decompress/convert the url before you can work with it.
Also, hashing is usually considered to be non-reversible - given a hashed value you shouldn't be able to work out what generated it, but you could use it to look up a value in a database, which gets you back to your first issue of short-long lookups.
You could easily just remove the redundant "ID" at the end of each parameter, and possibly strip out vowels or similar to "shorten" the url without losing too much from the semantics of the request.
But to be honest, the length of your URL is one of the least things to worry about in terms of performance - look at the size of any cookies you're sending back and forth between the browser and the server, and the page size you're sending back.
