WebMatrix SQL injection & XSS concerns after parameterizing queries - c#

I have a simple question that concerns security in WebMatrix razor (C#)
I (for a very brief period of time) considered using character checking to help validate forms to combat SQL injection, however, after realizing that parameterizing queries is the best way to combat this, I have since commented out this check which looks something like this:
foreach(char c in POIName)
{
    if(c=='\'' || c=='$' || c=='\"' || c=='&' || c=='%' || c=='#' || c=='-' || c=='<' || c=='>')
    {
        errorMessage = "You have entered at least one invalid character in the \"POI Name\" field. Invalid characters are: [\'], [\"], [&], [$], [#], [-], [<], [>], and [%]";
    }
}
Now, my question is this:
Should I remove lines like these entirely, as they are not needed (to allow maximum user freedom in textareas etc.), or should there still be some character checking specific to WebMatrix (or not specific to it, for that matter)?
Perhaps not allow "#" or ";" or "<" or ">"??
From what I understand, parameterizing the queries (as I have) auto-escapes harmful characters and re-wraps the whole thing as a string (or something like that anyway, lol), and my database has proven impervious to any of the SQL injection attacks I have thrown at it.
I guess I am concerned because I am not sure if Razor can be affected somehow or if I should worry about XSS (no, I am not a blog or anything that should ever accept HTML tags or even angle brackets of any kind).
Sorry if this question is kind of all over the place, but I don't know where else to turn to find as definitive an answer as Stack Overflow usually provides (e.g., I trust this community much more than a few Google searches, which I assure you I have tried before posting).
Thanks for any help!

Ok, two points here. You are talking about engaging in "Black Listing" in order to scrub user input. The better approach is "White Listing". Define what the allowed characters are. There are many ways in which an attacker can creatively get around a quickly constructed Black List.
Secondly, XSS is a tricky beast. Encoding and White Listing can be used to combat it. I would recommend looking at Microsoft's Anti-XSS Library for advice. But generally, you're going to want to use an encoding that's appropriate to the section of the page you are trying to protect.
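As a rough illustration of the two ideas above, here is a minimal sketch of a whitelist check plus HTML encoding on output. The class name, method names and the allowed character set are assumptions made up for this example, not anything prescribed by WebMatrix; adjust the pattern to whatever a "POI Name" may legitimately contain. Note that Razor's @ output syntax already HTML-encodes values unless Html.Raw is used, so the explicit encode is mainly relevant when markup is built by hand.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class PoiInput
{
    // Hypothetical whitelist: letters, digits, spaces and a little punctuation, 1-100 characters.
    // \A and \z anchor the whole string, so a trailing newline cannot slip through.
    private static readonly Regex Allowed =
        new Regex(@"\A[A-Za-z0-9 .,()]{1,100}\z", RegexOptions.CultureInvariant);

    // Accept the value only if every character is on the whitelist.
    public static bool IsValidName(string value)
    {
        return value != null && Allowed.IsMatch(value);
    }

    // Encode for an HTML element body; attributes, URLs and script blocks need other encodings.
    public static string ForHtml(string value)
    {
        return WebUtility.HtmlEncode(value);
    }
}
```

The whitelist only decides what is accepted; the parameterized query is still what protects the database, and the encoding step is what protects the page.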
In summary, the best advice I can give you is to become familiar with OWASP, specifically the OWASP Top 10. It's an amazing resource for securing your applications.


natural language query processing

I have an NLP (natural language processing) application running that gives me a tree of the parsed sentence; the question is then how I should proceed with that.
What is the time
\-SBAR - Subordinate clause
  |-WHNP - Wh-noun phrase
  | \-WP - Wh-pronoun
  |   \-What
  \-S - Simple declarative clause
    \-VP - Verb phrase
      |-VBZ - Verb, 3rd person singular present
      | \-is
      \-NP - Noun phrase
        |-DT - Determiner
        | \-the
        \-NN - Noun, singular or mass
          \-time
The application has a built-in JavaScript interpreter, and I was trying to turn the phrase into a simple function such as
function getReply() {
    return Resource.Time();
}
In basic terms, "what" = request = create function, "is" would be the returned object, and "the time" would reference the time. It would be easy to write a simple parser for just that, but then we also have "what is the time now" or "do you know what time it is". I need the approach to extend further across the English language as the project grows.
The source is C# on .NET 4.5.
Thanks in advance.
As far as I can see, using dependency parse trees will be more helpful. Often, the number of ways a question is asked is limited (I mean statistically significant variations are limited ... there will probably be corner cases that people ordinarily do not use), and are expressed through words like who, what, when, where, why and how.
Dependency parsing will enable you to extract the nominal subject and the direct as well as indirect objects in a query. Typically, these will express the basic intent of the query. Consider the example of two equivalent queries:
What is the time?
Do you know what the time is?
Their dependency parse structures are as follows:
root(ROOT-0, What-1)
cop(What-1, is-2)
det(time-4, the-3)
nsubj(What-1, time-4)
and
aux(know-3, Do-1)
nsubj(know-3, you-2)
root(ROOT-0, know-3)
dobj(is-7, what-4)
det(time-6, the-5)
nsubj(is-7, time-6)
ccomp(know-3, is-7)
Both are what-queries, and both contain "time" as a nominal subject. The latter also contains "you" as a nominal subject, but I think expressions like "do you know", "can you please tell me", etc. can be removed based on heuristics.
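To make the heuristic concrete, here is a small sketch that reduces both parses above to the same key. It assumes the parser output has already been converted into (relation, governor, dependent) triples with the position suffixes stripped; the Dependency and QueryIntent types and the filler word list are hypothetical names for this example and are not part of the Stanford Parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One typed dependency, e.g. nsubj(What, time).
public sealed class Dependency
{
    public readonly string Relation, Governor, Dependent;
    public Dependency(string relation, string governor, string dependent)
    {
        Relation = relation; Governor = governor; Dependent = dependent;
    }
}

public static class QueryIntent
{
    private static readonly HashSet<string> WhWords = new HashSet<string>(
        new[] { "who", "what", "when", "where", "why", "how" }, StringComparer.OrdinalIgnoreCase);

    // Subjects of politeness wrappers such as "do you know"; an assumed, incomplete list.
    private static readonly HashSet<string> Fillers = new HashSet<string>(
        new[] { "you", "i" }, StringComparer.OrdinalIgnoreCase);

    // Returns a key such as "what:time", or null when no wh-word or subject is found.
    public static string Extract(IEnumerable<Dependency> parse)
    {
        List<Dependency> deps = parse.ToList();
        string wh = deps.SelectMany(d => new[] { d.Governor, d.Dependent })
                        .FirstOrDefault(w => WhWords.Contains(w));
        string subject = deps.Where(d => d.Relation == "nsubj" && !Fillers.Contains(d.Dependent))
                             .Select(d => d.Dependent)
                             .FirstOrDefault();
        if (wh == null || subject == null) return null;
        return wh.ToLowerInvariant() + ":" + subject.ToLowerInvariant();
    }
}
```

A key like "what:time" could then be looked up in a table that maps intents to handlers such as Resource.Time(), so new phrasings only need to produce an existing key.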
You will find the Stanford Parser helpful for this approach. They also have this online demo, if you want to see some more examples at work.

C# Ignore inside of speech, while changing other values

I am trying to make a basic translator that changes values in the code; for example, a . may become ::, etc. I can do that by using
if(code.Contains("."))
{
    // Strings are immutable, so the result of Replace has to be assigned back.
    code = code.Replace(".", "::");
}
But my problem is I don't know how to ignore it in the inside of speech, as if the sentence was "Hello.", it could be translated to "Hello::". How would I be able to stop this? (I know you can use Regex "\".+?\"" to find speech in text)
What you're trying to do here is vastly more complicated than you realise.
Programming languages differ in more than just appearance; they have different capabilities and syntax rules, even two as similar as C# and C++. Aside from the fact that the C# . is equivalent to ., -> and :: in C++, there are also different rules regarding pointers, and sometimes you get issues like having a pointer to a pointer, not to mention that the * and & symbols can be binary arithmetic/logic operations or pointer operations depending on their use. There are also issues involving keywords such as const, auto and sizeof.
In short, unless you're prepared to write a proper tokeniser, you aren't going to pull this off properly. To properly translate one programming language to another you would at least have to write a good chunk of a full compiler, which is a specialist subject.
I suggest you do some research into tokenisers and lexical analysis before you go any further.
As a hint though, you'll find it easier to split your code up into an array of characters and handle one character at a time whilst keeping track of the program's state (i.e. are you currently in the middle of a string, are you in the middle of two brackets, how many levels of nesting have you come across). By doing it that way you can at least manage to change the surface-level differences (as opposed to deeper ones like keywords and typing).
EDIT:
Some useful resources for writing tokenisers and compilers:
http://www.cs.man.ac.uk/~pjj/farrell/comp3.html
http://msdn.microsoft.com/en-us/library/vstudio/3yx2xe3h(v=vs.100).aspx
http://en.wikibooks.org/wiki/Compiler_Construction/Lexical_analysis
I speak from experience, as I did recently attempt to write my own compiler (in C#), but put the project on hold due to more important matters getting in the way.
You could probably use a regex to help you out here, but it would be fairly complex and perform poorly. You could also just do this:
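// Requires: using System.Text;
var sb = new StringBuilder();
bool insideSpeech = false;
foreach(char c in code) {
    // Every double quote toggles the state; escaped quotes (\") are not handled here.
    if(c == '"') {
        insideSpeech = !insideSpeech;
    }
    if(c == '.' && !insideSpeech) {
        sb.Append("::");
    } else {
        sb.Append(c);
    }
}
code = sb.ToString();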
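The loop above flips its state on every double quote, so a literal containing an escaped quote (\") would put it out of step. A minimal sketch of the same idea with one extra flag for backslash escapes might look like this; the class and method names are made up for the example, and it still ignores verbatim strings (@"..."), char literals, comments and decimal numbers such as 3.14, which a real tokeniser would have to handle.

```csharp
using System.Text;

public static class DotTranslator
{
    // Replaces '.' with "::" everywhere except inside double-quoted string literals.
    public static string ReplaceOutsideStrings(string code)
    {
        var sb = new StringBuilder(code.Length);
        bool insideString = false;
        bool escaped = false; // true when the previous character in a literal was a backslash

        foreach (char c in code)
        {
            if (insideString)
            {
                sb.Append(c);                       // literal contents are copied unchanged
                if (escaped) escaped = false;       // this character was escaped, so it is inert
                else if (c == '\\') escaped = true; // the next character is escaped
                else if (c == '"') insideString = false;
            }
            else if (c == '"')
            {
                insideString = true;
                sb.Append(c);
            }
            else if (c == '.')
            {
                sb.Append("::");
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Keeping the state in named flags makes it straightforward to add more of them later, for example a nesting counter for brackets.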

Data structure for searching strings

I am looking for the best data structure for the following case:
In my case I will have thousands of strings, but for this example I will use just two. Say I have the strings "Water" and "Walter": when the letter "W" is entered, both strings should be found, and when "Wat" is entered, "Water" should be the only result. I did some research, but I am still not sure which data structure is correct for this case, and I don't want to implement one until I am sure, as that would waste time. Right now I am thinking of either a trie or a suffix tree. It seems that a trie will do the trick, but as I said, I need to be sure. The implementation itself should not be a problem, so I just need to know the correct structure. Feel free to let me know if there is a better choice. Ordinary structures such as Dictionary/MultiDictionary would not work, as they would be a memory killer. I am also planning to implement a cache to limit memory consumption. I am sorry there is no code, but I hope I will get an answer. Thank you in advance.
You should use a trie. Tries are the foundation for one of the fastest known sorting algorithms (burstsort); they are also used for spell checking and in applications that use text completion. You can see details here.
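For reference, a minimal trie for this kind of prefix lookup could be sketched as follows. The Trie class and its Add and StartingWith methods are names chosen for this example; the sketch is case-sensitive and keeps one dictionary per node, which is simple but not the most memory-efficient layout.

```csharp
using System.Collections.Generic;

public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true when a stored string ends at this node
    }

    private readonly Node _root = new Node();

    // Walks down from the root, creating nodes for characters not seen yet.
    public void Add(string word)
    {
        Node node = _root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns every stored string that begins with the given prefix (in no particular order).
    public List<string> StartingWith(string prefix)
    {
        var results = new List<string>();
        Node node = _root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node))
                return results; // no stored string has this prefix
        }
        Collect(node, prefix, results);
        return results;
    }

    // Depth-first walk that rebuilds each word from the path taken.
    private static void Collect(Node node, string current, List<string> results)
    {
        if (node.IsWord) results.Add(current);
        foreach (KeyValuePair<char, Node> pair in node.Children)
            Collect(pair.Value, current + pair.Key, results);
    }
}
```

With "Water" and "Walter" added, StartingWith("W") yields both strings and StartingWith("Wat") yields only "Water". Lookup cost depends on the prefix length and the number of matches, not on the total number of stored strings.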
Practically, if you want to do auto-suggest, then storing up to 3-4 chars should suffice.
I mean: suggest as and when the user types "a", "ab" or "abc", and the moment they type "abcd" or more characters, filter the map keys starting with "abcd" using C# lambda expressions.
Hence, I suggest creating a map like:
Dictionary<char, Dictionary<char, Dictionary<char, HashSet<string>>>> map;
So, if the user enters "a", you look up map['a'] and find all its children.

How to centrally maintain a mathematical formula in C# (web) so it can be changed if needed?

We have an application that has a LOT of mathematical checks on the page, and according to them the user is given a traffic light (red, green, yellow).
Green = He may continue
Red = Don't let him continue
Yellow = Allow to continue but warn
These formulas operate on the various text fields on the page. So, for example, if textbox1 has "10" and textbox2 has "30"... The formula might be:
T1 * T2 > 600 ? "GREEN" : "RED"
My question is:
Is it possible to somehow centralize these formulas?
Why do I need it?
Right now, if there is any change in a formula, we have to replicate the change at server-side as well (a violation of DRY, and difficult-to-maintain code).
One option could be to
- store the (simple) formula as text with placeholders in a config(?)
- replace the placeholders with values in javascript as well as server-side code
- use eval() for computation in JS
- use tricks outlined here for C#
The issue with this approach could be differing interpretations of the same mathematical string in JS and C#.
Am I making sense or should this question be reported?!! :P
Depending on your application's requirements, it may be acceptable to just do all the validation on the server. Particularly if you have few users or most of them are on a reasonably fast intranet, you can "waste" some network calls to save yourself a maintenance headache.
If the user wants feedback between every field entry (or every few entries, or every few seconds), you could use an AJAX call to ask the server for validation without a full page refresh.
This will, of course, result in more requests than doing the validation entirely on the client, and if many of your users have bad network connections there could be latency in giving them the feedback. My guess is that the total bandwidth usage is about the same: you use some for every validation round-trip, but those are small, and it may be outweighed by all the validation JS that you no longer send to clients.
The main benefit is avoiding the maintenance burden and FUD that you'd otherwise have keeping the client and server validation in sync. There's also the time saved by never having to write the validation JavaScript.
In any case, it may be worth taking a step back and asking what your requirements are.
The Microsoft.CSharp.CSharpCodeProvider provider can compile code on-the-fly. In particular, see CompileAssemblyFromFile.
This would allow you to execute code at runtime from a web.config for instance; however use with caution.
You could write C# classes to model your expressions, with classes such as Expression, Value, BooleanExpr, etc. (an abstract syntax tree).
e.g.
Expression resultExpression = new ValueOf("T1")   // variable
    .Times(new ValueOf("T2"))                      // variable
    .GreaterThan(600)                              // value => BoolExpr
    .IfElse("GREEN", "RED");                       // (value, value) => value
These expressions could then be used for evaluation in C# AND to emit JavaScript for the checks:
String result = resultExpression.bind("T1", 10).bind("T2", 20).evaluate(); // returns "RED"
String jsExpression = resultExpression.toJavaScript(); // emits T1 * T2 > 600 ? "GREEN" : "RED"
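A stripped-down version of such a tree, enough for the T1 * T2 > 600 rule, might look like the sketch below. It reuses the ValueOf, Times, GreaterThan and IfElse names from the snippet above, but everything else (the Expr, Product, Condition and Choice classes, and passing variables as a dictionary instead of bind calls) is an assumption made for this example. It also assumes the result labels contain no quote characters, since they are emitted into JavaScript unescaped.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Numeric expression node: can be evaluated in C# or printed as JavaScript.
public abstract class Expr
{
    public abstract double Eval(IDictionary<string, double> vars);
    public abstract string ToJavaScript();
    public Expr Times(Expr right) { return new Product(this, right); }
    public Condition GreaterThan(double limit) { return new Condition(this, limit); }
}

// A named input such as T1; the name doubles as the JavaScript variable name.
public sealed class ValueOf : Expr
{
    private readonly string _name;
    public ValueOf(string name) { _name = name; }
    public override double Eval(IDictionary<string, double> vars) { return vars[_name]; }
    public override string ToJavaScript() { return _name; }
}

public sealed class Product : Expr
{
    private readonly Expr _left, _right;
    public Product(Expr left, Expr right) { _left = left; _right = right; }
    public override double Eval(IDictionary<string, double> vars)
    {
        return _left.Eval(vars) * _right.Eval(vars);
    }
    public override string ToJavaScript()
    {
        return "(" + _left.ToJavaScript() + " * " + _right.ToJavaScript() + ")";
    }
}

// Boolean node: left > limit.
public sealed class Condition
{
    private readonly Expr _left;
    private readonly double _limit;
    public Condition(Expr left, double limit) { _left = left; _limit = limit; }
    public bool Eval(IDictionary<string, double> vars) { return _left.Eval(vars) > _limit; }
    public string ToJavaScript()
    {
        return _left.ToJavaScript() + " > " + _limit.ToString(CultureInfo.InvariantCulture);
    }
    public Choice IfElse(string whenTrue, string whenFalse)
    {
        return new Choice(this, whenTrue, whenFalse);
    }
}

// Ternary node: picks one of two labels.
public sealed class Choice
{
    private readonly Condition _condition;
    private readonly string _whenTrue, _whenFalse;
    public Choice(Condition condition, string whenTrue, string whenFalse)
    {
        _condition = condition; _whenTrue = whenTrue; _whenFalse = whenFalse;
    }
    public string Eval(IDictionary<string, double> vars)
    {
        return _condition.Eval(vars) ? _whenTrue : _whenFalse;
    }
    public string ToJavaScript()
    {
        return _condition.ToJavaScript() + " ? \"" + _whenTrue + "\" : \"" + _whenFalse + "\"";
    }
}
```

Because both outputs come from the same tree, the server-side result and the emitted client-side expression cannot drift apart when a threshold changes.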
You can make a low level calculator class that uses strings as input and pushes and pops things onto a stack. Look up a "Reverse Polish Calculator". If the number of inputs you are using doesn't change this would be a pretty slick way to store your equations. You would be able to store them in a text file or in a config file very easily.
for instance:
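// Requires: using System.Collections.Generic;
string equation = "V1,V2,+";
string parseThis = equation.Replace("V1", "34").Replace("V2", "45");
var stack = new Stack<int>();
foreach (string s in parseThis.Split(','))
{
    if (s == "+")
    {
        // An operator pops its two operands and pushes the result back.
        int right = stack.Pop();
        int left = stack.Pop();
        stack.Push(left + right);
    }
    else
    {
        stack.Push(int.Parse(s));
    }
}
int result = stack.Pop(); // 34 + 45 = 79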
obviously this gets more complicated with different equations but it could allow you to store your equations as strings anywhere you want.
apologies if any of the syntax is incorrect but this should get you going in the right direction.
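To show how this could grow beyond a single "+", here is a sketch of a slightly more general reverse Polish evaluator that looks variables up in a dictionary instead of using string replacement. The RpnCalculator name, the comma-separated token format and the choice of four operators are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RpnCalculator
{
    // Evaluates a comma-separated postfix equation such as "T1,T2,*".
    public static double Evaluate(string equation, IDictionary<string, double> variables)
    {
        var stack = new Stack<double>();
        foreach (string token in equation.Split(','))
        {
            double value;
            if (variables.TryGetValue(token, out value) ||
                double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            {
                stack.Push(value); // operand: a named variable or a numeric literal
                continue;
            }

            if (stack.Count < 2)
                throw new FormatException("Operator '" + token + "' needs two operands.");

            // The right operand was pushed last, so it is popped first.
            double right = stack.Pop();
            double left = stack.Pop();
            switch (token)
            {
                case "+": stack.Push(left + right); break;
                case "-": stack.Push(left - right); break;
                case "*": stack.Push(left * right); break;
                case "/": stack.Push(left / right); break;
                default: throw new FormatException("Unknown token: " + token);
            }
        }

        if (stack.Count != 1)
            throw new FormatException("Malformed equation.");
        return stack.Pop();
    }
}
```

The traffic-light rule from the question could then be stored as the string "T1,T2,*" and applied with something like Evaluate(formula, values) > 600 ? "GREEN" : "RED", with the threshold kept in the same config entry.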
The simplest solution would be to implement the formulae once in C# server-side, and use AJAX to evaluate the expressions from the client when changes are made. This might slow down the page.
If you want the formulae evaluated client-side and server-side but written only once, then I think you will need to do something like:
- Pull the formulae out into a separate class
- For the client-side:
  - Compile the class to JavaScript
  - Call into the JavaScript version, passing in the values from the DOM
  - Update the DOM using the results of the formulae
- For the server-side:
  - Call into the formulae class, passing in the values from the form data (or controls if this is web-forms)
  - Take the necessary actions using the results of the formulae
...or you could do the converse, and write the formulae in JavaScript, and use a C# JavaScript engine to evaluate that code server-side.
I wouldn't spend time writing a custom expression language, or writing code to translate C# to Javascript, as suggested by some of the other answers. As shown in the questions I linked to, these already exist. This is a distraction from your business problem. I would find an existing solution which fits and benefit from someone else's work.

SubSonic RESTHandler Question

I'm playing with the SubSonic RESTHandler for the first time and it's awesome... There is one quirk tho, that I'm curious about.
RESTHandler.cs (line 319):
//if this column is a string, by default do a fuzzy search
if(comp == Comparison.Like || column.IsString)
{
    comp = Comparison.Like;
    paramValue = String.Concat("%", paramValue, "%");
}
This little blurb of code forces all searches on string columns to be wildcard searches by default. This seems counterintuitive, since you've provided a nice set of comparisons we can add to a parameter (_is, _notequal, etc...). Is there a reason this was done? EvalComparison uses "Comparison.Equals" as its default, so unless a LIKE is explicitly needed, the " || column.IsString" looks like it should be removed, since it breaks the ability to use different types of comparisons.
This was driving me crazy, since you can't do a "WHERE Field = X" without modifying code...
Just curious if this is more of a feature than a bug...
Thanks!
Zach
It's because this is a LIKE operation which for a DB usually allows string operations. The feeling at the time was that if you wanted equals you could just use that.
It's been a while since I've touched this code - if you'd be kind enough to open a bug I'll take a look at it.
It does indeed look like a feature. It's based on the idea that, if I am searching for a string in a column without the wildcards, I must match the string exactly or I get no hits. I suspect that this was done to make programming search textboxes easier.
