Regex to Extract Update Columns from a Sql Statement

I need a regex (run in C#) that takes a string containing a SQL UPDATE statement as input and returns a list of the columns to be updated. It should handle column names whether or not they are surrounded by brackets.
// Example Sql Statement
Update Employees
Set FirstName = 'Jim', [LastName] = 'Smith', CodeNum = codes.Num
From Employees as em
Join CodeNumbers as codes on codes.EmployeeID = em.EmployeeID
In the end I would want to return an IEnumerable or List containing:
FirstName
LastName
CodeNum
Anyone have any good suggestions on implementation?
Update: The sql is user-generated, so I have to parse the Sql as it is given. The purpose of extracting the column names in my case is to validate that the user has permission to update the columns included in the query.

You're doing it backwards. Store the data in a broken out form, with the table to be updated, the column names, and the expressions to generate the new values all separate. From this canonical representation, generate both the SQL (when you need it) and the list of columns being updated (when you need that instead).
If you absolutely must pull the column names out of a SQL statement, I don't think that regular expressions are the correct way to go. For example, in the general case you may need to skip over new-value expressions that contain arbitrarily nested parentheses. You will probably want a full SQL parser. The book Lex & Yacc by Levine, Mason, and Brown has a chapter on parsing SQL.
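To illustrate the nesting point, here is a minimal hand-written scanner rather than a regex. It is only a sketch, not a SQL parser: it assumes a single UPDATE statement, plain `=` assignments, single-quoted literals with `''` escapes, and no comments. The class and method names (`UpdateColumnScanner`, `GetColumns`) are made up for this example. It tracks parenthesis depth and quoting so that commas, `=` signs, and keywords inside subqueries or literals are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UpdateColumnScanner
{
    // Returns the assignment targets of the SET clause, without brackets or qualifiers.
    public static List<string> GetColumns(string sql)
    {
        var columns = new List<string>();
        Match set = Regex.Match(sql, @"\bSET\b", RegexOptions.IgnoreCase);
        if (!set.Success) return columns;

        int depth = 0;          // current parenthesis nesting level
        bool inQuote = false;   // inside a '...' literal
        int start = set.Index + set.Length;
        int i = start;
        for (; i < sql.Length; i++)
        {
            char c = sql[i];
            if (inQuote)
            {
                // A doubled '' closes and immediately reopens the literal.
                if (c == '\'') inQuote = false;
                continue;
            }
            if (c == '\'') { inQuote = true; continue; }
            if (c == '[')
            {
                // Skip a bracketed identifier as a unit.
                int close = sql.IndexOf(']', i);
                if (close < 0) break;
                i = close;
                continue;
            }
            if (c == '(') { depth++; continue; }
            if (c == ')') { depth--; continue; }
            if (depth != 0) continue;

            if (c == ',')
            {
                AddColumn(sql.Substring(start, i - start), columns);
                start = i + 1;
            }
            else if (IsKeywordAt(sql, i, "FROM") || IsKeywordAt(sql, i, "WHERE"))
            {
                break; // end of the SET clause
            }
        }
        AddColumn(sql.Substring(start, i - start), columns);
        return columns;
    }

    private static void AddColumn(string assignment, List<string> columns)
    {
        int eq = assignment.IndexOf('=');
        if (eq < 0) return;
        string target = assignment.Substring(0, eq).Trim();
        int dot = target.LastIndexOf('.');              // drop a table qualifier such as em.
        if (dot >= 0) target = target.Substring(dot + 1);
        columns.Add(target.Trim('[', ']', ' '));
    }

    private static bool IsKeywordAt(string sql, int index, string keyword)
    {
        if (index > 0 && IsWordChar(sql[index - 1])) return false;
        if (string.Compare(sql, index, keyword, 0, keyword.Length,
                           StringComparison.OrdinalIgnoreCase) != 0) return false;
        int end = index + keyword.Length;
        return end >= sql.Length || !IsWordChar(sql[end]);
    }

    private static bool IsWordChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_';
    }
}
```

For the example statement in the question this yields FirstName, LastName, CodeNum. It says nothing about what the value expressions themselves do, which is why it cannot serve as a permission check on its own.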
Response to update:
You are in for a world of hurt. The only way to do what you want is to fully parse the SQL, because you also need to make sure that you don't have any subexpressions that perform unauthorized actions.
I very, very strongly recommend that you come up with another way to do whatever it is that you are doing. Maybe break out the modifiable fields into a separate table and use access controls? Maybe come up with another interface for them to use in specifying what they want done? Whatever it is that you're doing, there is almost certainly a better way to do it. Down that path there be dragons.

Regular expressions cannot do this task, because SQL is not a regular language.
You can do this, but not with a regular expression. You need a full-blown parser.
You can use ANTLR to generate parsers in C#, and there are free grammars available for parsing SQL in ANTLR.
However, I agree with Glomek that allowing user-supplied SQL to be run against your system, even after you have tried to validate that it includes no "unauthorized actions," is foolish. There are too many cases that may circumvent your validation.
Instead, if you have only a single text field, you should define a simplified Domain-Specific Language that permits users to specify only actions that they are authorized to do. From this input, you can build the SQL yourself.

SQL has a complex recursive grammar, and there will always be some subselect, GROUP BY, or literal that breaks a regex-based parser.
Why not use a SQL parser to achieve what you need? Here is an article that shows how to do it within 3 minutes.

Related

Secure full where statement when it comes as a string

I have the following C# function
SomeFunction(string table, string column, string where) {
Sql sql = new Sql("SELECT ");
// [...] validate table and column values
sql.Append(column);
sql.Append(" FROM ");
sql.Append(table);
sql.Append(" WHERE ");
sql.Append(where); // This is the issue
}
As you can see this is awful, I'm dealing with this very old legacy code and changing the function signature and the way the clients use it is just not feasible. What I have to do is secure the 'where' clause. This clause may contain any number of conditions and data types.
I had a bunch of ideas, but I don't think any of them is a good solution. I think this requires properly written and tested code, and if I write it myself from scratch it will probably have holes. Here are some thoughts:
Splitting the string by char '=' -> what if that's not the condition operator
Find if string contains semicolons -> the SELECT clause remains vulnerable, and maybe one of the conditions contains that char so it'd give a false positive
If you have any idea/suggestion/pointing in the right direction I will be most grateful.

If the where clause is currently based on being a pre-composed string, then frankly I don't think it is a viable approach to attempt to "secure" it. It is theoretically possible, but any attempt at parsing the SQL will fail if the composed and compromised (injected) where clause is legitimate (but abusive). At that point: you've already lost track of the original intent. That's kinda the entire point of SQL injection: the resultant SQL is valid SQL - so it is very hard for you to tell the difference between where Name = 'Fred Orson' -- check name (probably fine) and where Name = 'Fred' Or 1=1 --' (injected - query widening).
So: while I acknowledge that you say:
changing the function signature and the way the clients use it is just not feasible.
Not changing the function signature doesn't really help you solve the problem. Trying to detect certain patterns is just an arms race, where you need to win every time and the attacker needs to win only once.
If it was me, I'd be doing something like:
[Obsolete("Please specify parameters separately - use 'null' if no parameters are needed")]
SomeFunction(string table, string column, string where) {
return SomeFunction(table, column, where, null);
}
SomeFunction(string table, string column, string where, object args) {
// ...
}
and using an approach like "Dapper" uses to compose the parameters from the args parameter - or just use "Dapper" itself to run the query, and use that functionality for free.
This approach:
prevents new uses of the dangerous API being added
lets the existing code continue to work for now
but lets you track how many outstanding problem calls there are, by watching the warnings
Edit: note: the point of the args parameter is to allow the caller to parameterize their inputs, i.e.
string name = ...
var users = SomeFunction("Users", "Id", "Name=@name", new { name });
With SomeFunction decomposing args and adding parameter name/value pairs from the properties on args (if it is non-null). There are various approaches to composing parameter sets, but the approach shown here is simple and easy to implement correctly - which makes it a clear win for me.
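As a rough sketch of what decomposing args could look like (this is not Dapper's actual implementation, and `ParameterHelper`, `ToParameters`, and `AddTo` are names invented here), reflection over the public properties of the object is enough for the simple case. It assumes the provider uses `@`-prefixed named parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public static class ParameterHelper
{
    // Turns new { name = "Fred" } into { "@name" -> "Fred" }.
    public static Dictionary<string, object> ToParameters(object args)
    {
        var result = new Dictionary<string, object>();
        if (args == null) return result;

        foreach (PropertyInfo prop in
                 args.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.GetIndexParameters().Length > 0) continue; // skip indexers
            // ADO.NET expects DBNull.Value rather than null for SQL NULL.
            result["@" + prop.Name] = prop.GetValue(args, null) ?? DBNull.Value;
        }
        return result;
    }

    // Adds one provider-specific parameter per property to an existing command.
    public static void AddTo(IDbCommand command, object args)
    {
        foreach (KeyValuePair<string, object> pair in ToParameters(args))
        {
            IDbDataParameter parameter = command.CreateParameter();
            parameter.ParameterName = pair.Key;
            parameter.Value = pair.Value;
            command.Parameters.Add(parameter);
        }
    }
}
```

Inside SomeFunction, the where text would still be appended as before, but the values now travel as parameters via something like ParameterHelper.AddTo(command, args) instead of being spliced into the string.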

Safe way to allow user generated expressions against SQL columns

I am allowing users to generate expressions against predefined columns on the table. A user can create columns, tables, and can define constraints such as unique and not null columns. I also want to allow them to generate "Calculated columns". I am aware that PostgreSQL does not allow calculated columns so to get around that I'll use expressions like this:
SELECT CarPrice, TaxRate, CarPrice + (CarPrice * TaxRate) AS FullPrice FROM CarMSRP
The user can enter something like this
{{CarPrice}} + ({{CarPrice}} * {{TaxRate}})
Then it gets translated to
CarPrice + (CarPrice * TaxRate)
Not sure if this is vulnerable to sql injection. If so, how would I make this secure?

Why don't you utilize STORED PROCEDURES to conduct this?
This way you can, for instance, define variables to receive what the user wrote and check whether it contains any blacklisted words (like DELETE, TRUNCATE, ALL, *, and so forth).
I don't know PostgreSQL, but if that's not possible there, you can also check for those problematic commands before translating the input into your SELECT statement.

If I understand you correctly, you just take user input as described above and substitute it into the select column list. If so, that is certainly not safe, because something like:
"* from SomeSystemTable--({{CarPrice}} + ({{CarPrice}} * {{TaxRate}})"
will allow the user to select anything from any other table he has permissions for. You can try to build an expression tree to avoid that: parse the user input into some structure describing variables and the arithmetic operations between them (like parsing arithmetic expressions). Otherwise you can remove all {{}} placeholders from your string (ensuring that every {{}} corresponds to a column in the table) and check that only "+-*()" and whitespace characters are left.
Note that from user experience viewpoint you will need to parse expression anyway, to warn user about errors without actually running the query.
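The second, simpler option could be sketched like this. The names (`ExpressionTranslator`, `Translate`) are invented for the example. Beyond the "+-*()" set mentioned above, it also permits digits, `.` and `/` so that numeric constants and division work, and it rejects `--` and `/*`, which would otherwise slip through as comment markers built from allowed characters. Those additions are my assumptions, not part of the original suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExpressionTranslator
{
    private static readonly Regex Placeholder = new Regex(@"\{\{(\w+)\}\}");
    private static readonly Regex AllowedRemainder = new Regex(@"^[0-9\s+\-*/().]*$");

    // Translates "{{A}} + {{B}}" to "A + B", or throws if anything unexpected is present.
    public static string Translate(string input, ISet<string> allowedColumns)
    {
        // Every placeholder must name a column the user is allowed to reference.
        foreach (Match m in Placeholder.Matches(input))
        {
            if (!allowedColumns.Contains(m.Groups[1].Value))
                throw new ArgumentException("Unknown column: " + m.Groups[1].Value);
        }

        // With placeholders blanked out, only arithmetic characters may remain.
        string remainder = Placeholder.Replace(input, " ");
        if (!AllowedRemainder.IsMatch(remainder))
            throw new ArgumentException("Expression contains characters that are not allowed.");

        // "--" and "/*" start SQL comments even though each character is allowed.
        if (remainder.Contains("--") || remainder.Contains("/*"))
            throw new ArgumentException("Expression contains a comment marker.");

        return Placeholder.Replace(input, m => m.Groups[1].Value);
    }
}
```

With CarPrice and TaxRate as the allowed columns, the input from the question translates to CarPrice + (CarPrice * TaxRate), while the "* from SomeSystemTable--(...)" input is rejected. Unbalanced parentheses are not detected here; the database would report those as a syntax error, which is one reason a real expression parser gives better error messages.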

Lucene empty query string with filter

I am using Lucene.Net in a personal project and need to handle two cases, but I can't find a nice way for Lucene to handle both using the same type of query.
The basic query uses a MultiFieldQueryParser with the StandardAnalyzer and a NumericRangeFilter to filter by date (dates are saved as long values).
The problem is that I would like the filter to handle an empty search string without having to use two different query parsers, one for an empty search string and one for when the user enters a search string. Currently the MultiFieldQueryParser throws a ParseException when an empty string is used.
Any advice on the best way to handle this? Or is this a flaw (intentional or otherwise) in Lucene or Lucene.Net?
RESULT
I ended up using the MatchAllDocsQuery if the query string was empty with a normal query otherwise.
Also I had to remove the use of NumericFields and the NumericRangeFilter as the query returned no results when I used them. I ended up doing the date range filter the old way with strings and a normal RangeFilter.

The best way to handle it is to generate a MatchAllDocsQuery and bypass the parser if the input is an empty string.
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_4/api/all/org/apache/lucene/search/MatchAllDocsQuery.html
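In Lucene.Net that bypass might look roughly like the following. This is a sketch that assumes a Lucene.Net version where QueryParser lives in the Lucene.Net.QueryParsers namespace (as in 2.9/3.0); `SearchQueryFactory` is a made-up helper name. Since MultiFieldQueryParser derives from QueryParser, the existing parser can be passed in unchanged.

```csharp
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class SearchQueryFactory
{
    // Returns a match-everything query for empty input, otherwise the parsed query.
    public static Query Build(QueryParser parser, string userInput)
    {
        if (string.IsNullOrWhiteSpace(userInput))
        {
            // Never hand the empty string to the parser; it would throw a ParseException.
            return new MatchAllDocsQuery();
        }
        return parser.Parse(userInput);
    }
}
```

The resulting Query can then be passed to the searcher together with the date filter, so the filter is applied the same way in both cases.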

Escaping various characters in C# SQL from a variable

I'm working on a C# forms application that ties into an Access database. Part of this database is outside of my control, specifically a part that contains strings with ", ), and other such characters. Needless to say, this is mucking up some queries, as I need to use that column to select other pieces of data. This is just a desktop forms application and the issue lies in an exporter function, so there's no concern over SQL injection or other such things. How do I tell this thing to ignore quotes and such in a query when I'm using a variable that may contain them, and match that to what is stored in the Access database?
Well, an example would be that I've extracted several columns from a single row. One of them might be something like:
large (3-1/16" dia)
You get the idea. The quotes are breaking the query. I'm currently using OleDb to dig into the database and didn't have an issue until now. I'd rather not gut what I've currently done if it can be helped, at least not until I'm ready for a proper refactor.

This is actually not as big a problem as it may seem: just do NOT build SQL queries as plain strings. Use a command class (SqlCommand for SQL Server, or OleDbCommand in your OleDb case) with query parameters. This way the values are passed separately from the SQL text, so the engine knows what is code to be executed and what is a parameter value, and quotes inside a value cannot break the query.
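Since the question uses OleDb against Access, a parameterized lookup could look something like this. It is a sketch only: the table and column names (Parts, PartId, Description) and the `PartLookup` class are invented for illustration. Note that OleDb text commands use positional `?` placeholders, so parameters are matched by the order in which they are added, not by name.

```csharp
using System.Data.OleDb;

public static class PartLookup
{
    // Builds a command whose value is sent as a parameter, not spliced into the SQL text.
    public static OleDbCommand BuildCommand(OleDbConnection connection, string description)
    {
        var command = new OleDbCommand(
            "SELECT PartId FROM Parts WHERE Description = ?", connection);

        // The name is only a label here; OleDb binds "?" placeholders by position.
        command.Parameters.Add("@description", OleDbType.VarWChar).Value = description;
        return command;
    }
}
```

A value such as large (3-1/16" dia) can be passed as the description unchanged, with no escaping of the quote or parenthesis.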

You are trying to protect against a SQL injection attack; see https://www.owasp.org/index.php/SQL_Injection.
The easiest way to prevent these attacks is to use query parameters; http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlparameter.aspx
var cmd = new SqlCommand("select * from someTable where id = @id");
cmd.Parameters.Add("@id", SqlDbType.Int).Value = theID;

At least for single quotes, adding another quote seems to work: '' becomes '.
Even though injection shouldn't be an issue, I would still look into using parameters. They are the simpler option at the end of the day as they avoid a number of unforeseen problems, injection being only one of them.

So as I read your question, you are building up a query as a string in C#, concatenating already queried column values, and the resulting string is either ceasing to be a string in C#, or it won't match stuff in the access db.
If the problem is in C#, I guess you'll need some sort of escaping function like
stringvar += Escaped(columnvalue);
...
private static string Escaped(string cv) {
    // code to put \ in front of problem characters in cv goes here
    return cv; // placeholder: returns the value unchanged
}
If the problem is in access, then
' escapes '
" escapes "
and you can put a column value containing " inside of '...' and it should work.
However my real thought is that, the SQL you're trying to run might be better restructured to use subqueries to get the matched value(s) and then you're simply comparing column name with column name.
If you post some more information re exactly what the query you're producing is, and some hint of the table structures, I'll try and help further - or someone else is bound to be able to give you something constructive (though you may need to adjust it per Jet SQL syntax)

How to fetch entries starting with the given string from a SQL Server database?

I have a database with a lot of words to be used in a tag system. I have created the necessary code for an autocomplete box, but I am not sure of how to fetch the matching entries from the database in the most efficient way.
I know of the LIKE command, but it seems to me that it is more of an EQUAL command. I get only the words that look exactly like the word I enter.
My plan is to read every row, and then use C#'s string.StartsWith() and string.Contains() functions to find words that may fit, but I am thinking that with a large database, it may be inefficient to read every row and then filter them.
Is there a way to read only rows that start with or contain a given string from SQL Server?

When using LIKE, you provide a % sign as a wildcard. If you want strings that start with Hello, you would use LIKE 'Hello%'. If you want strings with Hello anywhere in the string, you would use LIKE '%Hello%'.
As for efficiency, using Like is not optimal. You should look into full text search.

I know of the LIKE command, but it seems to me that it is more of an EQUAL command. I get only the words that look exactly like the word I enter.
That's because you aren't using wildcards:
WHERE column LIKE 'abc%'
...will return rows where the column value starts with "abc". I'll point out that when using wildcards, this is the only version that can make use of an index on the column.
WHERE column LIKE '%abc%'
...will return rows where the column value contains "abc" anywhere in it. Wildcarding the left side of a LIKE guarantees that an index seek cannot be used.
SQL Server doesn't natively support regular expressions - you have to use CLR functions to gain access to the functionality. But it performs on par with LIKE.
Full Text Search (FTS) is the best means of searching text.
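For the autocomplete case, the text the user typed has to be turned into a LIKE pattern, and any wildcard characters in it should be matched literally. A small helper along these lines could do that (a sketch; `LikePattern` and its method names are invented here, and the bracket escaping shown is the T-SQL convention):

```csharp
using System.Text.RegularExpressions;

public static class LikePattern
{
    // Wraps the T-SQL LIKE wildcards %, _ and [ in brackets so they match literally.
    public static string Escape(string input)
    {
        return Regex.Replace(input, @"[%_\[]", m => "[" + m.Value + "]");
    }

    // Pattern for "column starts with input"; can use an index on the column.
    public static string StartsWith(string input)
    {
        return Escape(input) + "%";
    }

    // Pattern for "column contains input".
    public static string Contains(string input)
    {
        return "%" + Escape(input) + "%";
    }
}
```

The pattern is then passed as a parameter rather than concatenated into the SQL, for example with a query text of WHERE column LIKE @pattern and the value LikePattern.StartsWith(typedText).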

You can also implement StartsWith functionality using any of the following expressions:
LEFT('String in which you search', X) = 'abc'
CHARINDEX('abc', 'String in which you search') = 1
'String in which you search' LIKE 'abc%'
Use the one which performs best.

You can use CONTAINS in T-SQL, but I'm pretty sure you have to be using full-text indexing on the table involved in your query.
See: CONTAINS; Getting Started with Full-Text Search.
