The backend is PostgreSQL server 9.1.
I am trying to build ad hoc XML reports. The report files will contain SQL queries, all of which must start with a SELECT statement. The SQL queries will have parameters. Depending on the data type of the associated column, each parameter will be presented to the user in a suitable form so they can provide the value(s).
A rough SQL query:
SELECT * FROM customers
WHERE
(
customers.customer_code=#customer_code AND customers.location=#location
AND customers.type=
(
SELECT type from types
WHERE types.code=#type_code
AND types.is_active = #type_is_active
)
AND customers.account_open_date BETWEEN #start_date AND #end_date
)
OR customers.flagged = #flagged;
I want to get a list of the column names and parameters from the query string, put them into a string array, and process them later.
I am able to match only the parameters using the following regular expression:
#(?)(?<parameter>\w+)
Expected Matches:
customers.customer_code=#customer_code
customers.location=#location
types.code=#type_code
types.is_active = #type_is_active
customers.account_open_date BETWEEN #start_date AND #end_date
customers.flagged = #flagged
How can I match "#Parameter", "=", and "BETWEEN" in one go?
I know it's a little late, but for future researchers' sake:
I think this Regex serves your purpose:
(\w+\.\w+(?:\s?\=\s?\#\w+|\sBETWEEN\s\#\w+\sAND\s\#\w+))
Check the Regex101 fiddle, and read the explanation of each part carefully.
Basically, it first looks for a qualified column name of the form table.column (such as customers.customer_code), and then for either = #variable or BETWEEN #variable1 AND #variable2.
Captured groups:
MATCH 1
1. [37-75]
`customers.customer_code=#customer_code`
MATCH 2
1. [80-108]
`customers.location=#location`
MATCH 3
1. [184-205]
`types.code=#type_code`
MATCH 4
1. [218-251]
`types.is_active = #type_is_active`
MATCH 5
1. [266-327]
`customers.account_open_date BETWEEN #start_date AND #end_date`
MATCH 6
1. [333-361]
`customers.flagged = #flagged`
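As a rough illustration, the same pattern can be applied in Python to collect the matches into a list. This is only a sketch: the redundant escapes before `=` and `#` are dropped, and the names `PATTERN`, `QUERY` and `extract_conditions` are made up for this example.

```python
import re

# The pattern from the answer: a qualified column name, followed by either
# "= #param" or "BETWEEN #param AND #param".
PATTERN = re.compile(
    r"\w+\.\w+(?:\s?=\s?#\w+|\sBETWEEN\s#\w+\sAND\s#\w+)"
)

# The sample query from the question.
QUERY = """SELECT * FROM customers
WHERE
(
customers.customer_code=#customer_code AND customers.location=#location
AND customers.type=
(
SELECT type from types
WHERE types.code=#type_code
AND types.is_active = #type_is_active
)
AND customers.account_open_date BETWEEN #start_date AND #end_date
)
OR customers.flagged = #flagged;"""


def extract_conditions(sql):
    """Return every column/parameter condition found in the SQL text."""
    return [match.group(0) for match in PATTERN.finditer(sql)]


if __name__ == "__main__":
    for condition in extract_conditions(QUERY):
        print(condition)
```

Note that `customers.type=` is skipped, because it is followed by a subquery rather than a `#parameter`.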
I want to create a unique, short string (<= 258 characters) that is suitable as a Windows filename.
This is to uniquely label an XML query result.
Here is a sample query:
SELECT * FROM ( SELECT [utcDT],
MAX(CASE WHEN[Symbol] = 'fish' THEN[Close] END) AS [fish],
MAX(CASE WHEN[Symbol] = 'chips' THEN[Close] END) AS [chips]
FROM [DATA].[1M].[ASTS_NOGAP]
WHERE [Date] >= '2011-12-27'
AND [Date] <= '2012-07-01'
AND [Symbol] IN ('fish','chips')
GROUP BY [utcDT] ) AS A
WHERE [utcDT] IS NOT NULL AND [fish] IS NOT NULL AND [chips] IS NOT NULL
ORDER BY [utcDT]
But it could be a longer query.
The compression is one-way only, i.e. I do NOT need to decompress.
I want to end up with a unique file name like:
ksdgfsbhdfjksgdjbajysjdgyasagfdjahgdkjasgjgfjkgjkgdjkfgjskdjfgsajgdjfgjsgy.xml
EDIT1:
The generated filename must be unique to the query - such that another
app would generate the same filename for the same query.
How can I achieve this?
There is a small risk for collisions, but this should do what you need:
public string GetUniqueFileNameForQuery(string sql)
{
    using (var hasher = SHA256.Create())
    {
        var queryBytes = Encoding.UTF8.GetBytes(sql);
        var queryHash = hasher.ComputeHash(queryBytes);
        // "/" may be included, but is not legal for file names
        return Convert.ToBase64String(queryHash).Replace("/", "-") + ".xml";
    }
}
This needs using System.Security.Cryptography; at the top of the file.
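For comparison, here is a minimal Python sketch of the same idea that uses a hex digest instead of Base64. The swap is my own assumption, not part of the answer above: Base64 output is case-sensitive while Windows file names are case-insensitive by default, so hex avoids two distinct hashes mapping to the same file. The function name is hypothetical.

```python
import hashlib


def unique_file_name_for_query(sql: str) -> str:
    """Derive a deterministic, filename-safe name from the query text."""
    # SHA-256 always yields 32 bytes, i.e. 64 hex characters (0-9, a-f).
    digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    return digest + ".xml"


if __name__ == "__main__":
    print(unique_file_name_for_query("SELECT * FROM Users"))
```

The result is always 68 characters long (64 hex digits plus ".xml"), regardless of how long the query is, and any application hashing the same UTF-8 text gets the same name.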
I also need to add a note about working with SQL from client code languages like C#.
Most queries are going to need input of some kind: an ID field for a lookup, a date range, a username, something to tell the query which records you need out of a larger set. It's very poor practice to substitute these inputs directly into the SQL string in your C# (or other language) code. That opens you up to an issue known as SQL Injection, and it's kind of a big deal.
Instead, for almost all queries, there will be a placeholder variable name for each input argument. It matters for this question because you'll have the same SQL query text for two queries that differ only by arguments.
For example, say you have this query:
SELECT * FROM Users WHERE Username = @Username
You run this query twice, once with 'jsmith' as the input, and once with 'jdoe'. The SQL didn't change, and therefore the encoded file name didn't change.
You may be inclined to ask for the value of the SQL after the parameter inputs are substituted into the query, but this misunderstands what happens. The parameter inputs are never, at any time, substituted into the SQL query. That's the whole point. Even the database server will instead treat them as procedure variables.
The point here is you also need a way to encode any parameter data used with your query. Here's one basic naive option:
public string GetUniqueFileNameForQuery(DbCommand query)
{
    var sql = query.CommandText;
    // DbParameterCollection is not generic, so name the element type explicitly.
    foreach (DbParameter p in query.Parameters)
    {
        // Convert.ToString also copes with a null Value.
        sql = sql.Replace(p.ParameterName, Convert.ToString(p.Value));
    }
    using (var hasher = SHA256.Create())
    {
        var queryBytes = Encoding.UTF8.GetBytes(sql);
        var queryHash = hasher.ComputeHash(queryBytes);
        // "/" may be included, but is not legal for file names
        return Convert.ToBase64String(queryHash).Replace("/", "-") + ".xml";
    }
}
Note: this code could produce invalid SQL. For example, you might end up with something like this:
SELECT * FROM Users WHERE LastName = O'Brien
But since you're not actually trying to run the query, that should be okay. You also need to be careful with systems like OleDB, which uses positional matching and ? for all parameter placeholders. In this case, the parameter name won't match the placeholder, or even if it did, the first parameter would match the placeholder for all the others.
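One way to sidestep the placeholder-matching problem is to skip the text substitution and feed the parameters into the hash directly, in order. The Python sketch below is an illustration under my own assumptions, not the method from the answer: the function name, the hex output and the NUL separator are all made up for the example.

```python
import hashlib


def unique_file_name_for_command(sql, parameters):
    """Hash the SQL text together with its ordered (name, value) pairs.

    `parameters` is a sequence of (name, value) tuples; for positional
    placeholders such as "?", the order alone keeps the values apart.
    """
    hasher = hashlib.sha256()
    hasher.update(sql.encode("utf-8"))
    for name, value in parameters:
        # A NUL separator keeps ("a", "bc") distinct from ("ab", "c").
        hasher.update(b"\x00" + str(name).encode("utf-8"))
        # repr() distinguishes 1 from "1"; it is Python-specific, so another
        # application would need to agree on the same value encoding.
        hasher.update(b"\x00" + repr(value).encode("utf-8"))
    return hasher.hexdigest() + ".xml"


if __name__ == "__main__":
    query = "SELECT * FROM Users WHERE Username = ?"
    print(unique_file_name_for_command(query, [("Username", "jsmith")]))
    print(unique_file_name_for_command(query, [("Username", "jdoe")]))
```

Because nothing is spliced into the SQL text, a value like O'Brien cannot produce a malformed string, and the same query run with 'jsmith' and 'jdoe' yields two different file names.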
Alright, the system I got is a pretty outdated ERP system based around an Ingres database. The database schema is ... well ... not very nice (not really normalized) but basically it works out. Please understand that I cannot change anything related to the database.
Consider the following SQL statement:
SELECT
-- some selected fields here
FROM
sta_artikelstamm s
left join sta_chargen c on c.artikel_nr = s.artikel_nr and c.lager != 93
left join sta_artikelbeschreib b on s.artikel_nr = b.artikel_nr and b.seite = 25 and b.zeilennr = 1
left join sta_einkaufskonditionen ek on s.artikel_nr = ek.artikel_nr AND s.lieferant_1 = ek.kunden_nr
left join sta_kundenstamm ks on ek.kunden_nr = ks.nummer AND ks.nummer = s.lieferant_1
left join tab_teilegruppe2 tg2 on s.teilegruppe_2 = tg2.teilegruppe
WHERE
(s.status = 0)
AND
(s.teilegruppe_2 IS NOT NULL) AND (s.teilegruppe_2 != '')
So far, this works as expected: I get exactly 40742 results back. The result set looks alright, the number matches about what I would expect, and the statement has shown no duplicates. I explicitly use a LEFT JOIN since some fields in related tables may not contain entries, but I would like to keep the info from the main article table nonetheless.
Now, table tab_teilegruppe2 consists of 3 fields (bezeichnung = description, teilegruppe = part group == primary key, taricnr - please ignore this field, it may be null or contain some values but I don't need it).
I thought of adding the following SQL part to only include rows in the result set which do NOT appear in a specific part group. I therefore added the following line at the very end of the SQL statement.
AND (s.teilegruppe_2 NOT IN (49,57,60,63,64,65,66,68,71,73,76,77,78,79,106,107))
I'm by no means an SQL expert (you probably have guessed that already), but shouldn't an additional WHERE statement remove rows instead of adding? As soon as I add this simple additional statement in the WHERE clause, I get 85170 result rows.
Now I'm guessing it has to do with the "NOT IN" statement, but I don't understand why I suddenly get more rows than before. Can anyone give me a pointer on where to look for my error?
What is the type of the s.teilegruppe_2 column? Is it an integer or some sort of string (VARCHAR)?
The (s.teilegruppe_2 != '') suggests it is a string but your NOT IN is comparing it against a list of integers.
If the column is a string, the NOT IN condition will be true for every row, since none of the string values are going to match an integer value.
I was recently asked to migrate our MSSQL database to an Oracle one.
I'm using the traditional way to execute SQL queries.
For some reason unknown to me, Oracle requires me to put parentheses around column names (why?).
Is there a workaround for this?
The following code will fail because of the parentheses (used to work well under MSSQL)
using (var msq = new OracleConnection(sConnectionString))
{
    msq.Open();
    OracleCommand msc = msq.CreateCommand();
    msc.CommandText = @"SELECT level_1,element_id FROM tnuot_menu_tree
        WHERE level_1 IN
        (SELECT mt.level_1 FROM tnuot_menu_tree mt
        WHERE mt.element_id IN
        (SELECT element_tree_id FROM tnuot_menu_elements
        WHERE UPPER(element_link) LIKE :url))
        AND level_2 = 0 AND level_3 = 0";
    msc.Parameters.Add("url", SqlDbType.VarChar);
    msc.Parameters["url"].Value = "%" + sName.ToUpper();
    OracleDataReader mrdr = msc.ExecuteReader();
    while (mrdr.Read())
    {
        sResult.arDirectResult.Add(mrdr[0].ToString());
        sResult.arDirectResult.Add(mrdr[1].ToString());
        break;
    }
    msc.Dispose();
    mrdr.Dispose();
    msq.Close();
}
Instead, in the VS server explorer, the last query gets 'translated' to
SELECT "level_1", "element_id"
FROM "tnuot_menu_tree"
WHERE ("level_1" IN
(SELECT "level_1" FROM "tnuot_menu_tree" mt
WHERE ("element_id" IN
(SELECT "element_tree_id" FROM "tnuot_menu_elements"
WHERE (UPPER("element_link") LIKE '%DEFAULT.ASPX')))))
AND ("level_2" = 0) AND ("level_3" = 0)
Which works well.
Any ideas on how to get rid of this nasty task?
Possibly, it isn't the parentheses that are necessary; it's the double quotes. They are Oracle's equivalent of SQL Server's square brackets. They may be necessary here because the tables were created with lower-case names; without the double quotes, Oracle automatically converts names to upper case.
The main difference between your first and second query are the quotes (and not the parentheses). The additional parentheses aren't needed. They seem to be a strange artifact of the VS server explorer.
Contrary to popular belief, Oracle is case-sensitive. The column names level_1 and LEVEL_1 are different. If your column and table names are all upper-case, case won't matter because Oracle converts all unquoted identifiers in SQL statements to upper-case.
But if your column and table names use lower-case letters, you must put the names in double quotes to have the proper casing retained.
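The folding rule can be modelled in a few lines of Python. This is a deliberately simplified sketch of the behaviour described above, not Oracle's actual parser, and `resolve_identifier` is a hypothetical helper name.

```python
def resolve_identifier(name_in_sql: str) -> str:
    """Return the name that is looked up for an identifier as written in SQL."""
    is_quoted = (
        len(name_in_sql) >= 2
        and name_in_sql.startswith('"')
        and name_in_sql.endswith('"')
    )
    if is_quoted:
        # Quoted identifier: the casing is kept exactly as written.
        return name_in_sql[1:-1]
    # Unquoted identifier: folded to upper case before lookup.
    return name_in_sql.upper()


if __name__ == "__main__":
    print(resolve_identifier("level_1"))
    print(resolve_identifier('"level_1"'))
```

So an unquoted `level_1` is looked up as `LEVEL_1` and does not find a column that was created as `"level_1"`, while the quoted form does.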
I have the following query:
;WITH valRules AS
( SELECT vr.valRuleID, Count(*) AS totalRows, Sum(vt.test) AS validRows
FROM (SELECT NULL AS x) AS x
JOIN #itemMap AS IM
ON IM.lngitemID = 1
JOIN tblValidationRule AS vr
ON IM.RuleID = vr.valRuleID
JOIN tblValidationRuleDetl AS vrd
ON vr.valRuleID = vrd.valRuleID
LEFT JOIN #ValTest AS vt
ON vrd.type = vt.type
AND vrd.typeSequence = vt.typeSequence
AND vrd.valRule & vt.Response > 0
OR (vrd.valrule = 0 AND vt.response = 0 )
GROUP BY vr.valRuleID
)
SELECT Count(*)
FROM valrules
WHERE totalrows = validRows
Note the CTE, and the bitwise operator in the LEFT JOIN condition. This is currently used in a stored procedure that takes values from a C# application in the form of an XML variable. The XML variable is placed into table #valTest. All columns are of datatype INT. If vt.Response is valid for valRule, the result of & will be greater than zero (e.g. 31 & 8 = 8 but 12 & 2 = 0). The vt.Test column contains the number 1 for each row, so that it may be summed up (nulls are automatically excluded) to get a count of the validations that pass per rule. Each rule has a number of attributes that must pass validation for success. If the number of attributes is equal to the number that passed, we have success.
In an effort to reduce calls to the database, the goal is to cache ALL the rules in the ASP.NET cache and handle validation locally. The developers are asking for a de-normalized version of the validation data, with the claim that the SQL set-based operation is not a simple task in C# with LINQ. From what I have looked into, I would agree. At this point my investigation shows the bitwise comparison in the join condition is particularly problematic.
The main question is: how can this be converted to something that uses LINQ on the C# side? Or are there more efficient ways to deal with this on the client side, and LINQ is not one of them (i.e. just give them flat data)?
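To make the pass/fail logic concrete, here is a small Python sketch of the same counting done in memory. It rests on assumptions not stated in the question: there is at most one response per (type, typeSequence) pair, and the "both zero" branch is meant to apply to the same pair. The data shapes and function names are made up for illustration.

```python
from collections import defaultdict


def detail_passes(val_rule: int, response: int) -> bool:
    """A response passes when it shares a bit with the rule mask, or both are 0."""
    return (val_rule & response) > 0 or (val_rule == 0 and response == 0)


def count_passing_rules(rule_details, responses):
    """Count the rules whose every detail row has a passing response.

    rule_details: iterable of (rule_id, type, type_sequence, val_rule)
    responses:    dict mapping (type, type_sequence) -> response
    """
    total_rows = defaultdict(int)
    valid_rows = defaultdict(int)
    for rule_id, type_, sequence, val_rule in rule_details:
        total_rows[rule_id] += 1
        response = responses.get((type_, sequence))
        if response is not None and detail_passes(val_rule, response):
            valid_rows[rule_id] += 1
    # Mirrors "WHERE totalRows = validRows" in the outer query.
    return sum(1 for rule in total_rows if total_rows[rule] == valid_rows[rule])


if __name__ == "__main__":
    details = [(1, 1, 1, 31), (1, 1, 2, 12), (2, 2, 1, 0)]
    print(count_passing_rules(details, {(1, 1): 8, (1, 2): 4, (2, 1): 0}))
```

With the rules cached as flat tuples like this, the bitwise test is an ordinary predicate rather than a join condition.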
thanks
LINQ-to-SQL isn't going to do anything quite as bespoke as that query. Which isn't a criticism of either LINQ-to-SQL or the query: simply, there are limits.
There are two ways I would approach that:
1: as a parameterized TSQL query via ExecuteQuery<T> - i.e.
var result = db.ExecuteQuery<YourType>(@"your query here with {0}, {1} etc",
    arg0, arg1, ...);
2: write that TSQL as a UDF mapped into the data-context:
var result = db.YourUdf(arg0, ...);
Both are valid and will work with LINQ-to-SQL; personally I prefer the first approach, but the UDF approach allows greater re-use within the DB layer, at the expense of having more complex deployment (i.e. app tier and db tier all at the same time).
What's the best way to convert search terms entered by a user, into a query that can be used in a where clause for full-text searching to query a table and get back relevant results? For example, the following query entered by the user:
+"e-mail" +attachment -"word document" -"e-learning"
Should translate into something like:
SELECT * FROM MyTable WHERE (CONTAINS(*, '"e-mail"')) AND (CONTAINS(*, '"attachment"')) AND (NOT CONTAINS(*, '"word document"')) AND (NOT CONTAINS(*, '"e-learning"'))
I'm using a query parser class at the moment, which parses the query entered by users into tokens using a regular expression, and then constructs the where clause from the tokens.
However, given that this is probably a common requirement by a lot of systems using full-text search, I'm curious as to how other developers have approached this problem, and whether there's a better way of doing things.
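The regular-expression approach described in the question can be sketched roughly as follows. This is an illustration only, not the asker's parser class: the token pattern and the name `to_where_clause` are assumptions, and it handles just the `+term`, `-term` and quoted-phrase forms shown in the example.

```python
import re

# An optional +/- sign, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def to_where_clause(user_query: str) -> str:
    """Translate +term / -term search input into CONTAINS predicates."""
    clauses = []
    for sign, quoted, bare in TOKEN.findall(user_query):
        # Drop stray double quotes and double any single quote so the
        # term cannot break out of the SQL string literal.
        term = (quoted or bare).replace('"', "").replace("'", "''")
        if not term:
            continue
        predicate = "CONTAINS(*, '\"%s\"')" % term
        if sign == "-":
            clauses.append("(NOT " + predicate + ")")
        else:
            clauses.append("(" + predicate + ")")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(to_where_clause('+"e-mail" +attachment -"word document" -"e-learning"'))
```

For the sample input this produces the WHERE condition shown above. Since the text ends up inside a SQL statement, passing the search condition as a parameter is safer than concatenating it wherever the API allows it.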
How to implement the accepted answer using .Net / C# / Entity Framework...
Install Irony using nuget.
Add the sample class from:
http://irony.codeplex.com/SourceControl/latest#Irony.Samples/FullTextSearchQueryConverter/SearchGrammar.cs
Write code like this to convert the user-entered string to a query.
var grammar = new Irony.Samples.FullTextSearch.SearchGrammar();
var parser = new Irony.Parsing.Parser(grammar);
var parseTree = parser.Parse(userEnteredSearchString);
string query = Irony.Samples.FullTextSearch.SearchGrammar.ConvertQuery(parseTree.Root);
Perhaps write a stored procedure like this:
create procedure [dbo].[SearchLivingFish]
    @Query nvarchar(2000)
as
    select *
    from Fish
    inner join containstable(Fish, *, @Query, 100) as ft
        on ft.[Key] = FishId
    where IsLiving = 1
    order by rank desc
Run the query.
var fishes = db.SearchLivingFish(query);
This may not be exactly what you are looking for but it may offer you some further ideas.
http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/
In addition to @franzo's answer above, you probably also want to change the default stop word behaviour in SQL Server. Otherwise, queries containing single-digit numbers (or other stop words) will not return any results.
Either disable stop words, create your own stop word list and/or set noise words to be transformed as explained in SQL 2008: Turn off Stop Words for Full Text Search Query
To view the system list of (English) sql stop words, run:
select * from sys.fulltext_system_stopwords where language_id = 1033
I realize it's a bit of a side-step from your original question, but have you considered moving away from SQL fulltext indexes and using something like Lucene/Solr instead?
The easiest way to do this is to use dynamic SQL (I know, insert security issues here) and break the phrase into a correctly formatted string.
You can use a function to break the phrase into a table variable that you can use to create the new string.
A combination of GoldParser and Calitha should sort you out here.
This article: http://www.15seconds.com/issue/070719.htm has a googleToSql class as well, which does some of the translation for you.