Lucene empty query string with filter - c#

I am using Lucene.Net in a personal project and need to handle two cases, but I can't find a clean way to have Lucene handle both with the same type of query.
The basic query uses a MultiFieldQueryParser with the StandardAnalyzer and a NumericRangeFilter to filter by date (dates are saved as long values).
The problem is that I would like the filter to work with an empty search string, without having to use two different query parsers: one for an empty search string and one for when the user enters a search string. Currently the MultiFieldQueryParser throws a ParseException when an empty string is used.
Any advice on the best way to handle this? Or is this a flaw (intentional or otherwise) in Lucene or Lucene.Net?
RESULT
I ended up using the MatchAllDocsQuery if the query string was empty with a normal query otherwise.
Also I had to remove the use of NumericFields and the NumericRangeFilter as the query returned no results when I used them. I ended up doing the date range filter the old way with strings and a normal RangeFilter.

The best way to handle it is to generate a MatchAllDocsQuery and bypass the parser if the input is an empty string.
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_4/api/all/org/apache/lucene/search/MatchAllDocsQuery.html
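A minimal sketch of that approach, assuming the Lucene.Net 2.9/3.0 API. The class name, the BuildQuery helper and the field names are hypothetical and only there for illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

public static class SearchQueries
{
    // Hypothetical field names; substitute the fields your index actually uses.
    private static readonly string[] Fields = { "title", "body" };

    public static Query BuildQuery(string userInput)
    {
        // Empty input: bypass the parser, which would throw a ParseException.
        if (string.IsNullOrWhiteSpace(userInput))
        {
            return new MatchAllDocsQuery();
        }

        var analyzer = new StandardAnalyzer(Version.LUCENE_29);
        var parser = new MultiFieldQueryParser(Version.LUCENE_29, Fields, analyzer);
        return parser.Parse(userInput);
    }
}
```

Either query can then be passed to the searcher together with the same date filter, so the filtering code does not need to know which case it is in.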

Related

How can I check if date falls between two dates in mongoDB using C#?

I have a list stored in MongoDB that contains objects. Each object has a start date and an end date.
When I insert a new object into the list, I check whether it already exists in the collection by using its ID.
If the object exists, I want to check that the dates do not overlap. How do I check that?
Thanks
I'm not sure what you have tried so far but I don't want to outright give you all the code especially since I don't know what you have done.
The .find() query with no arguments returns all documents in a collection, with all fields of each document. The documentation is here. In addition to this, I would probably use the $exists operator in MongoDB to test whether the relevant field is present.
Once the .find() query has narrowed down the documents, we can use $lte and $gte to filter on the dates and see which dates fall within which range. Both operators are described in the MongoDB documentation.
If this doesn't help, I also think this could be a duplicate of the following question: MongoDB_Possible_Duplicate
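The $lte/$gte comparison boils down to the usual interval-overlap test: two ranges overlap when each one starts on or before the other ends. A minimal sketch in plain C#, with hypothetical names and no MongoDB driver involved:

```csharp
using System;

public static class DateRanges
{
    // Two closed ranges [startA, endA] and [startB, endB] overlap
    // when each one starts on or before the other one ends.
    public static bool Overlaps(DateTime startA, DateTime endA,
                                DateTime startB, DateTime endB)
    {
        return startA <= endB && startB <= endA;
    }
}
```

Expressed as a query, the same condition would be "stored start $lte new end" combined with "stored end $gte new start"; the actual field names depend on your documents.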

Escaping various characters in C# SQL from a variable

I'm working on a C# form application that ties into an Access database. Part of this database is outside of my control, specifically a part that contains strings with ", ), and other such characters. Needless to say, this is mucking up some queries, as I need to use that column to select other pieces of data. This is just a desktop form application and the issue lies in an exporter function, so there's no concern over SQL injection or other such things. How do I tell this thing to ignore quotes and such in a query when I'm using a variable that may contain them, and match that to what is stored in the Access database?
Well, an example would be that I've extracted several columns from a single row. One of them might be something like:
large (3-1/16" dia)
You get the idea. The quotes are breaking the query. I'm currently using OleDb to dig into the database and didn't have an issue until now. I'd rather not gut what I've currently done if it can be helped, at least not until I'm ready for a proper refactor.
This is actually not as big a problem as it may seem: just do NOT build SQL queries as plain strings. Use a command class (SqlCommand for SQL Server, OleDbCommand for OleDb) with query parameters. This way the value is passed to the engine separately from the SQL text, so the engine knows which part is code to be read directly and which part is a parameter value, and quotes inside the value cannot break the query.
You are trying to protect against a SQL injection attack; see https://www.owasp.org/index.php/SQL_Injection.
The easiest way to prevent these attacks is to use query parameters; http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlparameter.aspx
var cmd = new SqlCommand("select * from someTable where id = @id");
cmd.Parameters.Add("@id", SqlDbType.Int).Value = theID;
At least for single quotes, adding another quote seems to work: '' becomes '.
Even though injection shouldn't be an issue, I would still look into using parameters. They are the simpler option at the end of the day as they avoid a number of unforeseen problems, injection being only one of them.
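Since the question uses OleDb rather than SqlClient, here is a rough sketch of the same idea with OleDbCommand. The table name, column name and the BuildCommand helper are hypothetical; only the OleDb calls themselves are real API.

```csharp
using System.Data.OleDb;

public static class PartLookup
{
    // Hypothetical table and column names, for illustration only.
    public static OleDbCommand BuildCommand(OleDbConnection connection, string description)
    {
        // OleDb uses positional "?" placeholders; values are bound in the order they are added.
        var cmd = new OleDbCommand(
            "SELECT * FROM Parts WHERE Description = ?", connection);

        // The value travels separately from the SQL text, so a value such as
        //   large (3-1/16" dia)
        // needs no escaping at all.
        cmd.Parameters.AddWithValue("?", description);
        return cmd;
    }
}
```

Because OleDb ignores parameter names and goes by position, parameters must be added in the same order as the ? placeholders appear in the query.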
As I read your question, you are building up a query as a string in C#, concatenating column values you have already queried, and the resulting string either stops being a valid string in C# or won't match what is in the Access database.
If the problem is in C#, I guess you'll need some sort of escaping function like
stringvar += Escaped(columnvalue);
...
private static string Escaped(string cv)
{
    // put \ in front of problem characters in cv (here: backslash and double quote)
    return cv.Replace("\\", "\\\\").Replace("\"", "\\\"");
}
If the problem is in Access, then
'' (doubled) stands for a literal ' inside a '...' string
"" (doubled) stands for a literal " inside a "..." string
and you can put a column value containing " inside '...' and it should work.
However my real thought is that, the SQL you're trying to run might be better restructured to use subqueries to get the matched value(s) and then you're simply comparing column name with column name.
If you post some more information re exactly what the query you're producing is, and some hint of the table structures, I'll try and help further - or someone else is bound to be able to give you something constructive (though you may need to adjust it per Jet SQL syntax)

How can I sort an SQLite query ignoring articles ("the", "a", etc.)?

I'm using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I'm using a custom ListBox class that has a function to sort the text stripping the word 'The' from the beginning of every item. However, it doesn't exactly seem to be the simplest way to do it, since it calls from the SQLite database and then sorts. I'd prefer to cut it down to just one step, hopefully sorting straight from the database in my "SELECT" query.
I've done some searching on this, and have found some suggestions, including creating an extra sort-by column in the database. While this is certainly a possibility, I'm wondering if there are any simpler options that don't require inserting almost identical duplicate information (especially if the database becomes larger). I'm pretty new to SQLite, but I've read something about creating a collate function that can be used to create custom ordering. However, I'm not sure if this is an appropriate use for it, and I can't seem to find any help with implementing it in C#.
Was hoping someone might be able to share some guidance. If an extra sorting column is the best way to go, then that is what I shall do.
To avoid inserting duplicate data, what about having two columns: TITLE_PREFIX (usually empty, but sometimes contains "The ", or "A "; no index on this column) and TITLE (contains the title without "The " or "A "; this is the column you create the index on). To display the data, you have to combine TITLE_PREFIX and TITLE. But you just search on TITLE.
Here is the solution:
ORDER BY (CASE
WHEN sortTitle LIKE 'the %' THEN substr(sortTitle,5)
WHEN sortTitle LIKE 'a %' THEN substr(sortTitle,3)
WHEN sortTitle LIKE 'an %' THEN substr(sortTitle,4)
ELSE sortTitle END)
You could store each title in 2 parts: title and prefix.
With SQLite you can combine 2 string values via the || operator also known as the concatenate operator.
Here's an example:
SELECT prefix || ' ' || title FROM movies ORDER BY title
You can also use ltrim in case prefix is empty, so you don't have a space at the front:
SELECT ltrim(prefix || ' ' || title) FROM movies ORDER BY title
Another alternative is to store the prefix at the end of the title. For example at a lot of movie stores you will see something like:
Three Musketeers, The
Within C# Code
If you wanted to do this within C#, use LINQ to do the ordering for you. I've posted a full sample on PasteBin. This will allow you to:
avoid duplicating data in your database
take advantage of DB indexes as you normally would, no matter which RDBMS
put in noise words in a config file, thereby reducing downtime/rebuild/redeploy when modifying the list
ensure a solution is more readable in your client code
DropDownList1.DataSource = myBooks.OrderBy(n => ReplaceNoise(n.Title));
public string ReplaceNoise(string input)
{
    string[] noise = new string[] { "the", "an", "a" };
    //surely this could be LINQ'd
    foreach (string n in noise)
    {
        // match the whole word only, so "Theatre" or "Avatar" are left untouched
        if (input.StartsWith(n + " ", StringComparison.OrdinalIgnoreCase))
        {
            return input.Substring(n.Length).Trim();
        }
    }
    return input;
}
Within your SQLite statement
How about simply replacing the noise words with blanks in the ORDER BY? It's an ugly first step (REPLACE is case-sensitive and removes every occurrence, not just a leading one, so include the trailing space in each noise word), but strongly consider a new column to store this value for sorting purposes.
ORDER BY REPLACE(REPLACE([title],'The ',''),'A ','')
Admittedly, this gets ugly when you end up with this:
REPLACE(REPLACE(REPLACE(REPLACE([title],'The ',''),'A ',''),'of ',''),'by ','')
You could try building a table that supports full-text searching (using the FTS module) on the title. Then you'll be able to do fast searches on any words in the title without requiring lots of extra work on your part. For example, a user query of good bad ugly might produce “The Good, the Bad and the Ugly” as one of its first results. The extra cost of all this is about a quarter of the length of the text itself in general, but might be more for your dataset, as titles aren't full English text. You also need to spend the time building those extra indices – you don't want to build them on your main dataset on a live system (obviously) – but that shouldn't be too big a problem.
Create a virtual column (result of a function that can be implemented in C#) and sort on this virtual column. The function could move "The" to the end as in "Three Musketeers, The" or discard "The", whatever you want it to do.
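As a rough sketch of what such a function could look like in C#: the class and method names below are made up, and how you register it with SQLite as a custom function depends on the SQLite library you use, so that part is left out.

```csharp
using System;

public static class TitleSort
{
    private static readonly string[] Articles = { "the", "an", "a" };

    // Moves a leading article to the end of the title,
    // e.g. "The Three Musketeers" becomes "Three Musketeers, The".
    public static string SortKey(string title)
    {
        foreach (var article in Articles)
        {
            // Require the trailing space so "Theatre" or "Avatar" are left alone.
            if (title.StartsWith(article + " ", StringComparison.OrdinalIgnoreCase))
            {
                var prefix = title.Substring(0, article.Length);
                var rest = title.Substring(article.Length + 1).TrimStart();
                return rest + ", " + prefix;
            }
        }
        return title;
    }
}
```

Sorting on SortKey(title) keeps the stored title untouched while ordering "The Three Musketeers" under T for "Three".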

How to fetch entries starting with the given string from a SQL Server database?

I have a database with a lot of words to be used in a tag system. I have created the necessary code for an autocomplete box, but I am not sure of how to fetch the matching entries from the database in the most efficient way.
I know of the LIKE command, but it seems to me that it is more of an EQUAL command. I get only the words that look exactly like the word I enter.
My plan is to read every row, and then use C#'s string.StartsWith() and string.Contains() functions to find words that may fit, but I am thinking that with a large database, it may be inefficient to read every row and then filter them.
Is there a way to read only rows that start with or contain a given string from SQL Server?
When using LIKE, you provide a % sign as a wildcard. If you want strings that start with Hello, you would use LIKE 'Hello%'. If you wanted strings with Hello anywhere in the string, you would use LIKE '%Hello%'.
As for efficiency, using LIKE is not optimal. You should look into full-text search.
I know of the LIKE command, but it seems to me that it is more of an EQUAL command. I get only the words that look exactly like the word I enter.
That's because you aren't using wildcards:
WHERE column LIKE 'abc%'
...will return rows where the column value starts with "abc". I'll point out that, when using wildcards, this is the only version that can make use of an index on the column.
WHERE column LIKE '%abc%'
...will return rows where the column value contains "abc" anywhere in it. Wildcarding the left side of a LIKE guarantees that an index cannot be used to seek to the matching rows.
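For the autocomplete box, the pattern is usually built in C# from whatever the user typed and passed as a query parameter. A small sketch follows, with hypothetical helper names; it also escapes the characters that T-SQL LIKE treats specially, so that a typed % or _ is matched literally.

```csharp
using System;

public static class LikePatterns
{
    // Escapes the characters T-SQL LIKE treats specially: [ % _
    // "[" is handled first so the brackets added for % and _ are not escaped again.
    public static string EscapeLike(string input)
    {
        return input.Replace("[", "[[]").Replace("%", "[%]").Replace("_", "[_]");
    }

    // Pattern for "starts with": can still use an index on the column.
    public static string StartsWithPattern(string prefix) => EscapeLike(prefix) + "%";

    // Pattern for "contains anywhere": leading wildcard, so no index seek.
    public static string ContainsPattern(string fragment) => "%" + EscapeLike(fragment) + "%";
}
```

The result would then be bound to a parameter in a query such as WHERE column LIKE @pattern, rather than concatenated into the SQL text.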
SQL Server doesn't natively support regular expressions - you have to use CLR functions to gain access to the functionality. But it performs on par with LIKE.
Full Text Search (FTS) is the best means of searching text.
You can also implement StartsWith functionality using any of the following expressions (where X is the length of 'abc'):
LEFT('String in which you search', X) = 'abc'
CHARINDEX('abc', 'String in which you search') = 1
'String in which you search' LIKE 'abc%'
Use the one which performs best.
You can use CONTAINS in T-SQL, but I'm pretty sure you have to be using full-text indexing for the table involved in your query.
Contains
Getting started with Full-Text Search

Regex to Extract Update Columns from a Sql Statement

I need a regex (run in C#) that will take a string containing a SQL UPDATE statement as input and return a list of the columns to be updated. It should be able to handle columns whether or not they are surrounded by brackets.
// Example Sql Statement
Update Employees
Set FirstName = 'Jim', [LastName] = 'Smith', CodeNum = codes.Num
From Employees as em
Join CodeNumbers as codes on codes.EmployeeID = em.EmployeeID
In the end I would want to return an IEnumerable or List containing:
FirstName
LastName
CodeNum
Anyone have any good suggestions on implementation?
Update: The sql is user-generated, so I have to parse the Sql as it is given. The purpose of extracting the column names in my case is to validate that the user has permission to update the columns included in the query.
You're doing it backwards. Store the data in a broken out form, with the table to be updated, the column names, and the expressions to generate the new values all separate. From this canonical representation, generate both the SQL (when you need it) and the list of columns being updated (when you need that instead).
If you absolutely must pull the column names out of a SQL statement, I don't think that regular expressions are the correct way to go. For example, in the general case you may need to skip over new value expressions that contain arbitrarily nested parentheses. You will probably want a full SQL parser. The book Lex & Yacc by Levine, Mason, and Brown has a chapter on parsing SQL.
Response to update:
You are in for a world of hurt. The only way to do what you want is to fully parse the SQL, because you also need to make sure that you don't have any subexpressions that perform unauthorized actions.
I very, very strongly recommend that you come up with another way to do whatever it is that you are doing. Maybe break out the modifiable fields into a separate table and use access controls? Maybe come up with another interface for them to use in specifying what they want done? Whatever it is that you're doing, there is almost certainly a better way to do it. Down that path there be dragons.
Regular expressions cannot do this task, because SQL is not a regular language.
You can do this, but not with a regular expression. You need a full-blown parser.
You can use ANTLR to generate parsers in C#, and there are free grammars available for parsing SQL in ANTLR.
However, I agree with Glomek that allowing user-supplied SQL to be run against your system, even after you have tried to validate that it includes no "unauthorized actions," is foolish. There are too many cases that may circumvent your validation.
Instead, if you have only a single text field, you should define a simplified Domain-Specific Language that permits users to specify only actions that they are authorized to do. From this input, you can build the SQL yourself.
SQL has a complex recursive grammar, and there will always be some subselect, GROUP BY, or literal that will break your regex-based parser.
Why not use a SQL parser to achieve what you need? Here is an article that shows you how to do it within 3 minutes.
