Following is my schema:
Product_Name (Analyzed), Category (Analyzed)
Scenario:
I want to search for products whose Category is exactly "Cellphones & Accessories" and whose Product_Name matches "sam*".
The equivalent SQL query is:
select * from products
where Product_Name like '%sam%' and Category='Cellphones & Accessories'
I am using lucene.net.
I need the equivalent lucene.net statement.
As this is a few months old I'll be brief (I can expand if you're still interested)...
If you want an exact match on Category then do not analyze that field. Analyzers chop the string up into tokens, which are then individually searchable. Matching case can be problematic, so maybe just a lowercase analyzer would work for that field.
It might be useful to have several fields analyzed in different ways so that different queries can be used.
NOTE: "sam*" is not equivalent to "%sam%": the former matches only at the start of a term, the latter anywhere in the string.
Do you want "sam" to be a prefix (e.g. "sample") or a whole word (e.g. "the sam product")?
If it's a word then a no stopword analyzer should be fine.
A nice trick is to create many fields (with the same name) holding variations of the name, probably with just a lowercase analyzer:
name: "some sample product"
name: "sample product"
name: "product"
Then have a look at "prefix queries": a query of (name:sam) would then match.
Also have a look at the PerFieldAnalyzerWrapper in order to use a different analyzer for each field.
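To make the trick concrete, here is a plain-Python sketch (not lucene.net code; the function names are made up for illustration) of what the word-suffix variants plus a prefix query effectively do:

```python
def name_variants(name):
    """Index-time trick: emit the full name plus each word-suffix
    variant, lowercased (mimicking a lowercase-only analyzer)."""
    words = name.lower().split()
    return [" ".join(words[i:]) for i in range(len(words))]

def prefix_match(variants, term):
    """A prefix query on the multi-valued field matches when any
    stored variant starts with the (lowercased) search term."""
    term = term.lower()
    return any(v.startswith(term) for v in variants)

variants = name_variants("Some Sample Product")
print(variants)                       # ['some sample product', 'sample product', 'product']
print(prefix_match(variants, "sam"))  # True: "sample product" starts with "sam"
print(prefix_match(variants, "xyz"))  # False
```

In real lucene.net you would add one field instance per variant at index time and run a PrefixQuery against that field.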
How can I search an Elasticsearch field by its exact value, but with a case-insensitive query?
For example I have field with value { "Type": "Płatność kartą" },
and my query will search by the value "płatność kartą". I need to be able to search by a list of string parameters (i.e. "płatność kartą", "płatność gotówką", etc.). I tried Elastic's TERMS query, but it returned nothing whenever the case differed. The field's index is set to not_analyzed.
If you choose not_analyzed when indexing, Elastic does not analyze those terms at index time, which means they are stored verbatim. So when you query with different casing, you get no results because the query terms don't match the stored values.
In order to be able to query with lowercase and get the uppercase results, too, you need to use an analyzer on your mapping. Here are the available options from the docs.
If none of the available analyzers fits, you can define a custom one by specifying the filters you want applied. For example, using just the lowercase filter, Elastic will index the RegisteredPaymentType field lowercased. Then, while querying, the same analyzer is applied to the query and you will get the expected results.
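As a hedged sketch, a create-index body along these lines should work (syntax assumed for Elasticsearch 7+; the analyzer name is made up, and the field name comes from the question):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Type": {
        "type": "text",
        "analyzer": "lowercase_keyword"
      }
    }
  }
}
```

With this in place, a match query for "płatność kartą" finds "Płatność kartą", because both the stored value and the query go through the same lowercase path, while the keyword tokenizer keeps the whole value as a single token, preserving exact-match behavior.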
Say I have an index containing a collection of Users, storing their full name in a Name field. Some of these users are of the format "Firstname Lastname", and some are "Firstname Middlename(s) Lastname"
e.g.
Joe Bloggs
Joe Fred Bloggs
Joe John Paul Bloggs
If I search for "Joe Bloggs", I need it to return all users listed above.
I've tried using a PhraseQuery, but this only returns "Joe Bloggs" (I presume because the terms need to appear in the correct order).
Is my only option to use a WildcardQuery? I wouldn't want 'Joe Smith' or 'John Bloggs' to be returned. Also, I can't rework the index to split the full name into separate fields.
How best should I form my query to get things to work as required?
Thanks
Take care which analyzer you use.
You probably just want "whitespace", to break words at whitespace, plus "lowercase" so that "fred" matches "Fred".
Your query should simply be "name:joe AND name:bloggs" (or the equivalent if you are constructing your query objects manually).
This says that the name field MUST contain both words.
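A plain-Python sketch (not lucene.net code) of the matching logic that query expresses, assuming the whitespace + lowercase analysis above:

```python
def analyze(text):
    """Whitespace tokenizer plus lowercase filter, as suggested above."""
    return text.lower().split()

def must_match_all(name, query):
    """The logic of (name:joe AND name:bloggs): every query term must
    appear among the field's tokens, in any order and any position."""
    tokens = set(analyze(name))
    return all(term in tokens for term in analyze(query))

print(must_match_all("Joe Fred Bloggs", "Joe Bloggs"))       # True
print(must_match_all("Joe John Paul Bloggs", "Joe Bloggs"))  # True
print(must_match_all("Joe Smith", "Joe Bloggs"))             # False
print(must_match_all("John Bloggs", "Joe Bloggs"))           # False
```

Note how "Joe Smith" and "John Bloggs" fail, as required, because each lacks one of the two mandatory terms.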
I've got an application which generates a number of SQL statements, whose selection fields all contain AS clauses, like so:
SELECT TOP 200 [customer_name] AS [Customer Name], [customer_age] AS [Customer Age], AVG([customer_age]) AS 'Average Customer Age' FROM [tb_customers] GROUP BY [customer_age]
My statements will always be in that format. My task is to parse them so that "TOP 200" is removed, as well as all the AS clauses except those on aggregates. In other words, I want to parse the statements so that this example ends up like so:
SELECT [customer_name], [customer_age], AVG([customer_age]) AS 'Average Customer Age' FROM [tb_customers] GROUP BY [customer_age]
How would I go about doing this? Is it even possible? It seems like a very complex parsing task, since the number of fields will never be the same. If it helps, I've got a variable which stores the number of fields (not including aggregates).
You may use a regular expression, e.g. replace all occurrences of
AS \[.*?\]
with empty text
or all occurrences of
AS \[.*?\],
with a comma ",".
The question mark "?" is important here as it turns off greedy matching.
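Sketching the idea in Python (the question is presumably C#, but .NET's Regex class accepts the same patterns), using the sample statement from the question:

```python
import re

sql = ("SELECT TOP 200 [customer_name] AS [Customer Name], "
       "[customer_age] AS [Customer Age], "
       "AVG([customer_age]) AS 'Average Customer Age' "
       "FROM [tb_customers] GROUP BY [customer_age]")

# Drop the TOP clause.
sql = re.sub(r"\bTOP \d+ ", "", sql)

# Drop AS [...] aliases; the non-greedy .*? stops at the first closing
# bracket. Aggregate aliases use quotes ('...'), so they survive.
sql = re.sub(r" AS \[.*?\]", "", sql)

print(sql)
# SELECT [customer_name], [customer_age], AVG([customer_age])
# AS 'Average Customer Age' FROM [tb_customers] GROUP BY [customer_age]
```

This relies on the stated guarantee that the statements always follow the same format; a general SQL parser would be needed otherwise.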
I am using the official C# MongoDB driver.
If I have an index on three elements {"firstname":1,"surname":1,"companyname":1} can I search the collection by using a regular expression that directly matches against the index value?
So, if someone enters "sun bat" as a search term, I would create a regex as follows
(?=.*\bsun)(?=.*\bbat).* and this should match any index entries where firstname or surname or companyname starts with 'sun' AND where firstname or surname or companyname starts with 'bat'.
If I can't do it this way, how can I do it? The user just types their search terms, so I won't know which element (firstname, surname, companyname) each search term (sun or bat) refers to.
Update: for MongoDB 2.4 and above you should not use this method but use MongoDB's text index instead.
Below is the original and still relevant answer for MongoDB < 2.4.
Great question. Keep this in mind:
MongoDB can only use one index per query.
Queries that use regular expressions only use an index when the regex is rooted and case sensitive.
The best way to do a search across multiple fields is to create an array of search terms (lower case) for each document and index that field. This takes advantage of the multi-keys feature of MongoDB.
So the document might look like:
{
    "firstname": "Tyler",
    "surname": "Brock",
    "companyname": "Awesome, Inc.",
    "search_terms": [ "tyler", "brock", "awesome inc" ]
}
You would create an index: db.users.ensureIndex({ "search_terms": 1 })
Then when someone searches for "Tyler", you lowercase the search term and query the collection with a case-sensitive regex anchored to the beginning of the string:
db.users.find({ "search_terms": /^tyler/ })
What MongoDB does when executing this query is try to match your term against every element of the array (the index is set up that way too, so it's speedy). Hopefully that will get you where you need to be. Good luck.
Note: These examples are in the shell. I have never written a single line of C# but the concepts will translate even though the syntax may differ.
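In that spirit, here is a plain-Python sketch of the index-time and query-time steps (not driver code; stripping the punctuation from the company name with a regex is an assumption for illustration):

```python
import re

doc = {
    "firstname": "Tyler",
    "surname": "Brock",
    "companyname": "Awesome, Inc.",
}

# Index time: build the lowercased search_terms array for the document.
doc["search_terms"] = [
    doc["firstname"].lower(),
    doc["surname"].lower(),
    re.sub(r"[^\w ]", "", doc["companyname"]).lower(),
]
print(doc["search_terms"])  # ['tyler', 'brock', 'awesome inc']

# Query time: a rooted, case-sensitive regex (^tyler), matched against
# each array element -- which is what the multikey index speeds up.
query = re.compile(r"^tyler")
matched = any(query.match(term) for term in doc["search_terms"])
print(matched)  # True
```

The key point is that both conditions for index use are met: the pattern is rooted with ^ and is case sensitive, which is why the terms are lowercased up front.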
I'm using the code from this MSDN page to create a user-defined aggregate that concatenates strings within GROUP BYs in SQL Server. One of my requirements is that the order of the concatenated values is the same as in the query. For example:
Value  Group
1      1
2      1
3      2
4      2
Using the query:
SELECT
dbo.Concat(tbl.Value) As Concat,
tbl.Group
FROM
(SELECT TOP 1000
tblTest.*
FROM
tblTest
ORDER BY
tblTest.Value) As tbl
GROUP BY
tbl.Group
Would result in:
Concat  Group
"1,2"   1
"3,4"   2
The result always seems to come out correct and as expected, but then I came across this page, which states that the order is not guaranteed and that the attribute SqlUserDefinedAggregateAttribute.IsInvariantToOrder is only reserved for future use.
So my question is: Is it correct to assume that the concatenated values in the string can end up in any order? If that is the case then why does the example code on the MSDN page use the IsInvariantToOrder attribute?
I suspect a big problem here is your statement "the same as in the query": your query never defines, and cannot define, an order for the things being aggregated (you can of course order the groups, by putting an ORDER BY after the GROUP BY). Beyond that, I can only say that aggregation is based purely on a set rather than an ordered sequence, and that technically the order is indeed undefined.
While the accepted answer is correct, I wanted to share a workaround that others may find useful. Warning: it involves not using a user-defined aggregate at all :)
The link below describes an elegant way to build a concatenated, delimited list using only a SELECT statement and a varchar variable. The upside (for this thread) is that you can specify the order in which the rows are processed. The downside is that you can't easily concatenate across many different subsets of rows without painful iteration.
Not perfect, but for my use case was a good workaround.
http://blog.sqlauthority.com/2008/06/04/sql-server-create-a-comma-delimited-list-using-select-clause-from-table-column/
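For completeness, the core idea can be sketched client-side in plain Python (hypothetical code, using the sample rows from the question): an explicit sort supplies the deterministic order that the CLR aggregate cannot guarantee:

```python
from collections import defaultdict

# (Value, Group) rows from the question.
rows = [(1, 1), (2, 1), (3, 2), (4, 2)]

concat = defaultdict(list)
for value, group in sorted(rows):   # the explicit sort plays the role of ORDER BY
    concat[group].append(str(value))

result = {group: ",".join(values) for group, values in concat.items()}
print(result)  # {1: '1,2', 2: '3,4'}
```

The SQL technique in the link works the same way: the SELECT that appends to the varchar variable carries its own ORDER BY, so the concatenation order is pinned down rather than left to the engine.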