How to query part of a field in Lucene.NET - c#

Say I have an index containing a collection of Users, storing their full name in a Name field. Some of these users are of the format "Firstname Lastname", and some are "Firstname Middlename(s) Lastname"
e.g.
Joe Bloggs
Joe Fred Bloggs
Joe John Paul Bloggs
If I search for "Joe Bloggs", I need it to return all users listed above.
I've tried using a PhraseQuery, but this only returns 'Joe Bloggs' (I presume due to the terms needing to be in the correct order).
Is my only option to use a WildcardQuery? I wouldn't want 'Joe Smith' or 'John Bloggs' to be returned. Also, I can't rework the index to split the full name into separate fields.
How best should I form my query to get things to work as required?
Thanks

Take care which analyzer you use.
You probably just want a whitespace tokenizer to break words at whitespace, plus a lower-case filter so that "fred" matches "Fred".
Your query should simply be "name:joe AND name:bloggs" (or the equivalent if you are constructing your query objects manually).
This says that the name field MUST contain both words, in any position, so middle names do not prevent a match.
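The "equivalent if you are constructing your query objects manually" could look roughly like the sketch below. It assumes the Lucene.Net NuGet package (3.0.3 or 4.8 style API, where the `Occur` enum lives in `Lucene.Net.Search`; older releases name it differently), a field called "Name" as in the question, and an index built with a whitespace plus lower-case analyzer. `NameQueryDemo` and `BuildNameQuery` are made-up names for illustration.

```csharp
using System;
using Lucene.Net.Index;   // Term
using Lucene.Net.Search;  // BooleanQuery, TermQuery, Occur

public static class NameQueryDemo
{
    // Builds a query meaning "every search word must occur in the Name field",
    // regardless of position. Assumes the index lower-cases terms, so the
    // search words are lower-cased the same way here.
    public static Query BuildNameQuery(string search)
    {
        var query = new BooleanQuery();
        var words = search.ToLowerInvariant()
                          .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            // Occur.MUST makes each clause mandatory (the AND in "a AND b").
            query.Add(new TermQuery(new Term("Name", word)), Occur.MUST);
        }
        return query;
    }

    public static void Main()
    {
        // Prints the query in Lucene's query syntax.
        Console.WriteLine(BuildNameQuery("Joe Bloggs"));
    }
}
```

Because each word is a separate mandatory clause, 'Joe Smith' and 'John Bloggs' are excluded while 'Joe Fred Bloggs' still matches.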

Related

lucene.net filtering on multiple fields

Following is my schema
Product_Name (Analyzed),Category (Analyzed)
Scenario:
I want to search those products whose category is exactly "Cellphones & Accessories" and Product_Name is "sam*"
Equivalent SQL Query is
select * from products
where Product_Name like '%sam%' and Category='Cellphones & Accessories'
I am using lucene.net.
I need equivalent lucene.net statement.
As this is a few months old I'll be brief (I can expand if you're still interested)...
If you want to have an exact match to Category then do not analyze. Analyzers will chop the string up into bits which are then searchable. Matching case can be problematic so maybe just the lowercase analyzer would work for that field.
It might be useful to have several fields analyzed in different ways so that different queries can be used.
NOTE: "sam*" is not equivalent to "%sam%"
Do you want "sam" to be a prefix, i.e. "sample", or a whole word, as in "the sam product"?
If it's a word, then an analyzer without stop words should be fine.
A nice trick is to create many fields (with the same name) holding variations of the name, probably with just a lower-case analyzer:
name: "some sample product"
name: "sample product"
name: "product"
Then have a look at prefix queries. A query of (name:sam*) would then match.
Also have a look at the PerFieldAnalyzerWrapper in order to use a different analyzer for each field.
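As a rough sketch of how the two conditions from the question might be combined, the code below pairs an exact term match on Category with a prefix query on Product_Name. It assumes the Lucene.Net NuGet package (3.0.3 or 4.8 style API), that Category was indexed without analysis as recommended above, and that Product_Name terms are lower-cased at index time. `ProductQueryDemo` and `BuildProductQuery` are hypothetical names.

```csharp
using System;
using Lucene.Net.Index;   // Term
using Lucene.Net.Search;  // BooleanQuery, TermQuery, PrefixQuery, Occur

public static class ProductQueryDemo
{
    public static Query BuildProductQuery(string category, string namePrefix)
    {
        var query = new BooleanQuery();

        // Exact match: only works if Category was indexed as a single,
        // un-analyzed term (assumption, see lead-in).
        query.Add(new TermQuery(new Term("Category", category)), Occur.MUST);

        // Prefix match: finds terms starting with the prefix ("sam" -> "samsung").
        // This is not the same as SQL's LIKE '%sam%'.
        query.Add(new PrefixQuery(new Term("Product_Name", namePrefix.ToLowerInvariant())),
                  Occur.MUST);

        return query;
    }

    public static void Main()
    {
        Console.WriteLine(BuildProductQuery("Cellphones & Accessories", "sam"));
    }
}
```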

remove strings from linq query

I have a linq query that returns Names like:
Joe
Joe (1)
Joe Bloggs
Joe (2)
Joe Bloggs (1)
What is the best way to remove the Joe Bloggs entries from the list? Duplicate names always take the form "Name (n)", where n is the incremented number for that name. My result set would then be:
Joe
Joe (1)
Joe (2)
I was using the below to attempt to remove the elements I needed
newResults = results.Distinct().ToList();
However, it appears this only removes duplicate entries. I was hoping to use a regular expression in the LINQ query to find any entries that are "Joe" and any entries that start with "Joe (", but I'm not sure if regex is the correct way to implement this.
I have also plugged in a linq query as below:
var searchName = "Joe";
var results = names.Where(name => name.Equals(searchName)).ToList();
Again, this only returns one entry. Can you include a regex in the .Equals call to find any names where the name equals the search name, or equals the search name + space + "("?
You can use the Regex.IsMatch method to get only those strings containing Joe followed by an optional (1), (2), etc.:
var pattern = @"Joe(?=(\W?\(\d+\))|$)";
var results = names.Where(name => Regex.IsMatch(name, pattern)).ToList();
Based on the sample input you provided results will be {"Joe", "Joe (1)", "Joe (2)"}.
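A variant of the same idea that takes the search name as a variable, as the question asks, might look like the sketch below. It anchors the pattern at both ends and escapes the name, so it only accepts the name itself or the name followed by " (n)". `NameFilterDemo` and `FilterByName` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameFilterDemo
{
    public static List<string> FilterByName(IEnumerable<string> names, string searchName)
    {
        // ^name, optionally followed by " (digits)", then end of string.
        // Regex.Escape protects against special characters in the name.
        var pattern = "^" + Regex.Escape(searchName) + @"( \(\d+\))?$";
        return names.Where(name => Regex.IsMatch(name, pattern)).ToList();
    }

    public static void Main()
    {
        var names = new List<string>
        {
            "Joe", "Joe (1)", "Joe Bloggs", "Joe (2)", "Joe Bloggs (1)"
        };
        foreach (var name in FilterByName(names, "Joe"))
        {
            Console.WriteLine(name); // Joe, Joe (1), Joe (2)
        }
    }
}
```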
You need to refine your query to return only the data that you want. If this is not possible, then you must apply a further query to the collection once obtained. Barring this, try using a loop structure to remove unwanted entries by comparing each item in the collection and then responding to the matches with desired code.
If I understand you correctly, you want to exclude records that contain the word "Bloggs", so...
List<string> joes = new List<string>{"Joe", "Joe (1)", "Joe Bloggs", "Joe (2)", "Joe Bloggs (1)"};
var qry = joes.Where(j=>!j.Contains("Bloggs"));
result:
Joe
Joe (1)
Joe (2)

Search a collection in c#

I'm trying to implement a search engine and wonder what the best way is to search a collection of entities, where each entity is a data object and the search criteria change from time to time, both in the number of fields to search by and in which fields to search by. For example:
given a collection of itemEntity (an object containing id, name, gender, age, etc.), I would like the search to be flexible: you can search by name + gender, or by id only, and so on.
How to do it?
p.s.
I'm writing in c#
Scott Gu blogged about the dynamic linq expressions, you can find something useful there.
Never mind, and thanks for trying to help. I did it myself:
I iterate over all the search criteria (received in a dictionary) and, in a switch statement, apply a LINQ query for each one to get the results.
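The dictionary-plus-switch approach described in this answer might be sketched as follows. The `ItemEntity` shape is guessed from the question (id, name, gender, age), and the criteria keys and the `ItemSearch.Search` helper are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ItemEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Gender { get; set; }
    public int Age { get; set; }
}

public static class ItemSearch
{
    // Applies one Where clause per criterion; all criteria must match.
    public static List<ItemEntity> Search(IEnumerable<ItemEntity> items,
                                          IDictionary<string, string> criteria)
    {
        IEnumerable<ItemEntity> query = items;
        foreach (var criterion in criteria)
        {
            var value = criterion.Value;
            switch (criterion.Key.ToLowerInvariant())
            {
                case "id":
                    var id = int.Parse(value);
                    query = query.Where(i => i.Id == id);
                    break;
                case "name":
                    query = query.Where(i => string.Equals(i.Name, value,
                        StringComparison.OrdinalIgnoreCase));
                    break;
                case "gender":
                    query = query.Where(i => string.Equals(i.Gender, value,
                        StringComparison.OrdinalIgnoreCase));
                    break;
                case "age":
                    var age = int.Parse(value);
                    query = query.Where(i => i.Age == age);
                    break;
                default:
                    throw new ArgumentException("Unknown search field: " + criterion.Key);
            }
        }
        return query.ToList();
    }
}
```

A call such as `ItemSearch.Search(items, new Dictionary<string, string> { { "name", "Ann" }, { "gender", "F" } })` would then filter by both fields, while a dictionary holding only "id" filters by id alone.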

MongoDB use index in regular expression query

I am using the official C# MongoDB driver.
If I have an index on three elements {"firstname":1,"surname":1,"companyname":1} can I search the collection by using a regular expression that directly matches against the index value?
So, if someone enters "sun bat" as a search term, I would create a regex as follows
(?=.*\bsun)(?=.*\bbat).* and this should match any index entries where firstname or surname or companyname starts with 'sun' AND where firstname or surname or companyname starts with 'bat'.
If I can't do it this way, how can I do it? The user just types their search terms, so I won't know which element (firstname, surname, companyname) each search term (sun or bat) refers to.
Update: for MongoDB 2.4 and above you should not use this method but use MongoDB's text index instead.
Below is the original and still relevant answer for MongoDB < 2.4.
Great question. Keep this in mind:
MongoDB can only use one index per query.
Queries that use regular expressions only use an index when the regex is rooted and case sensitive.
The best way to do a search across multiple fields is to create an array of search terms (lower case) for each document and index that field. This takes advantage of the multi-keys feature of MongoDB.
So the document might look like:
{
"firstname": "Tyler",
"surname": "Brock",
"companyname": "Awesome, Inc.",
"search_terms": [ "tyler", "brock", "awesome inc"]
}
You would create an index: db.users.ensureIndex({ "search_terms": 1 })
Then when someone searches for "Tyler", you smash the case and search the collection using a case sensitive regex that matches the beginning of the string:
db.users.find({ "search_terms": /^tyler/ })
When executing this query, MongoDB tries to match your term against every element of the array (the index is set up that way too, so it's speedy). Hopefully that will get you where you need to be, good luck.
Note: These examples are in the shell. I have never written a single line of C# but the concepts will translate even though the syntax may differ.
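Since the question is about C#, here is a small sketch of the application-side part of this approach: building the lower-cased search_terms array for a document and turning a user's word into a rooted, case-sensitive pattern. It uses only the .NET standard library and deliberately makes no MongoDB driver calls; `SearchTerms`, `Build` and `PrefixPattern` are hypothetical helper names, and stripping punctuation is an assumption inferred from the "awesome inc" example above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchTerms
{
    // Lower-cases each field and strips punctuation, e.g.
    // "Awesome, Inc." -> "awesome inc".
    public static string[] Build(params string[] fields)
    {
        return fields
            .Where(f => !string.IsNullOrWhiteSpace(f))
            .Select(f => Regex.Replace(f.ToLowerInvariant(), @"[^\p{L}\p{Nd}\s]", "").Trim())
            .ToArray();
    }

    // Rooted (^) and lower-cased, so a case-sensitive match can use the index.
    public static string PrefixPattern(string userTerm)
    {
        return "^" + Regex.Escape(userTerm.Trim().ToLowerInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Build("Tyler", "Brock", "Awesome, Inc.")));
        Console.WriteLine(PrefixPattern("Tyler"));
    }
}
```

For a multi-word search such as "sun bat", each word would get its own pattern, and the query would require all of them to match the search_terms field.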

group by first letter of the string

I have a Person table with a huge number of records, and I want to group the duplicate persons in it.
By one of the requirements, persons are duplicates if they have the same family name and the first letters of their first names are equal, so I want to group by first name and first letter of family name. Is there a way to group like this in SQL? I need this in C#, so some processing could be done in code, but the number of persons is huge, so it should be a fast algorithm.
If I understand you correctly, in SQL Server you can do something like
SELECT DISTINCT
Surname,
LEFT(FirstName,1) FirstNameLetter
FROM Persons
Other than that, we will need a little bit more detail. Table schema, expected result set, etc...
SELECT MEMBER.MEMBER_FIRSTNAME, COUNT(MEMBER.MEMBER_LASTNAME)
FROM dbo.MEMBER
GROUP BY MEMBER.MEMBER_FIRSTNAME, SUBSTRING(MEMBER.MEMBER_LASTNAME, 1,1)
HAVING COUNT(MEMBER.MEMBER_LASTNAME) > 1
This query will give you all (members in this case) where the first name is the same and the last name's first letter is the same for more than one member. In other words duplicates as you've defined it.
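If some of the processing is done in C#, as the question allows, the same grouping can be sketched with LINQ. This version follows the first definition in the question (same family name plus same first letter of the first name); swap the two properties in the key for the other reading. The `Person` class and `DuplicateFinder` are hypothetical, and for a very large table it is probably better to let the database do the grouping as in the SQL above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string Surname { get; set; }
}

public static class DuplicateFinder
{
    // Returns only the groups that contain more than one person.
    public static List<List<Person>> FindDuplicates(IEnumerable<Person> people)
    {
        return people
            .Where(p => !string.IsNullOrEmpty(p.FirstName) && p.Surname != null)
            // Anonymous types compare by value, so they work as a composite key.
            .GroupBy(p => new
            {
                Surname = p.Surname.ToUpperInvariant(),
                Initial = char.ToUpperInvariant(p.FirstName[0])
            })
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

GroupBy hashes the keys, so this is a single pass over the data once it is in memory.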
