Field boosting has no effect in Lucene.Net (C#)

I'm trying to set boosts on document fields to make the search results more accurate, but as far as I can see they have no effect.
Here is my code.
Indexing:
private static void _addToLuceneIndex(Datafile Datafile, IndexWriter writer)
{
// remove older index entry
var searchQuery = new TermQuery(new Term("Id", Datafile.article.Id.ToString()));
writer.DeleteDocuments(searchQuery);
// add new index entry
var doc = new Document();
var id = new Field("Id", Datafile.article.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED);
var content = new Field("Content", Datafile.article.Content, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS);
content.Boost = 4;
var title = new Field("Title", Datafile.article.Title, Field.Store.YES, Field.Index.ANALYZED);
title.Boost = 6;
doc.Add(id);
doc.Add(content);
doc.Add(title);
foreach (var item in Datafile.article.Article_Tag)
{
var tmpta = new Field("Atid", item.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED);
var tagname = new Field("Tagname", item.Tag.name, Field.Store.YES, Field.Index.ANALYZED);
tagname.Boost = 8;
doc.Add(tmpta);
doc.Add(tagname);
}
// add lucene fields mapped to db fields
// add entry to index
writer.AddDocument(doc);
}
I used Luke.Net to check whether the fields were boosted, but they are not: the boost still shows as 1.0.
I then ran a search to test it, and the results were disappointing as well.
Here is my search code.
Searching:
private static IEnumerable<Datafile> _search(string searchQuery, string searchField = "")
{
// validation
if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", "")))
return new List<Datafile>();
var indexReader = IndexReader.Open(Directory, false);
// set up lucene searcher
using (var searcher = new IndexSearcher(indexReader))
{
var hits_limit = 1000;
// search by single field
var enanalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
var aranalyzer = new SnowballAnalyzer(Version.LUCENE_30, "Arabic");
string[] fields = new string[] { "Title", "Content", "Tagname" };
// Dictionary<string, float> boosts = new Dictionary<string, float>();
// boosts.Add("Title", 5);
// boosts.Add("Content", 3);
// boosts.Add("Tagname", 7);
var enparser = new MultiFieldQueryParser(Version.LUCENE_30, fields, enanalyzer);
var arparser = new MultiFieldQueryParser(Version.LUCENE_30, fields, aranalyzer);
var query = QueryModel(searchQuery, new QueryParser[] { enparser, arparser });
searcher.SetDefaultFieldSortScoring(true, false);
TopFieldCollector collector = TopFieldCollector.Create(new Sort(new SortField(null, SortField.SCORE, false), new SortField("Title", SortField.STRING, true), new SortField("Tagname", SortField.STRING, true), new SortField("Content", SortField.STRING, true)),
hits_limit,
false, // fillFields - not needed, we want score and doc only
true, // trackDocScores - need doc and score fields
true, // trackMaxScore - related to trackDocScores
false); // should docs be in docId order?
searcher.Search(query, collector);
var hits = collector.TopDocs().ScoreDocs;
var results = new List<Datafile>();
foreach (var hit in hits)
{
var doc = searcher.Doc(hit.Doc);
var df = _mapLuceneDocumentToData(doc);
df.score = hit.Score;
results.Add(df);
}
searcher.Dispose();
return results;
// search by multiple fields (ordered by RELEVANCE)
}
}
QueryModel Method:
private static Query QueryModel(string searchQuery, QueryParser[] parsers)
{
BooleanQuery query = new BooleanQuery();
searchQuery = "*" + searchQuery + "*";
foreach (var parser in parsers)
{
parser.AllowLeadingWildcard = true;
var thequery = parser.Parse(searchQuery);
query.Add(new BooleanClause(thequery, Occur.SHOULD));
}
return query;
}
I'm new to Lucene.Net and I love it, but I can't get my head around this problem.
PS:
I would also like fuzzy matching, so that when the user enters
"city in russua" they get the same results as if they had entered "city in russia".
I tried the FuzzyQuery class, but it didn't work. Is the FuzzyQuery class actually necessary to get that result?

Since no one answered my question and I have since found a solution, here it is: I used search-time query boosting. Here is my code:
var QParser = new QueryParser(Version.LUCENE_30, "Content", analyzer);
QParser.AllowLeadingWildcard = true;
var Query = QParser.Parse(searchQuery);
Query.Boost = 7.0f;
return Query;
You can use a BooleanQuery if you want to combine several queries with OR or AND.
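On why Luke shows 1.0: as far as I know, an index-time field boost is folded into the field's norm and is not stored with the field itself, so a document read back from the index reports a boost of 1.0 even when the boost was applied. A search-time alternative that keeps one weight per field is the boost map accepted by MultiFieldQueryParser, which the commented-out lines in the question were heading towards. A minimal sketch, assuming Lucene.Net 3.0.3 and reusing the field names and weights from the question; the class and method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper, not part of the original code.
public static class SearchTimeBoostSketch
{
    // Builds a query over Title, Content and Tagname with one boost per field.
    public static Query BuildQuery(string userInput)
    {
        var fields = new[] { "Title", "Content", "Tagname" };

        // Per-field weights applied at search time (values taken from the question).
        var boosts = new Dictionary<string, float>
        {
            { "Title", 6f },
            { "Content", 4f },
            { "Tagname", 8f }
        };

        // Assumption: StandardAnalyzer stands in for the Snowball analyzers used in the question.
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        var parser = new MultiFieldQueryParser(Version.LUCENE_30, fields, analyzer, boosts);
        return parser.Parse(userInput);
    }

    public static void Main()
    {
        // Plain term: expanded to one clause per field.
        Console.WriteLine(BuildQuery("russia"));

        // Trailing ~ is the query parser's fuzzy operator, so no hand-built FuzzyQuery is needed.
        Console.WriteLine(BuildQuery("russua~"));
    }
}
```

Two caveats, both to the best of my knowledge rather than verified against your index: the boost map is applied to plain term and phrase clauses, so wildcard or fuzzy clauses may still need Boost set on the resulting query as in the code above; and wildcard queries are rewritten to constant-score queries by default, which would also explain why the index-time boosts made no difference for a "*term*" search.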

Related

IList<T> return as a generic

I'm a beginner at coding and I am trying to create a search engine, but there is one part I don't know how to solve: returning an IList<T> from a generic method.
public IList<T> Search<T>(string textSearch)
{
IList<T> list = new List<T>();
var result = new DataTable();
using (Analyzer analyzer = new PanGuAnalyzer())
{
var queryParser = new QueryParser(Version.LUCENE_30, "FullText", analyzer);
queryParser.AllowLeadingWildcard = true;
var query = queryParser.Parse(textSearch);
var collector = TopScoreDocCollector.Create(1000, true);
Searcher.Search(query, collector);
var matches = collector.TopDocs().ScoreDocs;
result.Columns.Add("Title");
result.Columns.Add("Starring");
result.Columns.Add("ID");
foreach (var item in matches)
{
var id = item.Doc;
var doc = Searcher.Doc(id);
var row = result.NewRow();
row["Title"] = doc.GetField("Title").StringValue;
row["Starring"] = doc.GetField("Starring").StringValue;
row["ID"] = doc.GetField("ID").StringValue;
result.Rows.Add(row);
}
}
return result;
}
But in this code I can't return result. The compiler says: Cannot implicitly convert type 'System.Data.DataTable' to 'System.Collections.Generic.IList<T>'. An explicit conversion exists. How can I solve this?
I guess you don't really want generics here: the method reads fixed fields from the index, so it cannot build an arbitrary T, and a type parameter doesn't make sense. If you have a class, for example Film, return a List<Film>; you don't need the DataTable:
public IList<Film> SearchFilms(string textSearch)
{
IList<Film> list = new List<Film>();
using (Analyzer analyzer = new PanGuAnalyzer())
{
var queryParser = new QueryParser(Version.LUCENE_30, "FullText", analyzer);
queryParser.AllowLeadingWildcard = true;
var query = queryParser.Parse(textSearch);
var collector = TopScoreDocCollector.Create(1000, true);
Searcher.Search(query, collector);
var matches = collector.TopDocs().ScoreDocs;
foreach (var item in matches)
{
var film = new Film();
var id = item.Doc;
var doc = Searcher.Doc(id);
film.Title = doc.GetField("Title").StringValue;
film.Starring = doc.GetField("Starring").StringValue;
film.ID = doc.GetField("ID").StringValue;
list.Add(film);
}
}
return list;
}
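Your return statement should be
return result.AsEnumerable().ToList();
Note that this produces a List<DataRow>, so the method's return type has to be IList<DataRow> rather than IList<T>.
Don't forget to add the namespace
using System.Linq;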
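If the DataTable has to stay and the method should remain generic, the rows still need to be mapped to the element type one by one. A minimal sketch of that idea, where the caller supplies the row-to-object mapping; the Film class and the MapRows helper are hypothetical names used only for illustration, and Rows.Cast<DataRow>() is used in place of AsEnumerable() so that only System.Linq is required:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical result type, mirroring the columns used in the question.
public class Film
{
    public string Title { get; set; }
    public string Starring { get; set; }
    public string ID { get; set; }
}

public static class DataTableMapping
{
    // Converts every row of the table to a T using the mapping supplied by the caller.
    public static IList<T> MapRows<T>(DataTable table, Func<DataRow, T> map)
    {
        return table.Rows.Cast<DataRow>().Select(map).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Title");
        table.Columns.Add("Starring");
        table.Columns.Add("ID");
        table.Rows.Add("Sample Title", "Sample Actor", "42");

        IList<Film> films = DataTableMapping.MapRows(table, row => new Film
        {
            Title = (string)row["Title"],
            Starring = (string)row["Starring"],
            ID = (string)row["ID"]
        });

        Console.WriteLine(films.Count + " film(s), first ID: " + films[0].ID);
    }
}
```

The generic method never needs to know how to construct a T; that knowledge stays with the caller, which is what makes the IList<T> return type workable.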

How to process hits in Lucene.Net 3.0.3

List<SearchResults> Searchresults = new List<SearchResults>();
// Specify the location where the index files are stored
string indexFileLocation = @"D:\Lucene.Net\Data\Persons";
Lucene.Net.Store.Directory dir = FSDirectory.Open(indexFileLocation);
// specify the search fields, lucene search in multiple fields
string[] searchfields = new string[] { "FirstName", "LastName", "DesigName", "CatagoryName" };
IndexSearcher indexSearcher = new IndexSearcher(dir);
// Making a boolean query for searching and get the searched hits
Query som = QueryMaker(searchString, searchfields);
int n = 1000;
TopDocs hits = indexSearcher.Search(som,null,n);
for (int i = 0; i <hits.TotalHits; i++)
{
SearchResults result = new SearchResults();
result.FirstName = hits.ScoreDocs.GetValue(i).ToString();
result.FirstName = hits.Doc.GetField("FirstName").StringValue();
result.LastName = hits.Doc(i).GetField("LastName").StringValue();
result.DesigName = hits.Doc(i).GetField("DesigName").StringValue();
result.Addres = hits.Doc(i).GetField("Addres").StringValue();
result.CatagoryName = hits.Doc(i).GetField("CatagoryName").StringValue();
Searchresults.Add(result);
}
I have table fields such as first name, last name, and so on. How can I process the hits to get the values from the search result?
I get an error that says TopDocs does not contain a definition for Doc.
Lean on the compiler. There is no property or method called Doc in the TopDocs class. The ScoreDocs property of TopDocs holds the list of hits, each with a document number and a score. Pass that document number to the Doc method of IndexSearcher to load the actual document; you can then read the stored field data from that document.
You can process the results like this:
foreach (var scoreDoc in hits.ScoreDocs)
{
var result = new SearchResults();
var doc = indexSearcher.Doc(scoreDoc.Doc);
result.FirstName = doc.GetField("FirstName").StringValue;
result.LastName = doc.GetField("LastName").StringValue;
result.DesigName = doc.GetField("DesigName").StringValue;
result.Addres = doc.GetField("Addres").StringValue;
result.CatagoryName = doc.GetField("CatagoryName").StringValue; // spelling matches the field and property names used in the question
Searchresults.Add(result);
}
Or, in a more LINQ-style way:
var searchResults =
    indexSearcher
        .Search(som, null, n)
        .ScoreDocs
        // ScoreDoc.Doc is the document number that IndexSearcher.Doc expects
        .Select(scoreDoc => indexSearcher.Doc(scoreDoc.Doc))
        .Select(doc =>
        {
            var result = new SearchResults();
            result.FirstName = doc.GetField("FirstName").StringValue;
            result.LastName = doc.GetField("LastName").StringValue;
            result.DesigName = doc.GetField("DesigName").StringValue;
            result.Addres = doc.GetField("Addres").StringValue;
            // spelling matches the field and property names used in the question
            result.CatagoryName = doc.GetField("CatagoryName").StringValue;
            return result;
        })
        .ToList();
Moving the hit processing into its own method keeps the search code clean, and if you later want to highlight the matched documents you can easily plug the Lucene.Net highlighter into the getMatchedHits method.
List<SearchResults> Searchresults = new List<SearchResults>();
// Specify the location where the index files are stored
string indexFileLocation = @"D:\Lucene.Net\Data\Persons";
Lucene.Net.Store.Directory dir = FSDirectory.Open(indexFileLocation);
// specify the search fields, lucene search in multiple fields
string[] searchfields = new string[] { "FirstName", "LastName", "DesigName", "CatagoryName" };
IndexSearcher indexSearcher = new IndexSearcher(dir);
// Making a boolean query for searching and get the searched hits
Query som = QueryMaker(searchString, searchfields);
int n = 1000;
var hits = indexSearcher.Search(som,null,n).ScoreDocs;
Searchresults = getMatchedHits(hits,indexSearcher);
getMatchedHits method code:
public static List<SearchResults> getMatchedHits(ScoreDoc[] hits, IndexSearcher searcher)
{
List<SearchResults> list = new List<SearchResults>();
SearchResults obj;
try
{
for (int i = 0; i < hits.Count(); i++)
{
// get the document from index
Document doc = searcher.Doc(hits[i].Doc);
string strFirstName = doc.Get("FirstName");
string strLastName = doc.Get("LastName");
string strDesigName = doc.Get("DesigName");
string strAddres = doc.Get("Addres");
string strCategoryName = doc.Get("CatagoryName"); // field name spelled as in the question's index
obj = new SearchResults();
obj.FirstName = strFirstName;
obj.LastName = strLastName;
obj.DesigName= strDesigName;
obj.Addres = strAddres;
obj.CatagoryName = strCategoryName; // property name spelled as in the question
list.Add(obj);
}
return list;
}
catch (Exception ex)
{
return null; // or throw exception
}
}
Hope it Helps!

Lucene.Net sort not working: AccessViolationException

I am trying to sort my results in Lucene.Net.
However, I keep getting this error:
An unhandled exception of type 'System.AccessViolationException' occurred in Search.dll
Additional information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
I have tried setting Field.Index to both ANALYZED and NOT_ANALYZED, but no joy.
Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Title", analyzer);
Query query = parser.Parse(searchTerm.Trim() + "*");
var searcher = new IndexSearcher(directory, true);
var sortBy = new Lucene.Net.Search.Sort(new Lucene.Net.Search.SortField("Title", Lucene.Net.Search.SortField.STRING, true));
var filter = new QueryWrapperFilter(query);
// TopDocs topDocs3 = searcher.Search(query, filter, 500,sortBy);
// TopDocs topDocs = searcher.Search(query,500);
TopDocs topDocs2 = searcher.Search(query,null, 500, new Sort(new SortField("Title", SortField.STRING)));
var re = searcher.Search(query, null, 10, new Sort(new SortField("id", SortField.INT, true)));
I encountered the same error when trying to order my search results with LUCENE_30. I must say that I wrote this example in a hurry and it is not tested.
What I did was the following:
string sortText = Enum.GetName(typeof(SortableFields), sortBy);
SortField field = new SortField(sortText, SortField.STRING, sortDesc);
var sortByField = new Lucene.Net.Search.Sort(field);
TopFieldCollector collector = Lucene.Net.Search.TopFieldCollector.Create(sortByField, MaxSearchResultsReturned, false, false, false, false);
// declared before the using blocks so that it is still in scope at the return
var results = new List<SearchResult>();
using (Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
{
    var queryParse = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, IndexFields.FullText, analyzer);
    queryParse.AllowLeadingWildcard = true;
    Query query = queryParse.Parse(searchText);
    using (var searcher = new IndexSearcher(directory, true))
    {
        searcher.Search(query, collector);
        totalRows = collector.TotalHits;
        TopDocs matches = collector.TopDocs(skip, take);
        // convert results to known objects
        foreach (var item in matches.ScoreDocs)
        {
            int id = item.Doc;
            Document doc = searcher.Doc(id);
            SearchResult result = new SearchResult();
            result.ID = doc.GetField("ID").StringValue;
            results.Add(result);
        }
    }
}
return results;

Getting terms matched in a document when searching using a wildcard search

I am looking for a way to find the terms that matched in a document when using a wildcard search in Lucene. I tried using Explain to find the terms, but this failed. A portion of the relevant code is below.
ScoreDoc[] myHits = myTopDocs.scoreDocs;
int hitsCount = myHits.Length;
for (int myCounter = 0; myCounter < hitsCount; myCounter++)
{
Document doc = searcher.Doc(myHits[myCounter].doc);
Explanation explanation = searcher.Explain(myQuery, myCounter);
string myExplanation = explanation.ToString();
...
When I search for, say, micro*, documents are found and execution enters the loop, but myExplanation contains NON-MATCH and no other information.
How do I get the term that was found in this document?
Any help would be most appreciated.
Regards
class TVM : TermVectorMapper
{
public List<string> FoundTerms = new List<string>();
HashSet<string> _termTexts = new HashSet<string>();
public TVM(Query q, IndexReader r) : base()
{
List<Term> allTerms = new List<Term>();
q.Rewrite(r).ExtractTerms(allTerms);
foreach (Term t in allTerms) _termTexts.Add(t.Text());
}
public override void SetExpectations(string field, int numTerms, bool storeOffsets, bool storePositions)
{
}
public override void Map(string term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions)
{
if (_termTexts.Contains(term)) FoundTerms.Add(term);
}
}
void TermVectorMapperTest()
{
RAMDirectory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new Lucene.Net.Analysis.Standard.StandardAnalyzer(), true);
Document d = null;
d = new Document();
d.Add(new Field("text", "microscope aaa", Field.Store.YES, Field.Index.ANALYZED,Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.AddDocument(d);
d = new Document();
d.Add(new Field("text", "microsoft bbb", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.AddDocument(d);
writer.Close();
IndexReader reader = IndexReader.Open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser queryParser = new QueryParser("text", new Lucene.Net.Analysis.Standard.StandardAnalyzer());
queryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Query query = queryParser.Parse("micro*");
TopDocs results = searcher.Search(query, 5);
System.Diagnostics.Debug.Assert(results.TotalHits == 2);
TVM tvm = new TVM(query, reader);
for (int i = 0; i < results.ScoreDocs.Length; i++)
{
Console.Write("DOCID:" + results.ScoreDocs[i].Doc + " > ");
reader.GetTermFreqVector(results.ScoreDocs[i].Doc, "text", tvm);
foreach (string term in tvm.FoundTerms) Console.Write(term + " ");
tvm.FoundTerms.Clear();
Console.WriteLine();
}
}
One way is to use the Highlighter; another way would be to mimic what the Highlighter does by rewriting your query by calling myQuery.rewrite() with an appropriate rewriter; this is probably closer in spirit to what you were trying. This will rewrite the query to a BooleanQuery containing all the matching Terms; you can get the words out of those pretty easily. Is that enough to get you going?
Here's the idea I had in mind; sorry about the confusion re: rewriting queries; it's not really relevant here.
TokenStream tokens = TokenSources.getAnyTokenStream(reader, docId, field, analyzer);
CharTermAttribute termAtt = tokens.addAttribute(CharTermAttribute.class);
while (tokens.incrementToken()) {
// do something with termAtt, which holds the matched term
}

Lucene returns same exact search results no matter the search term

Here is my code
term = Server.UrlDecode(term);
string indexFileLocation = "C:\\lucene\\Index\\post";
Lucene.Net.Store.Directory dir =
Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, false);
//create an index searcher that will perform the search
Lucene.Net.Search.IndexSearcher searcher = new
Lucene.Net.Search.IndexSearcher(dir);
//build a query object
Lucene.Net.Index.Term searchTerm =
new Lucene.Net.Index.Term("post_title", term);
Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
Lucene.Net.QueryParsers.QueryParser queryParser = new
Lucene.Net.QueryParsers.QueryParser("post_title", analyzer);
Lucene.Net.Search.Query query = queryParser.Parse(term);
//execute the query
Lucene.Net.Search.Hits hits = searcher.Search(query);
List<string> s = new List<string>();
for (int i = 0; i < hits.Length(); i++)
{
Lucene.Net.Documents.Document doc = hits.Doc(i);
s.Add(doc.Get("post_title_raw"));
}
ViewData["s"] = s;
Here is my indexing code:
//create post lucene index
LuceneType lt = new LuceneType();
lt.Analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
lt.Writer = new Lucene.Net.Index.IndexWriter("C:/lucene/Index/post", lt.Analyzer, true);
using (var context = new MvcApplication1.Entity.test2Entities())
{
var posts = from p in context.post
where object.Equals(p.post_parentid, null) && p.post_isdeleted == false
let Answers = from a in context.post
where a.post_parentid == p.post_id
select new
{
a.post_description
}
let Comments = from c in context.comment
where c.post.post_id == p.post_id
select new
{
c.comment_text
}
select new
{
p,
Answers,
Comments
};
foreach (var post in posts)
{
//lets concate all the answers and comments
StringBuilder answersSB = new StringBuilder();
StringBuilder CommentsSB = new StringBuilder();
foreach (var answer in post.Answers)
answersSB.Append(answer.post_description);
foreach (var comment in post.Comments)
CommentsSB.Append(comment.comment_text);
//add rows
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_id",
post.p.post_id.ToString(),
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.UN_TOKENIZED
));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_title",
new System.IO.StringReader(post.p.post_title)));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_title_raw",
post.p.post_title,
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_titleslug",
post.p.post_titleslug,
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_tagtext",
new System.IO.StringReader(post.p.post_tagtext)));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_tagtext",
post.p.post_tagtext,
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_description",
new System.IO.StringReader(post.p.post_description)));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_description_raw",
post.p.post_description,
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_Answers",
new System.IO.StringReader(answersSB.ToString())));
lt.Doc.Add(new Lucene.Net.Documents.Field(
"post_Comments",
new System.IO.StringReader(CommentsSB.ToString())));
}
lt.Writer.AddDocument(lt.Doc);
lt.Writer.Optimize();
lt.Writer.Close();
Why does this return the same results for any search term?
Lucene.Net.Search.Query query = queryParser.Parse(term);
In the line above you have used term instead of searchterm.
Your code should look like this:
Lucene.Net.Search.Query query = queryParser.Parse(searchterm);
You can also make a small alteration like the one below:
//build a query object
Lucene.Net.Index.Term searchTerm =
new Lucene.Net.Index.Term("post_title", term);
TermQuery tq = new TermQuery(searchTerm);
......
......
Lucene.Net.Search.Query query = tq;
Now there is no need for the parser.
If you still need the parser, you can change the line above to:
Lucene.Net.Search.Query query = queryParser.Parse(tq.ToString());
Hope this helps.
Not a direct answer, but get Luke (it works with .NET indexes too) and open your index. Try running your query in its search tab using the right type of analyzer. If that works, you know the problem is in your querying. If it doesn't, the problem could be in both the indexing and the querying, but at least this ought to get you on the right track.
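One more thing worth checking in the indexing code from the question: lt.Writer.AddDocument(lt.Doc) sits after the closing brace of the foreach, and every post adds its fields to the same lt.Doc. Assuming lt.Doc is a single Document instance (the LuceneType class is not shown), the index would end up with one document containing all posts, and Get("post_title_raw") returns the first value added, which would produce identical results for any term that matches. A minimal sketch of indexing one Document per post, written against the Lucene.Net 3.0.3 API rather than the older API in the question; the class name, method name and sample titles are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper, not part of the original code.
public static class PerPostIndexingSketch
{
    // Indexes each title as its own Document, then returns the raw titles matching queryText.
    public static List<string> IndexAndSearch(IEnumerable<string> titles, string queryText)
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var title in titles)
            {
                // A fresh Document per post, added inside the loop.
                var doc = new Document();
                doc.Add(new Field("post_title", title, Field.Store.NO, Field.Index.ANALYZED));
                doc.Add(new Field("post_title_raw", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);
            }
        }

        var found = new List<string>();
        using (var searcher = new IndexSearcher(dir, true))
        {
            var parser = new QueryParser(Version.LUCENE_30, "post_title", analyzer);
            Query query = parser.Parse(queryText);
            foreach (var hit in searcher.Search(query, 10).ScoreDocs)
            {
                found.Add(searcher.Doc(hit.Doc).Get("post_title_raw"));
            }
        }
        return found;
    }

    public static void Main()
    {
        var titles = new[] { "How to sort results", "Why is boosting ignored" };
        foreach (var title in IndexAndSearch(titles, "boosting"))
        {
            Console.WriteLine(title);
        }
    }
}
```

With one Document per post, a query only brings back the posts whose own title matched, so different search terms can return different results.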
