Lucene boosting not working

I'm indexing a document and setting the boost as follows:
document.SetBoost(5f);
because I want certain documents to appear first. For example, I want more recent news to show up before older news.
When I do the search, like this:
var parser = new QueryParserEx(Version.LUCENE_29, "contents", analyzer);
parser.SetDefaultOperator(QueryParser.Operator.AND);
parser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Query query;
query = parser.Parse("text*");
The query gets parsed as a WildcardQuery and internally it's using this:
{Lucene.Net.Search.MultiTermQuery.AnonymousClassConstantScoreAutoRewrite}
Not sure why it's still using a Constant Score rewriter. Can someone explain?
I also believe I cannot use search-time boosting, as I don't need to boost certain terms but certain documents (e.g. the most recent news should appear first).
PS: This is not a duplicate of this question.

Never mind. I was using a custom implementation of the QueryParser that had the method NewTermQuery overridden.
Something like this:
protected override Query NewTermQuery(Term term)
{
    var field = term.Field();
    var text = term.Text() ?? "";
    if (field == "contents" &&
        text.Length >= 3 &&
        text.IndexOfAny(new[] { '*', '?' }) < 0)
    {
        var wq = new WildcardQuery(new Term(field, text + "*"));
        return wq;
    }
    return base.NewTermQuery(term);
}
The WildcardQuery created there was not picking up the rewrite method configured on the parser. All I had to do was call wq.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE); inside that if block.

Related

How to set List child element with for each If it is empty initially?

I have an IList that holds all Offer entities fetched from a repository using Entity Framework Core. I also have a service model, OfferResponseModel, which includes OfferRequestModel as a reference. I used Mapster to map the entity model to the service model, but it only sets the first child, so now I want to map it manually. I created "offers" with the size of "Offer". When I try to use a foreach loop, I cannot set the child elements of "offers" because it has no elements. How can I solve this?
var offer = await _unitOfWork.Offers.GetAllOffer();
if (offer == null)
    throw ServiceExceptions.OfferNotFound;
var results = new List<OfferResponseModel>(offer.Count);
results.ForEach(c => { c.Offer = new OfferRequestModel(); });
int i = 0;
foreach (var result in results)
{
    result.Offer.User = Offer[i].User.Adapt<UserResponseModel>();
    result.Offer.Responsible = Offer[i].Responsible.Adapt<EmployeeResponseModel>();
    result.CreatedDate = Offer[i].CreatedDate;
    result.ModifiedBy = Guid.Parse(Offer[i].UpdatedBy);
    result.Active = Offer[i].Status;
    result.Offer = Offer[i].Offer;
    result.Offer.User.Company = Offer[i].Company.Adapt<CompanyModel>();
    i++;
}

I created "offers" with the size of "Offer".
No, you created it with that capacity. It's still an empty list. It's not clear to me why you're trying to take this approach at all - it looks like you want one OfferResponseModel for each entry in offer, directly from that - which you can do with a single LINQ query. (I'm assuming that offer and Offer are equivalent here.)
var results = Offer.Select(o => new OfferResponseModel
{
    Offer = new OfferRequestModel
    {
        User = o.User.Adapt<UserResponseModel>(),
        Responsible = o.Responsible.Adapt<EmployeeResponseModel>()
    },
    CreatedDate = o.CreatedDate,
    ModifiedBy = Guid.Parse(o.UpdatedBy),
    Active = o.Status
}).ToList();
That doesn't set the Offer.User.Company in each entry, but your original code is odd as it sets the User and Responsible properties in the original Offer property, and then replaces the Offer with Offer[i].Offer. (Aside from anything else, I'd suggest trying to use the term "offer" less frequently - just changing the plural to "offers" would help.)
I suspect that with the approach I've outlined above, you'll be able to work out what you want and express it more clearly anyway. You definitely don't need to take the "multiple loops" approach of your original code.
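To make the capacity-versus-count point concrete, here is a minimal sketch. The Source and Target types are hypothetical stand-ins for the entity and response models in the question, not the real classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapacityDemo
{
    // Hypothetical stand-ins for the entity and response types in the question.
    public class Source { public DateTime CreatedDate; public bool Status; }
    public class Target { public DateTime CreatedDate; public bool Active; }

    public static List<Target> Map(IList<Source> offers)
    {
        // The constructor argument only reserves space; Count stays 0,
        // so a ForEach or foreach over this list would run zero times.
        var empty = new List<Target>(offers.Count);
        Console.WriteLine("Capacity=" + empty.Capacity + ", Count=" + empty.Count);

        // Projecting from the source creates exactly one element per entry.
        return offers.Select(o => new Target
        {
            CreatedDate = o.CreatedDate,
            Active = o.Status
        }).ToList();
    }
}
```

Calling CapacityDemo.Map with a two-element list returns two mapped objects, while the pre-sized list inside it still reports a Count of 0.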

One thing you have left out is the type of the offer variable that is referenced in the code. But I am thinking you need to do something along these lines:
if (offer == null)
    throw ServiceExceptions.OfferNotFound;
var results = offer.Select(o => new OfferResponseModel
{
    Offer = new OfferRequestModel
    {
        User = o.User.Adapt<UserResponseModel>(),
        Responsible = o.Responsible.Adapt<EmployeeResponseModel>(),
        ...
    }
}).ToList();
Select basically loops through the items in offer and "converts" them to other objects, in this case OfferResponseModel. So inside Select you simply new up an OfferResponseModel and directly set all the properties you need.
You need using System.Linq; for Select to be available.

How to do this kind of search in ASP.net MVC?

I have an ASP.NET MVC web application.
The SQL table has one column ProdNum and it contains data such as 4892-34-456-2311.
The user needs a form to search the database that includes this field.
The problem is that the user wants 4 separate fields in the Razor view, where each field matches one of the 4 parts of the data above, separated by -.
For example, the ProdNum1, ProdNum2, ProdNum3 and ProdNum4 fields should match 4892, 34, 456 and 2311.
Since the entire search form contains many fields, including these 4, the search logic is based on a predicate built with the PredicateBuilder class.
Something like this:
...other fields to be filtered
if (!string.IsNullOrEmpty(ProdNum1))
{
    predicate = predicate.And(
        t => t.ProdNum.ToString().Split('-')[0].Contains(ProdNum1));
}
...other fields to be filtered
But the above code fails with a run-time error:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities.
Does anybody know how to resolve this issue?

Thanks a lot for all the responses. In the end I found an easy way to resolve it.
Instead of rebuilding the models and changing the database tables, I just add the "-" separators to the search strings to match the search criteria. Since the data format is always 4892-34-456-2311, I use StartsWith(PODNum1) to search the first field, Contains("-" + PODNum2 + "-") to search the second and third fields (replacing PODNum2 with PODNum3 for the third), and EndsWith("-" + PODNum4) to search the fourth. This way I don't need to change anything else; it is simple.
Again, thanks a lot for all the responses, much appreciated.
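A rough in-memory sketch of that approach follows. The ProdNumSearch class and its parameter names are made up for illustration, and appending "-" to the first term is my own variation that turns the prefix match into an exact match on the first part. Inside a LINQ to Entities predicate you would most likely use the single-argument StartsWith and EndsWith overloads instead of the ordinal ones used here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProdNumSearch
{
    // Filters values shaped like "4892-34-456-2311" using only StartsWith,
    // Contains and EndsWith, the string methods the approach above relies on.
    public static List<string> Filter(IEnumerable<string> prodNums,
        string first, string middle, string last)
    {
        IEnumerable<string> query = prodNums;
        if (!string.IsNullOrEmpty(first))
            query = query.Where(p => p.StartsWith(first + "-", StringComparison.Ordinal));
        if (!string.IsNullOrEmpty(middle))
            // Matches the 2nd OR the 3rd part; this pattern cannot tell them apart.
            query = query.Where(p => p.Contains("-" + middle + "-"));
        if (!string.IsNullOrEmpty(last))
            query = query.Where(p => p.EndsWith("-" + last, StringComparison.Ordinal));
        return query.ToList();
    }
}
```

One thing to keep in mind: the middle pattern accepts a value whether the term is in the second or the third position, so "34" matches both 4892-34-456-2311 and 4892-456-34-2311.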

If I understand this correctly, you have one column which you want to act like 4 different columns? This isn't worth it. For that, you need to split each row's column data, create a class to handle the split data, and finally use a List. That's a useless workaround. I would rather suggest you use 4 columns instead.
But if you still want to go with your existing method, you first need to split as I mentioned earlier. For that, here's an example:
public void Test(SqlCommand command) // command: an already prepared query (assumed)
{
    using (SqlDataReader dataReader = command.ExecuteReader())
    {
        while (dataReader.Read())
        {
            string[] parts = dataReader[1].ToString().Split('-');
            string part1 = parts[0]; // the 1st part of your column data
            string part2 = parts[1]; // the 2nd part of your column data
        }
    }
}
Now, as mentioned in the comments, you can rather use a class to handle all the data. For example, let's call it MyData:
public class MyData
{
    public string Part1;
    public string Part2;
    public string Part3;
    public string Part4;
}
Now, within the while loop of the SqlDataReader, declare a new instance of this class and pass the values to it. An example:
public void Test(SqlCommand command) // command: an already prepared query (assumed)
{
    using (SqlDataReader dataReader = command.ExecuteReader())
    {
        while (dataReader.Read())
        {
            string[] parts = dataReader[1].ToString().Split('-');
            MyData allData = new MyData();
            allData.Part1 = parts[0];
            allData.Part2 = parts[1];
        }
    }
}
Create a list of the class at class level:
public class MyForm
{
    List<MyData> storedData = new List<MyData>();
}
Within the while loop of the SqlDataReader, add this at the end:
storedData.Add(allData);
So finally, you have a list of all the split data, and you can write your filtering logic easily :)

As already mentioned in a comment, the error means that accessing data via index (see [0]) is not supported when translating your expression to SQL. Split('-') is also not supported hence you have to resort to the supported functions Substring() and IndexOf(startIndex).
You could do something like the following to first transform the string into 4 number strings ...
.Select(t => new {
    t.ProdNum,
    FirstNumber = t.ProdNum.Substring(0, t.ProdNum.IndexOf("-")),
    Remainder = t.ProdNum.Substring(t.ProdNum.IndexOf("-") + 1)
})
.Select(t => new {
    t.ProdNum,
    t.FirstNumber,
    SecondNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
    Remainder = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
.Select(t => new {
    t.ProdNum,
    t.FirstNumber,
    t.SecondNumber,
    ThirdNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
    FourthNumber = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
... and then you could simply write something like
if (!string.IsNullOrEmpty(ProdNum3))
{
    predicate = predicate.And(
        t => t.ThirdNumber.Contains(ProdNum3));
}

How to retrieve records more than 4000 from Raven DB in SIngle Session [duplicate]

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
    "Header_ID": 3525880,
    "Sub_ID": "120403261139",
    "TimeStamp": "2012-04-05T15:14:13.9870000",
    "Equipment_ID": "PBG11A-CCM",
    "AverageAbsorber1": "284.451",
    "AverageAbsorber2": "108.442",
    "AverageAbsorber3": "886.523",
    "AverageAbsorber4": "176.773"
}

It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.

The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default

The Take(n) function will only give you up to 1024 by default. However, you can pair it with Skip(n) to get all of the results:
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats; // filled in by Statistics(out stats) below
do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
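The Skip/Take loop can be tried without a server. The sketch below is plain LINQ over an in-memory list, with a hypothetical QueryPage method standing in for a store that caps every request at maxPageSize. It leaves out the SkippedResults bookkeeping, which only matters against a real RavenDB index, and it stops on an empty page instead of a short one so that it still works when the requested page size is larger than the cap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Hypothetical stand-in for a server that never returns more than
    // maxPageSize items per request, whatever Take() asks for.
    public static List<int> QueryPage(IReadOnlyList<int> store, int skip, int take, int maxPageSize)
    {
        return store.Skip(skip).Take(Math.Min(take, maxPageSize)).ToList();
    }

    public static List<int> FetchAll(IReadOnlyList<int> store, int pageSize, int maxPageSize)
    {
        var all = new List<int>();
        List<int> page;
        do
        {
            // Skip what has already been read, then ask for one more page.
            page = QueryPage(store, all.Count, pageSize, maxPageSize);
            all.AddRange(page);
        }
        while (page.Count > 0); // an empty page means the end was reached
        return all;
    }
}
```

The price of the empty-page stop condition is one extra request at the end. Keep in mind that each request counts against the session's MaxNumberOfRequestsPerSession limit mentioned in the question.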

The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have few calls issued over them.
If you are getting more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than a truckload of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you need to perform data aggregation, create a map/reduce index which results in aggregated data, e.g.:
Index:
// map
from post in docs.Posts
select new { post.Author, Count = 1 }
// reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();

You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
// Either query the whole index:
// var query = session.Query<User, MyUserIndex>();
// or filter on an indexed field:
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

ActiveDirectory with Range not changing results using DirectorySearcher

So I'm basically trying to enumerate results from AD, and for some reason I'm unable to pull down new results, meaning it keeps continuously pulling the first 1500 results even though I tell it I want an additional range.
Can someone point out where I'm making the mistake? The code never breaks out of the loop but more importantly it pulls users 1-1500 even when I say I want users 1500-3000.
uint rangeStep = 1500;
uint rangeLow = 0;
uint rangeHigh = rangeLow + (rangeStep - 1);
bool lastQuery = false;
bool quitLoop = false;
do
{
    string attributeWithRange;
    if (!lastQuery)
    {
        attributeWithRange = String.Format("member;Range={0}-{1}", rangeLow, rangeHigh);
    }
    else
    {
        attributeWithRange = String.Format("member;Range={0}-*", rangeLow);
    }
    DirectoryEntry dEntryhighlevel = new DirectoryEntry("LDAP://OU=C,OU=x,DC=h,DC=nt");
    DirectorySearcher dSeacher = new DirectorySearcher(dEntryhighlevel, "(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))", new string[] { attributeWithRange });
    dSeacher.PropertiesToLoad.Add("givenname");
    dSeacher.PropertiesToLoad.Add("sn");
    dSeacher.PropertiesToLoad.Add("samAccountName");
    dSeacher.PropertiesToLoad.Add("mail");
    dSeacher.PageSize = 1500;
    SearchResultCollection resultCollection = resultCollection = dSeacher.FindAll();
    dSeacher.Dispose();
    foreach (SearchResult userResults in resultCollection)
    {
        string Last_Name = userResults.Properties["sn"][0].ToString();
        string First_Name = userResults.Properties["givenname"][0].ToString();
        string userName = userResults.Properties["samAccountName"][0].ToString();
        string Email_Address = userResults.Properties["mail"][0].ToString();
        OriginalList.Add(Last_Name + "|" + First_Name + "|" + userName + "|" + Email_Address);
    }
    if (resultCollection.Count == 1500)
    {
        lastQuery = true;
        rangeLow = rangeHigh + 1;
        rangeHigh = rangeLow + (rangeStep - 1);
    }
    else
    {
        quitLoop = true;
    }
}
while (!quitLoop);

You're mixing up two concepts which is what is causing you trouble. This is a FAQ on the SO forums so I probably should blog on this to try and clear things up.
Let me first just explain the concepts, then correct the code once the concepts are out there.
Concept one is fetching large collections of objects. When you fetch a lot of objects, you need to ask for them in batches. This is typically called "paging" through the results. When you do this you'll get back a paging cookie and can pass back the paged control in subsequent searches to keep getting a "page worth" of results with each pass.
The second concept is fetching large numbers of values from a single attribute. The simple example of this is reading the member attribute from a group (ex: doing a base search for that group). This is called "ranged retrieval." In this search mode you are doing a base search against that object for the large attribute (like member) and asking for "ranges" of values with each passing search.
The code above confuses these concepts. You are doing member range logic like you are doing range retrieval but you are in fact doing a search that is constructed to return a large # of objects like a paged search. This is why you are getting the same results over and over.
To fix this you need to first pick an approach. :) I recommend range retrieval against the group object and asking for the large member attribute in ranges. This will get you all of the members in the group.
If you go down this path, you'll notice you can't ask for attributes of these values. The only value you get is the list of members, and you can then do searches for them.
If you opt to stick with a search that returns user objects like you have above, then you'll need to switch to paged searches:
- Get rid of the Range logic, and all mentions of 1500
- Set a page size of something like 1000
- Instead of ranging, look up how to do paged searches (using the page search control) using your API
If you pick ranging, you'll switch from a memberOf search like this to a search of the form:
a) scope: base
b) filter: (objectclass=*)
c) base DN: OU=C,OU=x,DC=h,DC=nt
d) Attributes: member;Range=0-*
...then you will increment the 0 up as you fetch ranges of values (i.e. do this search over and over again for each subsequent range of values, changing only the 0 to subsequent integers)
Other nits you'll notice in my logic:
- I don't set page size... you're not doing a paged search, so it doesn't matter.
- I don't ever hard code the value 1500 here. It doesn't matter. There is no value in knowing or even computing this. The point is that you asked for 0-* (i.e. all), you got back 1500, so then you say 1500-*, then 3000-*, and so on. You don't need to know the range size, only what you have been given so far.
I hope this fully answers it...
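For the ranged-retrieval path, a rough sketch with System.DirectoryServices could look like the following. It is untested against a live directory; groupPath is a hypothetical LDAP path to the group object itself, and the RangedRetrieval class and NextRangeAttribute helper are names I made up. It relies on the server reporting the chunk it actually returned in the attribute name, ending in "-*" for the final chunk.

```csharp
using System;
using System.Collections.Generic;
using System.DirectoryServices;

public static class RangedRetrieval
{
    // Builds the next attribute request, or returns null once the server has
    // marked the chunk it just returned as the last one (name ends in "-*").
    public static string NextRangeAttribute(string returnedName, int fetchedSoFar)
    {
        return returnedName.EndsWith("-*", StringComparison.Ordinal)
            ? null
            : "member;range=" + fetchedSoFar + "-*";
    }

    // groupPath is a hypothetical LDAP path to the group object itself.
    public static List<string> GetMembers(string groupPath)
    {
        var members = new List<string>();
        using (var group = new DirectoryEntry(groupPath))
        using (var searcher = new DirectorySearcher(group, "(objectClass=*)"))
        {
            searcher.SearchScope = SearchScope.Base; // only the group object
            string request = "member;range=0-*";
            while (request != null)
            {
                searcher.PropertiesToLoad.Clear();
                searcher.PropertiesToLoad.Add(request);
                SearchResult result = searcher.FindOne();
                if (result == null) break;

                // Find the name the server used for the chunk it returned.
                string returnedName = null;
                foreach (string name in result.Properties.PropertyNames)
                {
                    if (name.StartsWith("member;range=", StringComparison.OrdinalIgnoreCase))
                        returnedName = name;
                }
                if (returnedName == null) break; // no (more) member values

                foreach (object dn in result.Properties[returnedName])
                    members.Add(dn.ToString());
                request = NextRangeAttribute(returnedName, members.Count);
            }
        }
        return members;
    }
}
```

The result is a list of member distinguished names only. Attributes such as sn or mail would need a follow-up lookup per member, which is the trade-off against the paged search.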
Here is a code snip of doing a paged search, per my comment below (this is what you would need to do using the System.DirectoryServices.Protocols namespace classes, going down the logical path you started above (paged searches, not ranged retrieval)):
string searchFilter = "(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))";
string baseDN = "OU=C,OU=x,DC=h,DC=nt";
var scope = SearchScope.Subtree;
var attributeList = new string[] { "givenname", "sn", "samAccountName", "mail" };
PageResultRequestControl pageSearchControl = new PageResultRequestControl(1000);
do
{
    SearchRequest sr = new SearchRequest(baseDN, searchFilter, scope, attributeList);
    sr.Controls.Add(pageSearchControl);
    var directoryResponse = ldapConnection.SendRequest(sr);
    if (directoryResponse.ResultCode != ResultCode.Success)
    {
        // Handle error
    }
    var searchResponse = (SearchResponse)directoryResponse;
    pageSearchControl = null; // Reset!
    foreach (var control in searchResponse.Controls)
    {
        if (control is PageResultResponseControl)
        {
            var prrc = (PageResultResponseControl)control;
            if (prrc.Cookie.Length > 0)
            {
                pageSearchControl = new PageResultRequestControl(prrc.Cookie);
            }
        }
    }
    foreach (var entry in searchResponse.Entries)
    {
        // Handle the search result entry
    }
} while (pageSearchControl != null);

Your problem is caused by creating a new DirectorySearcher object inside the loop. Each time, the new object takes the first 1500 records. Create the searcher instance outside the loop and use the same instance for all queries.

ASP.NET MVC 3 - Search with multiple terms

I have a method whereby I'm able to search a database for specific customers. At the moment it only takes 1 term, but I'd like to be able to search with multiple terms (for example the customer's account number and their name). Below is my method:
public List<AXCustomer> allCustomers(string id)
{
    string[] searchstring = id.Split(' ');
    List<AXCustomer> customer = new List<AXCustomer>();
    // if 3 terms are entered
    if (searchstring.Length > 2)
    {
    }
    // if 2 terms are entered
    else if (searchstring.Length > 1)
    {
    }
    // revert back to default search
    else
    {
        customer = context.AXCustomers.Where(x => x.ACCOUNTNUM.Contains(id) ||
            x.NAME.Contains(id) || x.ZIPCODE.Contains(id)).ToList();
    }
    return customer;
}
As you can see, I've decided to split each term entered (I assume the terms will be separated by spaces), but I'm not sure what my LINQ query should look like for more than one term. Any help would be appreciated.

Since you don't know what will be entered or how long it will be, I would suggest doing the following:
public List<AXCustomer> allCustomers(string id)
{
    string[] searchstring = id.Split(' ');
    List<List<AXCustomer>> customerlists = new List<List<AXCustomer>>();
    foreach (string word in searchstring)
    {
        customerlists.Add(context.AXCustomers.Where(x => x.ACCOUNTNUM.Contains(word) || x.NAME.Contains(word) || x.ZIPCODE.Contains(word)).ToList());
    }
    //Then you just need to see if you want ANY matches or COMPLETE matches.
    //Throw your results together in a List<AXCustomer> and return it.
    return mycombinedlist;
}
Any matches = throw all lists together, then take the distinct ones.
Complete matches = you'll have to check for items which occur in all customerlists.
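The two combination rules can be sketched in memory like this. Customer is a hypothetical, trimmed-down stand-in for AXCustomer, and instead of building one list per word and merging them afterwards, the sketch tests each customer against the terms directly, which should give the same result as the union (any) or intersection (complete) of the per-word lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermMatching
{
    // Hypothetical, trimmed-down customer type for illustration only.
    public class Customer
    {
        public string AccountNum;
        public string Name;
        public string ZipCode;
    }

    private static bool Matches(Customer c, string term)
    {
        return c.AccountNum.Contains(term) || c.Name.Contains(term) || c.ZipCode.Contains(term);
    }

    private static string[] SplitTerms(string input)
    {
        // Drop empty entries so repeated spaces do not produce empty terms.
        return input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // "Any matches": keep a customer if at least one term matches.
    public static List<Customer> MatchAny(IEnumerable<Customer> customers, string input)
    {
        var terms = SplitTerms(input);
        return customers.Where(c => terms.Any(t => Matches(c, t))).ToList();
    }

    // "Complete matches": keep a customer only if every term matches.
    public static List<Customer> MatchAll(IEnumerable<Customer> customers, string input)
    {
        var terms = SplitTerms(input);
        return customers.Where(c => terms.All(t => Matches(c, t))).ToList();
    }
}
```

This version runs over objects that are already loaded. Whether the same lambdas can be translated to SQL depends on the LINQ provider, so treat it as an illustration of the semantics, not as a drop-in database query.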

It will work fine. I am using a similar type of query in my project and it seems to work great. Following is the code snippet
PagedList.IPagedList<Product> PagedProducts = dbStore.Products.Where(p => p.Name.Contains(query) || p.MetaKeywords.Contains(query)).ToList().ToPagedList(pageIndex, PageSize);
BTW, it's running on a live server too.

You can dynamically attach as many conditions as you wish, in the following manner:
customer = context.AXCustomers.Where(x => x.ACCOUNTNUM.Contains(id));
customer = customer.Where(Condition 2);
customer = customer.Where(Condition 3);
And so on. You have full control over the criteria: just make sure that each one can be translated to a SQL query.
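Applied to a list of search terms, the chaining idea might look like the sketch below. It works on an IQueryable<string> of names to stay self-contained; ChainedFilters and ApplyTerms are illustrative names, not part of any framework.

```csharp
using System;
using System.Linq;

public static class ChainedFilters
{
    // Each Where call narrows the previous query, so the terms are combined with AND.
    // The query is only executed when it is enumerated (for example by ToList()).
    public static IQueryable<string> ApplyTerms(IQueryable<string> names, string input)
    {
        IQueryable<string> query = names;
        foreach (string term in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string current = term; // copy so each lambda captures its own term
            query = query.Where(n => n.Contains(current));
        }
        return query;
    }
}
```

Note that the variable holding the query has to be an IQueryable (or IEnumerable), not the List<AXCustomer> from the question; call ToList() once at the end, after all conditions are attached.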
