Does Azure Search result guarantee order for * query? - c#

I'd like to manage Azure Search documents (indexed items) via the Azure Search C# SDK.
What I'm trying to do is list documents from query results (mostly `*` results) continuously and edit their values.
Listing query results looks like this:
public async Task<IEnumerable<MyIndexModel>> GetListAsync(string query, bool isNext = false)
{
    if (string.IsNullOrEmpty(query)) query = "*";
    DocumentSearchResult list;
    if (!isNext)
    {
        list = await _indexClient.Documents.SearchAsync(query);
    }
    else
    {
        list = await _indexClient.Documents.ContinueSearchAsync(ContinuationToken);
    }
    ContinuationToken = list.ContinuationToken;
    return list.Results.Select(o => o.Document.ToIndexModel());
}
One requirement is to jump to the n-th page of items. Since Azure Search does not appear to provide paging, I'd like to know whether it returns an ordered list or not.
If we do not change the document count (no further indexing), does Azure Search return an unchanged/ordered list, so that I can reach the same document on the 80th page by running the ContinueSearchAsync() method 80 times?
Or do I have to maintain a separate lookup table for my requirement?

`*` is a wildcard query. Documents matching a wildcard query are given the same constant score in ranking, because there is no way to measure how close a document is to `*`. Furthermore, the order among documents with the same score is not guaranteed: a document matching `*` can be ranked 1st in one response and 7th in another, even when the same query was issued.
In order to get consistent ordering for wildcard queries, I suggest passing an $orderby clause, for example search=*&$orderby=id asc. Azure Search does support paging, via $skip and $top; the documentation provides guidance.
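As a sketch of that approach, reusing the client and model names from the question (the "id" field name is an assumption; any sortable field in your index works), deterministic paging with the older Microsoft.Azure.Search SDK could look like this:

// Sketch: deterministic paging over a wildcard query by ordering on a
// sortable field. Jumping straight to the n-th page replaces calling
// ContinueSearchAsync() n times.
public async Task<IEnumerable<MyIndexModel>> GetPageAsync(int pageIndex, int pageSize)
{
    var parameters = new SearchParameters
    {
        OrderBy = new List<string> { "id asc" }, // stable, total ordering
        Skip = pageIndex * pageSize,             // jump to the n-th page
        Top = pageSize
    };

    DocumentSearchResult page =
        await _indexClient.Documents.SearchAsync("*", parameters);
    return page.Results.Select(r => r.Document.ToIndexModel());
}

Note that $skip has a service-side upper limit, so for very deep paging a range filter on the sort field (e.g. id greater than the last id seen) scales better.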

Related

Grouping on Lucene.net

public static async Task<List<string>> SearchGroup(string fieldName, Query bq, Filter fil, IndexSearcher searcher)
{
    // maximum number of groups
    int groupNum = 100;
    return await Task.Run(() =>
    {
        GroupingSearch groupingSearch = new GroupingSearch(fieldName);
        groupingSearch.SetCachingInMB(8192, cacheScores: true);
        // no group sort - sorting here wastes performance
        //groupingSearch.SetGroupSort(new Sort(new SortField("name", SortFieldType.STRING)));
        groupingSearch.SetGroupDocsLimit(groupNum);
        ITopGroups<BytesRef> topGroups = groupingSearch.SearchByField(searcher, fil, bq, groupOffset: 0, groupNum);
        List<string> groups = new List<string>();
        foreach (var groupDocs in topGroups.Groups.Take(groupNum))
        {
            if (groupDocs.GroupValue != null)
            {
                groups.Add(groupDocs.GroupValue.Utf8ToString());
            }
        }
        return groups;
    });
}
Here is my current code for grouping, but there are performance issues: each call takes as long as a full query, so grouping on multiple fields at the same time is very time-consuming. Is there any way to improve the speed?
There will be multiple filter fields, but it is too time-consuming.
I hope to get fast grouping results, or to group several fields at the same time.
You have to add a cache - group values usually do not change very frequently.
So it is better to pre-cache the data before executing the search request,
and then reuse that information on every subsequent query for the same data.
To develop this, it looks like you will have to customize the Lucene engine (the searcher).
In my case I am using faceted filters instead of groups. The difference is that groups show sub-content as a tree view (under each group), while filters show all the data as one set that can be easily and quickly filtered.
Filters are not the answer to all questions/problems. There is no information about the task you are trying to solve with grouping, so I can't answer more exactly.
So cache as much as possible - that is the answer for 99% of such issues. Changing strategy to use facet filters is another way to improve the situation.
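As a rough sketch of the pre-caching idea (not tied to Lucene; the ConcurrentDictionary-based cache and its invalidation policy are my assumptions, to be matched to how often your index changes):

// Sketch: cache grouping results per field so repeated queries skip the
// expensive GroupingSearch pass. Lazy guarantees the expensive search runs
// at most once per field, even if many requests race for the same key.
public class GroupCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<List<string>>>> _cache
        = new ConcurrentDictionary<string, Lazy<Task<List<string>>>>();

    public Task<List<string>> GetOrAddAsync(
        string fieldName, Func<string, Task<List<string>>> compute)
    {
        return _cache.GetOrAdd(
            fieldName,
            f => new Lazy<Task<List<string>>>(() => compute(f))).Value;
    }

    // Call this when the index is rebuilt so stale group values are dropped.
    public void Invalidate(string fieldName) => _cache.TryRemove(fieldName, out _);
}

Usage would then wrap the existing method, e.g. `var groups = await cache.GetOrAddAsync("category", f => SearchGroup(f, bq, fil, searcher));`, so grouping several fields only pays the query cost on the first request per field.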

Find TOP (1) for each ID in array

I have a large (60m+) document collection, whereby each ID has many records in time series. Each record has an IMEI identifier, and I'm looking to select the most recent record for each IMEI in a given List<Imei>.
The brute force method is what is currently happening, whereby I create a loop for each IMEI and yield out the top most record, then return a complete collection after the loop completes. As such:
List<BsonDocument> documents = new List<BsonDocument>();
foreach (var config in imeiConfigs)
{
    var filter = GetImeiFilter(config.IMEI);
    var sort = GetImeiSort();
    var data = _historyCollection.Find(filter).Sort(sort).Limit(1).FirstOrDefault();
    documents.Add(data);
}
The end result is a List<BsonDocument> which contains the most recent BsonDocument for each IMEI, but it's not massively performant. If imeiConfigs is too large, the query takes a long time to run and return as the documents are rather large.
Is there a way to select the TOP 1 for each IMEI in a single query, as opposed to brute forcing like I am above?
Have you tried using the LINQ Take function?
List<BsonDocument> documents = new List<BsonDocument>();
foreach (var config in imeiConfigs)
{
    var filter = GetImeiFilter(config.IMEI);
    var sort = GetImeiSort();
    var data = _historyCollection.Find(filter).Sort(sort).Take(1).FirstOrDefault();
    documents.Add(data);
}
https://learn.microsoft.com/es-es/dotnet/api/system.linq.enumerable.take?view=netframework-4.8
I think the bad performance comes from Sort(sort), because the sorting forces a pass through the whole collection.
But perhaps you can improve the time performance with parallelism:
List<BsonDocument> documents;
documents = imeiConfigs.AsParallel().Select(config =>
{
    var filter = GetImeiFilter(config.IMEI);
    var sort = GetImeiSort();
    var data = _historyCollection.Find(filter).Sort(sort).Limit(1).FirstOrDefault();
    return data;
}).ToList();
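Another option worth sketching is to fetch the newest record per IMEI in one server-side aggregation instead of N round trips. This assumes the official MongoDB C# driver's aggregation fluent API; the field names "IMEI" and "Timestamp" are assumptions, so adjust them to your schema:

// Sketch: one round trip instead of N. $match narrows to the wanted IMEIs,
// $sort puts the newest record first, and $group keeps the first (= newest)
// document per IMEI via $$ROOT.
var imeis = imeiConfigs.Select(c => c.IMEI).ToList();

var pipeline = _historyCollection.Aggregate()
    .Match(Builders<BsonDocument>.Filter.In("IMEI", imeis))
    .Sort(Builders<BsonDocument>.Sort.Descending("Timestamp"))
    .Group(new BsonDocument
    {
        { "_id", "$IMEI" },
        { "latest", new BsonDocument("$first", "$$ROOT") }
    });

List<BsonDocument> documents = pipeline
    .ToList()
    .Select(g => g["latest"].AsBsonDocument)
    .ToList();

A compound index on (IMEI, Timestamp descending) should let the $match/$sort stages avoid a collection scan.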

How to retrieve more than 4000 records from RavenDB in a single session [duplicate]

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an Expression<Func<T, bool>> instead of a Func<T, bool>, so that it's treated as IQueryable instead of IEnumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
    "Header_ID": 3525880,
    "Sub_ID": "120403261139",
    "TimeStamp": "2012-04-05T15:14:13.9870000",
    "Equipment_ID": "PBG11A-CCM",
    "AverageAbsorber1": "284.451",
    "AverageAbsorber2": "108.442",
    "AverageAbsorber3": "886.523",
    "AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it together with Skip(n) to get all results:
RavenQueryStatistics stats;
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued through them.
If you are getting more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than pulling a truckload of documents from the data store.
RavenDB indexing is quite sophisticated. There is a good article about indexing here and facets here.
If you need to perform data aggregation, create a map/reduce index which results in aggregated data, e.g.:
Map:
from post in docs.Posts
select new { post.Author, Count = 1 }
Reduce:
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").Where(x => x.Author == author).ToList();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);

using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

How to retrieve more than 100 Records from Amazon Simple DB

I am creating an application in MVC and using Amazon cloud services for the backend. I need the data in bulk from the database, so I am using a query like this:
SelectResponse response = simpleDBClient.Select(new SelectRequest()
{
    SelectExpression = "select * from survey1 limit 2400"
});
This is working fine and returning 2400 records. Now I want to apply a search on these records, so I have to use a where clause, but when I use a where clause it returns only 10 records for any valid condition.
Please help; any help will be appreciated.
You can use limit together with the where clause. Here is the syntax for the Select query:
select output_list from domain_name [where expression] [sort_instructions] [limit limit]
The output_list can be:
- `*` for all attributes.
- `itemName()` for the item name only.
- `count(*)` for the total count of items matching the query expression. It returns the number of items in the result set instead of the items themselves.
- An explicit list of attributes (attribute1, ..., attributeN).
The domain_name is the domain from which you want to search items.
The expression is the match expression for items. You can use operators such as =, <=, <, >=, like, not like, between, is null, is not null, etc.
The sort_instructions sorts the results on a single attribute, in an ascending or descending order.
The limit is the maximum number of results to return (default: 100, max. 2500).
Please Note-
The total size of the response cannot exceed 1 MB. Amazon SimpleDB
automatically adjusts the number of items returned per page to enforce
this limit. For example, even if you ask to retrieve 2500 items, but
each individual item is 10 KB in size, the system returns 100 items
and an appropriate next token so you can get the next page of results.
Note: Operations that run longer than 5 seconds return a time-out
error response or a partial or empty result set. Partial and empty
result sets contain a NextToken value, which allows you to continue
the operation from where it left off.
Source
SimpleDB can retrieve only 1 MB of data at a time; it puts a NextToken in the response to indicate that there is more data to be retrieved.
Here is how to do it in Java:
public List<Item> getItems() {
    AmazonSimpleDBClient client = new AmazonSimpleDBClient(...);
    List<Item> items = new ArrayList<>();
    String nextToken = null;
    do {
        final SelectRequest request = new SelectRequest();
        request.setSelectExpression("MY SELECT QUERY");
        // SimpleDB may paginate the result; for a paginated result NextToken will be non-null
        request.setNextToken(nextToken);
        nextToken = null;
        SelectResult result = client.select(request);
        if (result != null) {
            nextToken = result.getNextToken();
            items.addAll(result.getItems());
        }
    } while (nextToken != null);
    return items;
}

How can I check if a string in sql server contains at least one of the strings in a local list using linq-to-sql?

In my database field I have a Positions field, which contains a space separated list of position codes. I need to add criteria to my query that checks if any of the locally specified position codes match at least one of the position codes in the field.
For example, I have a local list that contains "RB" and "LB". I want a record that has a Positions value of OL LB to be found, as well as records with a position value of RB OT but not records with a position value of OT OL.
With AND clauses I can do this easily via
foreach (var str in localPositionList)
    query = query.Where(x => x.Positions.Contains(str));
However, I need these chained together as OR clauses. If I weren't dealing with LINQ-to-SQL (just normal collections) I could do this with
query = query.Where(x => x.Positions.Split(' ').Any(y => localPositionList.Contains(y)));
However, this does not work with LINQ-to-SQL: an exception occurs because it cannot translate Split into SQL.
Is there any way to accomplish this?
I am trying to resist splitting this data out of this table and into other tables, as the sole purpose of this table is to give an optimized "cache" of data that requires the minimum amount of tables in order to get search results (eventually we will be moving this part to Solr, but that's not feasible at the moment due to the schedule).
I was able to get a test version working by using separate queries and running a Union on the result. My code is rough, since I was just hacking, but here it is...
List<string> db = new List<string>() {
    "RB OL",
    "OT LB",
    "OT OL"
};
List<string> tests = new List<string> {
    "RB", "LB", "OT"
};
IEnumerable<string> result = db.Where(d => d.Contains("RB"));
for (int i = 1; i < tests.Count(); i++) {
    string val = tests[i];
    result = result.Union(db.Where(d => d.Contains(val)));
}
result.ToList().ForEach(r => Console.WriteLine(r));
Console.ReadLine();
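Another sketch that keeps everything in one translatable query is to build the OR chain by hand with expression trees. LINQ-to-SQL translates string.Contains into SQL LIKE, so the combined predicate becomes a single WHERE ... LIKE ... OR ... LIKE ... clause. The helper name and the Player entity in the usage line are hypothetical:

// Sketch: combine x => x.Positions.Contains(v1) OR x.Positions.Contains(v2) ...
// into one predicate that LINQ providers can translate, by OR-ing call
// expressions over the selector's shared parameter.
public static Expression<Func<T, bool>> ContainsAny<T>(
    Expression<Func<T, string>> selector, IEnumerable<string> values)
{
    ParameterExpression param = selector.Parameters[0];
    MethodInfo contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });

    Expression body = null;
    foreach (string value in values)
    {
        // builds: selector.Body.Contains(value)
        Expression call = Expression.Call(selector.Body, contains,
                                          Expression.Constant(value));
        body = body == null ? call : Expression.OrElse(body, call);
    }

    // empty value list matches nothing
    return Expression.Lambda<Func<T, bool>>(
        body ?? Expression.Constant(false), param);
}

Usage (with a hypothetical entity): query = query.Where(ContainsAny<Player>(x => x.Positions, localPositionList)); Note that Contains semantics still apply, so "RB" would also match a hypothetical "ORB" code; anchoring with spaces (e.g. searching for " RB ") is one way to tighten the match.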
