I'm very new to Elasticsearch, and I'm trying to build an aggregation, but can't seem to get it right.
I have some data in an Elasticsearch index that looks like this:
{
"customerId": "example_customer",
"request": {
"referer": "https://example.org"
},
"#timestamp": "2020-09-29T14:14:00.000Z"
}
My mapping:
{
"mappings": {
"properties": {
"customerId": { "type": "keyword" },
"request": {
"properties": {
"referer": { "type": "keyword" }
}
}
}
}
}
And I'm trying to get the referers that appear the most frequently for a specific customer in a date range. I could make the filter for the customer like this:
var result = await _client.SearchAsync<InsightRecord>(s =>
s.Aggregations(
a => a
.Filter("customer", customer =>
customer.Filter(q => q.Term(ir => ir.CustomerId, customerId)))
.Terms("top_referer", ts => ts.Field("request.referer"))
)
);
return result.Aggregations.Terms("top_referer").Buckets
.Select(bucket => new TopReferer { Url = bucket.Key, Count = bucket.DocCount ?? 0 });
Now I want to narrow this down to a specific time range. This is what I have so far:
var searchDescriptor = s.Aggregations(a =>
a.Filter("customer", customer =>
customer.Filter(q =>
q.Bool(b =>
b.Must(
f2 => f2.DateRange(date => date.GreaterThanOrEquals(from).LessThanOrEquals(to)),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
)
.Terms("top_referers", ts => ts.Field("request.referer"))
);
The problem is that the date filter doesn't get included in the query. It translates to this JSON:
{
"aggs": {
"customer": {
"filter": {
"bool": {
"filter": [{
"term": {
"customerId": {
"value": "example_customer"
}
}
}
]
}
}
},
"top_referers": {
"terms": {
"field": "request.referer"
}
}
}
}
I tried ordering them differently, but it didn't help. It's always the customer filter that will appear in the JSON, and the date range is skipped. I also saw some people using a query combined with an aggregation, but I feel like I should be able to do this using the aggregation alone. Is this possible? What am I doing wrong in my query that the range doesn't show up in the JSON?
The issue is that the date range query does not specify a field:
f2 => f2.DateRange(date => date.GreaterThanOrEquals(from).LessThanOrEquals(to)),
Add a field to this query:
f2 => f2.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
In addition, for the filter aggregation to apply to the terms aggregation, the terms aggregation needs to be a sub-aggregation of the filter aggregation. So it would be something like:
var customerId = "foo";
var from = "now-365d";
var to = "now";
var result = await _client.SearchAsync<InsightRecord>(s => s
.Aggregations(a => a
.Filter("customer", customer => customer
.Filter(q => q
.Bool(b => b
.Must(
f2 => f2.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
)
)
);
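To make the nesting easier to see, here is a rough sketch of the request body that shape corresponds to, written as plain Python dicts rather than NEST. The function names and the sample response are invented for illustration. The point is that top_referers sits under the customer aggregation's own "aggs" key, so its buckets also come back under "customer" in the response.

```python
def filter_aggregation_body(customer_id, date_from, date_to):
    """Request body with the terms aggregation nested under the filter aggregation."""
    return {
        "aggs": {
            "customer": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"#timestamp": {"gte": date_from, "lte": date_to}}},
                            {"term": {"customerId": {"value": customer_id}}},
                        ]
                    }
                },
                # Sub-aggregation: it only sees documents that passed the filter above.
                "aggs": {
                    "top_referers": {"terms": {"field": "request.referer"}}
                },
            }
        }
    }


def top_referers(response):
    """Read the nested buckets; they live under the "customer" aggregation."""
    buckets = response["aggregations"]["customer"]["top_referers"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Invented sample response, trimmed to the fields used here.
sample_response = {
    "aggregations": {
        "customer": {
            "doc_count": 3,
            "top_referers": {
                "buckets": [
                    {"key": "https://example.org", "doc_count": 2},
                    {"key": "https://example.com", "doc_count": 1},
                ]
            },
        }
    }
}

body = filter_aggregation_body("foo", "now-365d", "now")
print(top_referers(sample_response))
```

On the NEST side this should correspond to reading the buckets through the filter aggregation first, i.e. something along the lines of result.Aggregations.Filter("customer").Terms("top_referers"), rather than result.Aggregations.Terms(...) directly.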
Rather than specifying the date range and term queries in a filter aggregation, though, I'd be inclined to specify them as the query of the search request, since the query is taken into account when calculating aggregations. A filter aggregation is useful when you want to query one dataset but run an aggregation on only a subset of it, e.g. if you were searching across all customers but wanted to aggregate over only some of them. For this particular example the outcome should be the same either way, but the query form is perhaps a little easier to get the results from.
Specified as the query, it would look something like:
var result = await _client.SearchAsync<InsightRecord>(s => s
.Query(q => q
.Bool(b => b
.Must(
f2 => f2.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
);
Further, there are a couple more things we can do:
Since we're only interested in the results of the terms aggregation and not the search hits, we can specify .Size(0) to not bother returning search hits on the response.
The bool query can be more succinctly expressed by combining queries with operator overloading, and since both queries are predicates (a document either matches the query or it doesn't), we can specify both in a filter clause to omit scoring. The final query then is something like
var result = await _client.SearchAsync<InsightRecord>(s => s
.Size(0)
.Query(q => +q
.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
) && +q
.Term(ir => ir.CustomerId, customerId)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
);
which generates the query
{
"aggs": {
"top_referers": {
"terms": {
"field": "request.referer"
}
}
},
"query": {
"bool": {
"filter": [{
"range": {
"#timestamp": {
"gte": "now-365d",
"lte": "now"
}
}
}, {
"term": {
"customerId": {
"value": "foo"
}
}
}]
}
},
"size": 0
}
The terms aggregation buckets can be accessed as expressed in your question.
I am trying to use NEST to create a search query dynamically based on the user's input.
I want to add multiple filters in Filter with Term, but searching on a string field does not work and I cannot find a solution.
For example, this code tries to search on a string field and it does not work:
var response = await _elasticClient.SearchAsync<CustomerAddressInfo>(p => p
.Query(q => q
.Bool(b => b
.Filter(f => f.Term(t => t.Field(p => p.AccountAddressId).Value(type.AccountAddressId)))
)
)
);
The other simple search, on an integer field, works:
var response = await _elasticClient.SearchAsync<CustomerAddressInfo>(p => p
.Query(q => q
.Bool(b => b
.Filter(f => f.Term(t => t.Field(p => p.CreateUnitId).Value(type.CreateUnitId)))
)
)
);
But if I search on the string field with a Match query, the search succeeds again:
var response = await _elasticClient.SearchAsync<CustomerAddressInfo>(p => p
.Query(q => q
.Match(m => m
.Field(f => f.AccountAddressId)
.Query(type.AccountAddressId)
)
)
);
So the question is: how can I give multiple search criteria with the Match query method, or how can I search on a string field with the Term query method in Elasticsearch?
I am not familiar with NEST, but to search on multiple fields using a match query or a term query, you can refer to the following example:
A bool query is used to combine one or more clauses; see the bool query documentation to learn more.
Avoid using the term query for text fields.
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
To search text field values, use the match query instead.
Index Mapping
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"cost": {
"type": "long"
}
}
}
}
Index data:
{
"name":"apple",
"cost":"40"
}
{
"name":"apple",
"cost":"55"
}
Search Query: Multiple Search criteria with match
{
"query": {
"bool": {
"must": [
{ "match": { "name": "apple" }},
{ "match": { "cost": 40 }}
]
}
}
}
Search on a field with a term query
{
"query": {
"bool" : {
"must" :[
{"term" : { "name" : "apple" }},
{"term": { "cost":40 }}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "3",
"_score": 1.1823215,
"_source": {
"name": "apple",
"cost": "40"
}
}
]
Hey, I don't fully understand your requirements, but if you want to add multiple conditions to the filter, you can do it like below.
QueryContainer qSs = null;
foreach (var query in queries) // assume queries is the list of your search terms
{
qSs &= new TermQuery { Field = "your_field_name", Value = query };
}
var searchResults = await _elasticClient.SearchAsync<CustomerAddressInfo>(s => s
.Query(q => q
.Bool(b => b.Filter(qSs) )
)
);
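For what it's worth, the request body this kind of loop is aiming for is roughly a bool query with one term clause per criterion. A minimal sketch in plain Python dicts (not NEST; build_filter_query and the field names are assumptions made up for the example):

```python
def build_filter_query(criteria):
    """Turn {field: value} into a bool query with one term filter per criterion."""
    filters = [
        {"term": {field: {"value": value}}}
        for field, value in criteria.items()
        if value is not None  # skip criteria the user left empty
    ]
    return {"query": {"bool": {"filter": filters}}}


# Hypothetical user input; the field names are assumptions.
body = build_filter_query(
    {"accountAddressId": "A-100", "createUnitId": 7, "city": None}
)
print(body)
```

Keep in mind the note above about term queries and text fields: a term filter on a string value only finds exact matches, so the string field has to be indexed in a form where the exact value is kept.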
The user who calls my REST API must be able to specify the type of place and then get placename suggestions.
The placetype must match exactly, while the placename search should use the full power of Elasticsearch search.
I'm using NEST (latest NuGet version) and Elasticsearch 6.4.
My API has 2 parameters:
1. query: the search text
2. placetypes: array defining the desired "categories" of documents in the suggestions
The placename suggestion with the query parameter works well,
but I do not know how to add the placetype condition.
Nest Mapping :
return map
.Dynamic(false)
.Properties(props => props
.Keyword(n => n
.Name(p => p.Id))
.Text(n => n
.Name(p => p.PlaceType))
.Completion(n => n
.Name(p => p.PlaceName)
.Analyzer("autocompletion_indexation"))
.Completion(n => n
.Name(p => p.Address)
.Analyzer("autocompletion_indexation"))
.GeoPoint(loc =>
{
loc.Name(location => location.Coordinates);
return loc;
}));
Nest AnalysisDescriptor :
return analysis
.CharFilters(c => c
.HtmlStrip("html_strip")
)
.Tokenizers(t => t
.EdgeNGram("custom_ngram", descriptor =>
{
descriptor.MinGram(2);
descriptor.MaxGram(10);
descriptor.TokenChars(new List<TokenChar> { TokenChar.Letter, TokenChar.Digit });
return descriptor;
}
))
.TokenFilters(tf => tf
.Lowercase("lowercase")
.WordDelimiter("word_delimiter", wd => wd
.SplitOnNumerics()
.SplitOnCaseChange()
)
.AsciiFolding("asciifolding", af => af
.PreserveOriginal(false)
)
.Elision("elision", e => e
.Articles("l", "d", "o")
)
.Synonym("address_synonym", sy => sy
.Synonyms(GetSynonyms())
.Tokenizer("standard")
.Tokenizer("whitespace")
)
.Stop("french_stop", fs => fs
.StopWords("_french_"))
.Stemmer("french_stemmer", fs => fs
.Language("light_french")
)
)
.Analyzers(an => an
.Custom("autocompletion_indexation", c => c
.Tokenizer("custom_ngram")
.Tokenizer("standard")
.Tokenizer("whitespace")
.CharFilters("html_strip")
.Filters("address_synonym",
"lowercase",
"asciifolding",
"elision",
"word_delimiter",
"stop",
"french_stemmer",
"french_stop")
));
Suggest/Search function:
public Task<List<Place>> SuggestDocuments(CancellationToken cancellationToken, string query, params string[] placeTypes)
{
var search = new SearchDescriptor<Place>()
.From(0)
.Size(10)
.Index(PlaceDataService.DefaultPostalAddressIndexName)
.Query(q => q
.MultiMatch(mm => mm
.Query(query)
.Fuzziness(Fuzziness.Auto)
.Fields(fields => fields
.Field(f => f.PlaceName)
)));
var searchResults = _elasticClient.SearchAsync<Place>(search, cancellationToken);
return Task.Run(() => searchResults.Result.Documents.ToList(), cancellationToken);
}
You can combine the multi_match query with a terms query by wrapping them in a bool query to achieve this:
var query = "this is the place name";
var places = new [] { "Place 1", "Place 2" };
var search = new SearchDescriptor<Place>()
.From(0)
.Size(10)
.Index("index")
.Query(q => q
.MultiMatch(mm => mm
.Query(query)
.Fuzziness(Fuzziness.Auto)
.Fields(fields => fields
.Field(f => f.PlaceName)
)
) && +q
.Terms(t => t
.Field(f => f.PlaceType)
.Terms(places)
)
);
var searchResults = await _elasticClient.SearchAsync<Place>(search, cancellationToken);
The above makes use of operator overloading for queries, which you can read more about in the Writing Bool queries docs. The above produces the following request:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "this is the place name",
"fuzziness": "AUTO",
"fields": [
"placeName"
]
}
}
],
"filter": [
{
"terms": {
"placeType": [
"Place 1",
"Place 2"
]
}
}
]
}
}
}
Note that a terms query will OR the terms, so only one of them needs to match. If all inputs need to match, you can AND multiple term queries together.
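To illustrate the OR vs. AND difference, here is a small sketch in plain Python (not NEST). The two builder functions produce the query DSL for each case, and has_any / has_all are just local stand-ins for how each one treats a document. All names and the sample document are made up for the example.

```python
def terms_query(field, values):
    """One terms query: a document matches if it has ANY of the values."""
    return {"terms": {field: list(values)}}


def all_terms_query(field, values):
    """Several term queries ANDed in a bool filter: ALL values must be present."""
    return {"bool": {"filter": [{"term": {field: {"value": v}}} for v in values]}}


def has_any(doc_values, wanted):
    """Local stand-in for terms-query semantics."""
    return any(value in doc_values for value in wanted)


def has_all(doc_values, wanted):
    """Local stand-in for ANDed term queries."""
    return all(value in doc_values for value in wanted)


wanted = ["Place 1", "Place 2"]
doc_place_types = ["Place 1"]  # made-up document with a single place type

print(has_any(doc_place_types, wanted))
print(has_all(doc_place_types, wanted))
```

The sample document is found by the terms query but not by the ANDed term queries, since it only carries one of the two place types.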
A few things don't look quite right with your mapping and analysis:
PlaceName is mapped as a completion datatype but is being used in a multi_match search query, rather than a suggest search. I would have expected it to be mapped as a text datatype for this case.
If PlaceType fields need to be matched exactly as is, I would expect them to be mapped as keyword datatype
html_strip char filter is built in, so doesn't need to be re-specified, unless you need to customise how it works e.g. escaped_tags. Similarly for lowercase token filter.
The "autocompletion_indexation" custom analyzer can only have one tokenizer, so the last assigned wins ("whitespace" tokenizer). Check out the writing analyzers docs.
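As a rough sketch of the last two points, this is what trimmed-down analysis settings with a single tokenizer could look like, written as a plain Python dict mirroring the index settings JSON. Keeping custom_ngram as the one tokenizer is my assumption about the intent (in the original, "whitespace" wins), and the filter list is shortened to built-in filters only.

```python
def single_tokenizer_analysis():
    """Analysis settings where the custom analyzer names exactly one tokenizer."""
    return {
        "analysis": {
            "tokenizer": {
                "custom_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "autocompletion_indexation": {
                    "type": "custom",
                    "char_filter": ["html_strip"],  # built in, no definition needed
                    "tokenizer": "custom_ngram",  # a single name, not a list
                    "filter": ["lowercase", "asciifolding"],  # built-in token filters
                }
            },
        }
    }


settings = single_tokenizer_analysis()
analyzer = settings["analysis"]["analyzer"]["autocompletion_indexation"]
print(analyzer["tokenizer"])
```

A custom analyzer takes zero or more char filters, exactly one tokenizer, and zero or more token filters, which is why calling .Tokenizer(...) repeatedly just overwrites the previous value.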
I have the query shown below:
var queryResult =
await
elastic.SearchAsync<CounterData>(s => s
.Query(q => q
.Bool(b => b
.Must(
m => m.ConstantScore(c => c
.Filter(f => f
.Term(x => x.CounterId, maxList))
),
m => m.ConstantScore(c => c.
Filter(f => f
.Term(x => x.Type, counterType))),
m => m.ConstantScore(c => c.
Filter(f => f.
DateRange(d => d.
GreaterThanOrEquals(lowerBound).Field(r => r.Date)))))))
.AllTypes()
.Scroll("1m")
.Size(10000));
Here maxList is a list of integers. I want to check if the term is in the list, but it looks like this does not work.
Any ideas how I can check if the term matches any of the elements in the list?
Something like the following will do it:
var maxList = new[] { 1, 2, 3, 4, 5};
var counterType = "counter-type";
var lowerBound = DateTime.UtcNow.Date.AddDays(-7);
var queryResult = client.Search<CounterData>(s => s
.Query(q => +q
.DateRange(d => d
.Field(r => r.Date)
.GreaterThanOrEquals(lowerBound)
) && +q
.Term(x => x.Type, counterType) && +q
.Terms(t => t
.Field(x => x.CounterId)
.Terms(maxList)
)
)
.AllTypes()
.Scroll("1m")
.Size(10000)
);
A few things to highlight
+ unary operator applied to a QueryContainerDescriptor<T> is a shorthand for wrapping a query in a bool filter query. I think this is what you want in your case as you don't need to calculate scores, you just want to find matches to a predicate.
&& is overloaded for QueryContainer such that when applied to two QueryContainers, it is a shorthand for a bool must query with two must query clauses. However in this example, the queries all have the + unary operator applied so are bool filter queries, so will be &&'ed together as filter queries.
The value passed to Size() when scrolling (i.e. specifying a Scroll() time) is the number of documents to fetch per scroll request. With the scan search type in older Elasticsearch versions it is applied per shard rather than per request, so each scroll can return up to Size() * number of shards documents. This might be a lot of documents per scroll.
Use the terms query to find documents that match on a field against any one of a list of terms (not analyzed).
The end query json looks like
POST http://localhost:9200/examples/_search?scroll=1m
{
"size": 10000,
"query": {
"bool": {
"filter": [
{
"range": {
"date": {
"gte": "2016-08-04T00:00:00Z"
}
}
},
{
"term": {
"type": {
"value": "counter-type"
}
}
},
{
"terms": {
"counterId": [
1,
2,
3,
4,
5
]
}
}
]
}
}
}
I'm trying the NEST client for querying Elasticsearch data. I have a yearly job count report based on a field. Currently I use a date histogram aggregation for this; below is the Elasticsearch query.
POST insight/_search
{
"size": "0",
"query": {
"filtered": {
"query": {
"query_string": {
"query": "(onet.familycode: 11)"
}
}
}
},
"aggregations": {
"jobcounts_by_year": {
"date_histogram": {
"field": "jobdate",
"interval": "year",
"format": "yyyy"
},
"aggregations": {
"count_by_occupation_family": {
"terms": {
"field": "onet.family"
}
}
}
}
}
}
Equivalent Nest query
result = ElasticSearchClient.Instance.Search<Job>(s => s.Size(0)
.Query(query => query
.Filtered(filtered => filtered
.Query(q => q
.QueryString(qs => qs.Query(queryString)))))
.Aggregations(a => a
.DateHistogram("jobcounts_by_year", dt => dt
.Field(ElasticFields.JobDate)
.Interval("year")
.Format("yyyy")
.Aggregations(a1 => a1
.Terms("top_agg", t => t
.Field(criteria.GroupBy.GetElaticSearchTerm())
.Exclude("NA|Unknown|Not available")
.Size(Constants.DataSizeToCompare)))
)));
Everything works well, but now the problem is iterating over the result to get the values. For a normal aggregation I'm currently doing it like below:
data = result.Aggs.Terms("top_agg").Items.Select(item =>
new JobReportResult
{
Group = item.Key,
Count = item.DocCount
}).ToList();
But it seems NEST doesn't support buckets within date histogram buckets.
If I try it like below, I get a null reference exception:
result.Aggs.DateHistogram("jobcounts_by_year").Terms("top_agg")
It seems we have to use something like below. d2 is now an IAggregation:
var d1 = result.Aggs.DateHistogram("jobcounts_by_year").Items;
var d2 =(TermsAggregator)d1[0].Aggregations["top_agg"];
But the Aggregation property is not exposing any values.
I'm stuck here. Can someone let me know how I can access buckets inside date histogram buckets using NEST?
Regards,
Try this
var dateHistogram = searchResponse.Aggs.DateHistogram("jobcounts_by_year");
foreach (var item in dateHistogram.Items)
{
var bucket = item.Terms("top_agg");
}
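If it helps to see the shape being walked, the raw response nests each terms aggregation inside its date histogram bucket. A minimal sketch over the response JSON in plain Python (not NEST; the sample response and its values are invented):

```python
def counts_by_year(response):
    """Flatten date_histogram buckets and their nested terms buckets into rows."""
    rows = []
    for year_bucket in response["aggregations"]["jobcounts_by_year"]["buckets"]:
        year = year_bucket["key_as_string"]  # formatted with "yyyy" in the request
        for term_bucket in year_bucket["top_agg"]["buckets"]:
            rows.append((year, term_bucket["key"], term_bucket["doc_count"]))
    return rows


# Invented sample response, trimmed to the fields used here.
sample_response = {
    "aggregations": {
        "jobcounts_by_year": {
            "buckets": [
                {
                    "key_as_string": "2013",
                    "doc_count": 5,
                    "top_agg": {"buckets": [{"key": "management", "doc_count": 5}]},
                },
                {
                    "key_as_string": "2014",
                    "doc_count": 9,
                    "top_agg": {
                        "buckets": [
                            {"key": "management", "doc_count": 6},
                            {"key": "sales", "doc_count": 3},
                        ]
                    },
                },
            ]
        }
    }
}

print(counts_by_year(sample_response))
```

So there is one terms result per histogram bucket, which is why the terms aggregation has to be read from each item in the loop rather than once from the date histogram itself.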
Hope this helps.
I'm developing a .NET application using Elasticsearch. I used an ES river to index the data.
Results (in Sense) look kinda like this:
{
"_index": "musicstore",
"_type": "songs",
"_id": "J2k-NjXjRa-mgWKAq0RMlw",
"_score": 1,
"_source": {
"songID": 42,
"tempo": "andante",
"metrum": "3/4 E8",
"intonation": "F",
"title": "Song",
"archiveSongNumber": "3684",
"Year": 2000,
"Place": "London"
}
},
To access the indexed data I use NEST queries similar to this:
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.title, "Song")));
I'm having a problem that the query doesn't return any results, when I search for a certain field.
For instance when I search for a title, songID, tempo or archiveSongNumber the query works fine and it returns the same results as Sense, but when I search for Year, Place, metrum, etc. the query doesn't return any results, but it should (Sense does and it should).
Queries like these work (and return the right results):
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.title, "Song")));
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.songID, 42)));
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.archiveSongNumber , "3684")));
Queries like these don't return any results (but they should):
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.Place, "London")));
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.Year, 2000)));
What am I doing wrong? Did I mess up when I was indexing data?
Update:
Mapping looks like this:
{
"musicstore": {
"mappings": {
"songs": {
"properties": {
"Year": {
"type": "long"
},
"Place": {
"type": "string"
},
"archiveSongNumber": {
"type": "string"
},
"songID": {
"type": "long"
},
"intonation": {
"type": "string"
},
"metrum": {
"type": "string"
},
"title": {
"type": "string"
},
"tempo": {
"type": "string"
}
}
}
}
}
}
Update 2:
ES river request looks like this:
PUT /_river/songs_river/_meta
{
"type":"jdbc",
"jdbc": {
"driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
"url":"jdbc:sqlserver://ip_address:1433;databaseName=database",
"user":"user",
"password":"password",
"strategy":"simple",
"poll":"300s",
"autocommit":true,
"fetchsize":10,
"max_retries":3,
"max_retries_wait":"10s",
"index":"musicstore",
"type":"songs",
"analysis": {
"analyzer" :{
"whitespace" :{
"type" : "whitespace",
"filter":"lowercase"
}
}
},
"sql":"some_sql_query"
}
}
ES client configuration looks like this:
private static ElasticClient ElasticClient
{
get
{
Uri localhost = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(localhost);
setting.SetDefaultIndex("musicstore").MapDefaultTypeNames(d => d.Add(typeof(Song), "songs"));
setting.SetConnectionStatusHandler(c =>
{
if (!c.Success)
throw new Exception(c.ToString());
});
return new ElasticClient(setting);
}
}
From looking at your mapping, the issue here is most likely that all of your fields are being analyzed when indexing, but you are using term queries with NEST, which are not analyzed, meaning they will only find exact matches. If you don't explicitly specify an analyzer in your mappings, Elasticsearch defaults to the standard analyzer.
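A toy illustration of why that matters, in plain Python. The analyze function below is only a crude stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters), and all the function names are made up, but it shows the mechanism: the index holds the analyzed tokens, a term query compares its value to those tokens as-is, and a match query analyzes the search text first.

```python
import re


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(indexed_tokens, value):
    """term query: the value is compared to the indexed tokens as-is."""
    return value in indexed_tokens


def match_matches(indexed_tokens, text):
    """match query: the search text is analyzed first, then any token may match."""
    return any(token in indexed_tokens for token in analyze(text))


indexed = analyze("London")  # what ends up in the index for Place

print(term_matches(indexed, "London"))
print(term_matches(indexed, "london"))
print(match_matches(indexed, "London"))
```

In this toy model the term lookup for "London" misses because the index only contains "london", while the lowercase term and the match query both hit.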
When you perform a search in Elasticsearch using a query string, like you're doing in Sense: GET _search/?q=Place:London, a query string query is what's being run by Elasticsearch, which is different than a term query.
From your examples though, it doesn't look like you are actually using query string syntax. You probably want a match query instead:
client.Search<Song>(s => s
.Query(q => q
.Match(m => m
.OnField(p => p.Place)
.Query("London")
)
)
);
If you do, however, want a query string query like the one you're performing with Sense, then you can use QueryString:
client.Search<Song>(s => s
.Query(q => q
.QueryString(qs => qs
.OnFields(p => p.Place)
.Query("London")
)
)
);
Hope that helps. I suggest checking out the getting started guide, specifically the section on exact values vs. full text.
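If the string field has a "keyword" sub-field (the default dynamic mapping for strings from Elasticsearch 5.x onwards, which is not the case for the mapping shown in the question), add the "keyword" suffix to the field in your term query:
var result = ElasticClient.Search<Song>(s => s
.Query(q => q
.Term(p => p
.Field(x => x.Place.Suffix("keyword")).Value("London"))));
This targets the non-analyzed keyword sub-field, so the term has to match the stored value exactly.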