Elasticsearch Date Histogram report with Terms aggregation - c#

I'm trying the NEST client for querying Elasticsearch data. I have a yearly job count report based on a field. Currently I use a date histogram aggregation for this; the Elasticsearch query is below.
POST insight/_search
{
"size": "0",
"query": {
"filtered": {
"query": {
"query_string": {
"query": "(onet.familycode: 11)"
}
}
}
},
"aggregations": {
"jobcounts_by_year": {
"date_histogram": {
"field": "jobdate",
"interval": "year",
"format": "yyyy"
},
"aggregations": {
"count_by_occupation_family": {
"terms": {
"field": "onet.family"
}
}
}
}
}
}
Equivalent Nest query
result = ElasticSearchClient.Instance.Search<Job>(s => s.Size(0)
.Query(query => query
.Filtered(filtered => filtered
.Query(q => q
.QueryString(qs => qs.Query(queryString)))))
.Aggregations(a => a
.DateHistogram("jobcounts_by_year", dt => dt
.Field(ElasticFields.JobDate)
.Interval("year")
.Format("yyyy")
.Aggregations(a1 => a1
.Terms("top_agg", t => t
.Field(criteria.GroupBy.GetElaticSearchTerm())
.Exclude("NA|Unknown|Not available")
.Size(Constants.DataSizeToCompare)))
)));
Everything works well, but now the problem is iterating over the result to get the values. For a normal aggregation I'm currently doing it like this:
data = result.Aggs.Terms("top_agg").Items.Select(item =>
new JobReportResult
{
Group = item.Key,
Count = item.DocCount
}).ToList();
But it seems NEST doesn't support buckets within date histogram buckets.
If I try the following, I get a null reference exception:
result.Aggs.DateHistogram("jobcounts_by_year").Terms("top_agg")
It seems we have to use something like the following, where the value behind d2 is an IAggregation:
var d1 = result.Aggs.DateHistogram("jobcounts_by_year").Items;
var d2 =(TermsAggregator)d1[0].Aggregations["top_agg"];
But the Aggregation property is not exposing any values.
I'm stuck here. Can someone let me know how I can access the buckets inside date histogram buckets using NEST?
Regards,

Try this
var dateHistogram = searchResponse.Aggs.DateHistogram("jobcounts_by_year");
foreach (var item in dateHistogram.Items)
{
var bucket = item.Terms("top_agg");
}
Hope this helps.
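Once the inner buckets are reachable per year, the report needs one row per (year, group) pair. The sketch below shows only that flattening step. It uses a hypothetical tuple list (yearBuckets, with made-up keys and counts) as a stand-in for the parsed response so that it runs without a cluster; with NEST, the inner collection would come from item.Terms("top_agg") as in the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in data: one entry per year bucket, each carrying the
// term buckets of its "top_agg" sub-aggregation. Values are invented.
var yearBuckets = new List<(string Year, List<(string Key, long DocCount)> Terms)>
{
    ("2013", new List<(string Key, long DocCount)> { ("Management", 120L), ("Sales", 80L) }),
    ("2014", new List<(string Key, long DocCount)> { ("Management", 150L) }),
};

// Flatten year -> term buckets into one row per (year, group) pair.
var rows = yearBuckets
    .SelectMany(year => year.Terms
        .Select(term => (Year: year.Year, Group: term.Key, Count: term.DocCount)))
    .ToList();

foreach (var row in rows)
{
    Console.WriteLine($"{row.Year} {row.Group}: {row.Count}");
}
```

SelectMany keeps the year attached to each inner bucket, which a plain nested foreach would also do; the LINQ form just matches the Select style already used in the question.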


Navigating or transforming JSON with Linq

Consider this JSON:
{
"data": "014",
"theme": "COLORADO CASUAL",
"family": "2163",
"category": "00",
"compo_groups": [
{
"title": "HEAD024",
"values": [
{
"perc": "100",
"desc": "COMP036"
}
]
},
{
"title": "HEAD035",
"values": [
{
"perc": "100",
"desc": "COMP042"
},
{
"perc": "50",
"desc": "COMP043"
}
]
}
],
"product_name": "D812",
"supplier_code": "1011"
}
I need to check that each of my compositions totals exactly 100%. This JSON has 2 composition groups. The first one is correct: it has one element at 100%. The second one is made up of 2 elements and totals 150%, which is an error.
I need to write C# code that detects this error. I can write most of it; I just don't know how to transform this JSON into a list of values I can work with using LINQ.
Assuming you are using a recent version of .NET (e.g. .NET6) then you can use the built-in System.Text.Json libraries to parse your JSON input. If you need to use other parts of the JSON, I would recommend deserialising in to concrete C# classes so you get proper validation, IntelliSense and all that good stuff.
However, if you simply want to check those percentages you can use the STJ library directly, something like this for example:
// Requires: using System.Linq; using System.Text.Json;
// Load JSON
var json = "{...}";
var doc = JsonDocument.Parse(json);
// Manually walk the document to get the values you need and summarise
var result = doc.RootElement
.GetProperty("compo_groups")
.EnumerateArray()
.Select(a => new
{
Title = a.GetProperty("title").ToString(),
Percentage = a.GetProperty("values")
.EnumerateArray()
.Select(v => double.Parse(v.GetProperty("perc").ToString()))
.Sum()
});
And you can iterate over that result like this:
foreach(var value in result)
{
Console.WriteLine($"Title '{value.Title}' has a percentage of {value.Percentage}");
}
Which will output this:
Title 'HEAD024' has a percentage of 100
Title 'HEAD035' has a percentage of 150
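To go from the summary to the actual check the question asks for, the same walk can keep only the groups whose total is not exactly 100. This is a minimal sketch on a trimmed copy of the question's JSON; the name invalidGroups, and the choice of decimal with invariant-culture parsing, are assumptions made for the illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.Json;

// Trimmed copy of the question's JSON, keeping only the fields the check reads.
var json = @"{
  ""compo_groups"": [
    { ""title"": ""HEAD024"", ""values"": [ { ""perc"": ""100"" } ] },
    { ""title"": ""HEAD035"", ""values"": [ { ""perc"": ""100"" }, { ""perc"": ""50"" } ] }
  ]
}";

using var doc = JsonDocument.Parse(json);

// Sum the percentages per group and keep the groups that do not total 100.
var invalidGroups = doc.RootElement
    .GetProperty("compo_groups")
    .EnumerateArray()
    .Select(group => new
    {
        Title = group.GetProperty("title").GetString(),
        Total = group.GetProperty("values")
            .EnumerateArray()
            .Sum(v => decimal.Parse(v.GetProperty("perc").GetString(), CultureInfo.InvariantCulture))
    })
    .Where(group => group.Total != 100m)
    .ToList();

foreach (var group in invalidGroups)
{
    Console.WriteLine($"{group.Title} totals {group.Total}, expected 100");
}
```

Comparing with != rather than > also catches groups that add up to less than 100, and decimal avoids rounding surprises if the percentages ever carry fractions.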
You don't need any classes to get the data you want:
using System.Linq;
using System.Text.Json.Nodes;
List<string> titles = JsonNode.Parse(json)["compo_groups"].AsArray()
.Select(x => x["values"].AsArray())
.Where(v => v.Select(x =>
Convert.ToInt32(x["perc"].GetValue<string>())).Sum() > 100)
.Select(v => v.Parent["title"].GetValue<string>())
.ToList(); // result ["HEAD035"]
or, with Newtonsoft.Json:
using System.Linq;
using Newtonsoft.Json.Linq;
List<string> titles = JObject.Parse(json)["compo_groups"]
.Select(x => x["values"])
.Where(v => v.Select(x => (int)x["perc"]).Sum() > 100)
.Select(v => v.Parent.Parent)
.Select(p=> (string) p["title"]) // here you can select any data you need
.ToList(); // result ["HEAD035"]

Aggregation with a term and a date range

I'm very new to ElasticSearch, and I'm trying to make an aggregation, but can't seem to get it right.
I have some data in an ElasticSearch index that looks like this:
{
"customerId": "example_customer",
"request": {
"referer": "https://example.org"
},
"#timestamp": "2020-09-29T14:14:00.000Z"
}
My mapping:
{
"mappings": {
"properties": {
"customerId": { "type": "keyword" },
"request": {
"properties": {
"referer": { "type": "keyword" }
}
}
}
}
}
And I'm trying to get the referers that appear the most frequently for a specific customer in a date range. I could make the filter for the customer like this:
var result = await _client.SearchAsync<InsightRecord>(s =>
s.Aggregations(
a => a
.Filter("customer", customer =>
customer.Filter(q => q.Term(ir => ir.CustomerId, customerId)))
.Terms("top_referer", ts => ts.Field("request.referer"))
)
);
return result.Aggregations.Terms("top_referer").Buckets
.Select(bucket => new TopReferer { Url = bucket.Key, Count = bucket.DocCount ?? 0})
Now I want to narrow this down to a specific time range. This is what I have so far:
var searchDescriptor = s.Aggregations(a =>
a.Filter("customer", customer =>
customer.Filter(q =>
q.Bool(b =>
b.Must(
f2 => f2.DateRange(date => date.GreaterThanOrEquals(from).LessThanOrEquals(to)),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
)
.Terms("top_referers", ts => ts.Field("request.referer"))
);
The problem is that the date filter doesn't get included in the query, it translates to this JSON:
{
"aggs": {
"customer": {
"filter": {
"bool": {
"filter": [{
"term": {
"customerId": {
"value": "example_customer"
}
}
}
]
}
}
},
"top_referers": {
"terms": {
"field": "request.referer"
}
}
}
}
I tried ordering them differently, but it didn't help. It's always the customer filter that will appear in the JSON, and the date range is skipped. I also saw some people using a query combined with an aggregation, but I feel like I should be able to do this using the aggregation alone. Is this possible? What am I doing wrong in my query that the range doesn't show up in the JSON?
The issue is that the date range query does not specify a field
f2 => f2.DateRange(date => date.GreaterThanOrEquals(from).LessThanOrEquals(to)),
Add a field to this query
f2 => f2.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
In addition, for the filter aggregation to apply to the terms aggregation, the terms aggregation needs to be a sub aggregation of the filter aggregation. So it would be something like
var customerId = "foo";
var from = "now-365d";
var to = "now";
var result = await _client.SearchAsync<InsightRecord>(s => s
.Aggregations(a => a
.Filter("customer", customer => customer
.Filter(q => q
.Bool(b => b
.Must(
f2 => f2.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
)
)
);
Rather than specifying the date range and term queries using a filter aggregation though, I'd be inclined to specify them as the query to the search request. The query is taken into account when calculating aggregations. A filter aggregation is useful when you want to query on a dataset but run an aggregation only on a subset of the dataset e.g. if you were searching across all customers but then wanted to run an aggregation only on a subset of the customers. In practice, for this particular example, the outcome should be the same whether the query is specified as the query part of a search request, or as a filter aggregation with the terms aggregation as a sub aggregation, but the former is perhaps a little easier to get the results from.
Specified as the query would look something like
var result = await _client.SearchAsync<InsightRecord>(s => s
.Query(q => q
.Bool(b => b
.Must(
f2 => f2.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
);
Further, there are a couple more things we can do:
Since we're only interested in the results of the terms aggregation and not the search hits, we can specify .Size(0) to not bother returning search hits on the response.
The bool query can be more succinctly expressed by combining queries with operator overloading, and since both queries are predicates (a document either matches the query or it doesn't), we can specify both in a filter clause to omit scoring. The final query then is something like
var result = await _client.SearchAsync<InsightRecord>(s => s
.Size(0)
.Query(q => +q
.DateRange(date => date
.Field("#timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
) && +q
.Term(ir => ir.CustomerId, customerId)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
);
which generates the query
{
"aggs": {
"top_referers": {
"terms": {
"field": "request.referer"
}
}
},
"query": {
"bool": {
"filter": [{
"range": {
"#timestamp": {
"gte": "now-365d",
"lte": "now"
}
}
}, {
"term": {
"customerId": {
"value": "foo"
}
}
}]
}
},
"size": 0
}
The terms aggregation buckets can be accessed as expressed in your question.

Elasticsearch could not search on string field

I am trying to use NEST to build a search query dynamically based on the user's input.
I want to add multiple filters in Filter with Term, but searching on a string field does not work and I cannot find a solution.
For example, this code tries to search on a string field and does not work:
var response = await _elasticClient.SearchAsync<CustomerAddressInfo>(p => p
.Query(q => q
.Bool(b => b
.Filter(f => f.Term(t => t.Field(p => p.AccountAddressId).Value(type.AccountAddressId)))
)
)
);
The other simple search is on an integer field, and it works:
var response = await _elasticClient.SearchAsync<CustomerAddressInfo>(p => p
.Query(q => q
.Bool(b => b
.Filter(f => f.Term(t => t.Field(p => p.CreateUnitId).Value(type.CreateUnitId)))
)
)
);
But if I search on the string field with a Match query, the search succeeds again:
var response = await _elasticClient.SearchAsync<CustomerAddressInfo>(p => p
.Query(q => q
.Match(m => m
.Field(f => f.AccountAddressId)
.Query(type.AccountAddressId)
)
)
);
So the question is: how can I give multiple search criteria with the Match query, or how can I search on a string field with the Term query in Elasticsearch?
I am not familiar with NEST, but to search on multiple fields using a match query or a term query, you can refer to the following example.
A bool query is used to combine one or more clauses.
Avoid using the term query for text fields.
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
To search text field values, use the match query instead.
Index Mapping
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"cost": {
"type": "long"
}
}
}
}
Index data:
{
"name":"apple",
"cost":"40"
}
{
"name":"apple",
"cost":"55"
}
Search Query: Multiple Search criteria with match
{
"query": {
"bool": {
"must": [
{ "match": { "name": "apple" }},
{ "match": { "cost": 40 }}
]
}
}
}
Search Query: search on a field with a term query
{
"query": {
"bool" : {
"must" :[
{"term" : { "name" : "apple" }},
{"term": { "cost":40 }}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "3",
"_score": 1.1823215,
"_source": {
"name": "apple",
"cost": "40"
}
}
]
I don't fully understand your requirements, but if you want to add multiple conditions to the filter, you can do it like this:
QueryContainer qSs = null;
foreach(var query in queries) // assume queries is the list of your search items
{
qSs &= new TermQuery { Field = "your_field_name", Value = query };
}
var searchResults = await _elasticClient.SearchAsync<CustomerAddressInfo>(s => s
.Query(q => q
.Bool(b => b.Filter(qSs) )
)
);

How to check if a list contains the term in NEST?

I have the query shown below:
var queryResult =
await
elastic.SearchAsync<CounterData>(s => s
.Query(q => q
.Bool(b => b
.Must(
m => m.ConstantScore(c => c
.Filter(f => f
.Term(x => x.CounterId, maxList))
),
m => m.ConstantScore(c => c.
Filter(f => f
.Term(x => x.Type, counterType))),
m => m.ConstantScore(c => c.
Filter(f => f.
DateRange(d => d.
GreaterThanOrEquals(lowerBound).Field(r => r.Date)))))))
.AllTypes()
.Scroll("1m")
.Size(10000));
Where maxList is a list of integers. I want to check if the term is in the list but looks like this does not work.
Any ideas how I can check if the term matches any of the elements in the list?
Something like the following will do it
var maxList = new[] { 1, 2, 3, 4, 5};
var counterType = "counter-type";
var lowerBound = DateTime.UtcNow.Date.AddDays(-7);
var queryResult = client.Search<CounterData>(s => s
.Query(q => +q
.DateRange(d => d
.Field(r => r.Date)
.GreaterThanOrEquals(lowerBound)
) && +q
.Term(x => x.Type, counterType) && +q
.Terms(t => t
.Field(x => x.CounterId)
.Terms(maxList)
)
)
.AllTypes()
.Scroll("1m")
.Size(10000)
);
A few things to highlight
+ unary operator applied to a QueryContainerDescriptor<T> is a shorthand for wrapping a query in a bool filter query. I think this is what you want in your case as you don't need to calculate scores, you just want to find matches to a predicate.
&& is overloaded for QueryContainer such that when applied to two QueryContainers, it is a shorthand for a bool must query with two must query clauses. However in this example, the queries all have the + unary operator applied so are bool filter queries, so will be &&'ed together as filter queries.
The value passed to Size() when using Scrolling (i.e. specifying a Scroll() time) is the number of documents to fetch from each shard per scroll, not total documents per scroll. So total documents will be Size() * number of shards. This might be a lot of documents per scroll.
Use the terms query to find documents that match on a field against any one of a list of terms (not analyzed).
The end query json looks like
POST http://localhost:9200/examples/_search?scroll=1m
{
"size": 10000,
"query": {
"bool": {
"filter": [
{
"range": {
"date": {
"gte": "2016-08-04T00:00:00Z"
}
}
},
{
"term": {
"type": {
"value": "counter-type"
}
}
},
{
"terms": {
"counterId": [
1,
2,
3,
4,
5
]
}
}
]
}
}
}

Searching using NEST does not return results, when querying on certain fields

I'm developing a .NET application using Elasticsearch. I used an ES river to index the data.
Results (in Sense) look kinda like this:
{
"_index": "musicstore",
"_type": "songs",
"_id": "J2k-NjXjRa-mgWKAq0RMlw",
"_score": 1,
"_source": {
"songID": 42,
"tempo": "andante",
"metrum": "3/4 E8",
"intonation": "F",
"title": "Song",
"archiveSongNumber": "3684",
"Year": 2000,
"Place": "London"
}
},
To access the indexed data I use NEST queries similar to this:
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.title, "Song")));
My problem is that the query doesn't return any results when I search on certain fields.
For instance, when I search on title, songID, tempo or archiveSongNumber, the query works fine and returns the same results as Sense. But when I search on Year, Place, metrum, etc., the query returns no results, although Sense does.
Queries like these work (and return the right results):
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.title, "Song")));
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.songID, 42)));
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.archiveSongNumber , "3684")));
Queries like these don't return any results (but they should):
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.Place, "London")));
var result = ElasticClient.Search<Song>(s => s.Query(q => q.Term(p => p.Year, 2000)));
What am I doing wrong? Did I mess up when I was indexing data?
Update:
Mapping looks like this:
{
"musicstore": {
"mappings": {
"songs": {
"properties": {
"Year": {
"type": "long"
},
"Place": {
"type": "string"
},
"archiveSongNumber": {
"type": "string"
},
"songID": {
"type": "long"
},
"intonation": {
"type": "string"
},
"metrum": {
"type": "string"
},
"title": {
"type": "string"
},
"tempo": {
"type": "string"
}
}
}
}
}
}
Update 2:
ES river request looks like this:
PUT /_river/songs_river/_meta
{
"type":"jdbc",
"jdbc": {
"driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
"url":"jdbc:sqlserver://ip_address:1433;databaseName=database",
"user":"user",
"password":"password",
"strategy":"simple",
"poll":"300s",
"autocommit":true,
"fetchsize":10,
"max_retries":3,
"max_retries_wait":"10s",
"index":"musicstore",
"type":"songs",
"analysis": {
"analyzer" :{
"whitespace" :{
"type" : "whitespace",
"filter":"lowercase"
}
}
},
"sql":"some_sql_query"
}
}
ES client configuration looks like this:
private static ElasticClient ElasticClient
{
get
{
Uri localhost = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(localhost);
setting.SetDefaultIndex("musicstore").MapDefaultTypeNames(d => d.Add(typeof(Song), "songs"));
setting.SetConnectionStatusHandler(c =>
{
if (!c.Success)
throw new Exception(c.ToString());
});
return new ElasticClient(setting);
}
}
From looking at your mapping, the issue here is most likely that all of your fields are being analyzed when indexing, but you are using term queries with NEST, which are not analyzed, meaning they will only find exact matches. If you don't explicitly specify an analyzer in your mappings, Elasticsearch defaults to the standard analyzer.
When you perform a search in Elasticsearch using a query string, like you're doing in Sense: GET _search/?q=Place:London, a query string query is what's being run by Elasticsearch, which is different than a term query.
From your examples though, it doesn't look like you are actually using query string syntax. You probably want a match query instead:
client.Search<Song>(s => s
.Query(q => q
.Match(m => m
.OnField(p => p.Place)
.Query("London")
)
)
);
If you do however want a query string query like the one you're performing with Sense, then you can use QueryString:
client.Search<Song>(s => s
.Query(q => q
.QueryString(qs => qs
.OnFields(p => p.Place)
.Query("London")
)
)
);
Hope that helps. I suggest checking out the getting started guide, specifically the section on exact values vs. full text.
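The difference between the two query types can be illustrated without a cluster. The sketch below is not Elasticsearch code: Analyze is a hypothetical, much simplified stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and the two booleans only mimic how a term query and a match query compare the search value against the indexed tokens.

```csharp
using System;
using System.Linq;

// Simplified stand-in for the standard analyzer: replace non-alphanumeric
// characters with spaces, lowercase the rest, and split into tokens.
// The real analyzer does more than this.
string[] Analyze(string text) =>
    new string(text.Select(c => char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ').ToArray())
        .Split(' ', StringSplitOptions.RemoveEmptyEntries);

// What ends up in the index for the field value "London".
var indexedTokens = Analyze("London");

// A term query compares the search value to the indexed tokens as-is.
bool termHit = indexedTokens.Contains("London");

// A match query analyzes the search value first, then compares tokens.
bool matchHit = Analyze("London").Any(token => indexedTokens.Contains(token));

Console.WriteLine($"term: {termHit}, match: {matchHit}");
```

Under these assumptions the term lookup misses because the index holds "london" rather than "London", while the match lookup succeeds because both sides go through the same analysis.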
Add the "keyword" suffix to your term field:
var result = ElasticClient.Search<Song>(s => s
.Query(q => q
.Term(p => p
.Field(x => x.Year.Suffix("keyword")).Value(2000))));
Try it, it will work!
