Elasticsearch Dynamic Aggregations with NEST - C#

Hi there, I have the following mapping for product in Elasticsearch.
I am trying to create aggregations from the Name / Value data in the product specifications. I think what I need is a nested aggregation, but I'm struggling with the implementation.
"mappings": {
"product": {
"properties": {
"productSpecification": {
"properties": {
"productSpecificationId": {
"type": "long"
},
"specificationId": {
"type": "long"
},
"productId": {
"type": "long"
},
"name": {
"fielddata": true,
"type": "text"
},
"value": {
"fielddata": true,
"type": "text"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"reviewRatingCount": {
"type": "integer"
},
"productId": {
"type": "integer"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"dispatchTimeInDays": {
"type": "integer"
},
"productCode": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
I have now changed the code as below and am having some success:
.Aggregations(a => a
.Terms("level1",t => t
.Field(f=> f.ProductSpecification.First().Name)
.Aggregations(snd => snd
.Terms("level2", f2 => f2.Field(f3 => f3.ProductSpecification.First().Value))
)))
Using this code I am now returning the Name values:
var myagg = response.Aggs.Terms("level1");
if (myagg != null) // check the aggregation we are about to iterate, not just the container
{
rtxAggs.Clear();
rtxAggs.AppendText(Environment.NewLine);
foreach(var bucket in myagg.Buckets)
{
rtxAggs.AppendText(bucket.Key);
}
}
What I can't figure out is how to then get the sub-aggregation values.

Right, after much experimenting and editing, I've managed to get to the bottom of this.
First, I changed productSpecification back to nested and then used the following in the aggregation:
.Aggregations(a => a
.Nested("specifications", n => n
.Path(p => p.ProductSpecification)
.Aggregations(aa => aa.Terms("groups", sp => sp.Field(p => p.ProductSpecification.Suffix("name"))
.Aggregations(aaa => aaa
.Terms("attribute", tt => tt.Field(ff => ff.ProductSpecification.Suffix("value"))))
)
)
)
)
Then I used the following to read the values:
var groups = response.Aggs.Nested("specifications").Terms("groups");
foreach(var bucket in groups.Buckets)
{
rtxAggs.AppendText(bucket.Key);
var values = bucket.Terms("attribute");
foreach(var valBucket in values.Buckets)
{
rtxAggs.AppendText(Environment.NewLine);
rtxAggs.AppendText(" " + valBucket.Key + "(" + valBucket.DocCount + ")");
}
rtxAggs.AppendText(Environment.NewLine);
}
All seems to be working fine. Hopefully this helps some people; on to my next challenge of boosting fields and filtering on these aggregations.
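Why did switching productSpecification to nested fix the sub-aggregation? The plain C# sketch below (no NEST, no cluster) shows what each mapping produces. The Spec/Product types and sample data are hypothetical. Values are treated as single untokenized terms for simplicity.

With a plain object mapping, a product's names and values are flattened into two independent arrays. A value bucket under "Colour" can therefore contain a value that really belongs to "Size". With nested, each specification is its own hidden document, so the name/value pairing is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the indexed documents.
public record Spec(string Name, string Value);
public record Product(int ProductId, List<Spec> ProductSpecification);

public static class SpecAggregationSketch
{
    // nested(productSpecification) > terms(name) > terms(value):
    // every specification entry is its own nested document, so name/value pairs stay together.
    public static Dictionary<string, Dictionary<string, int>> Nested(IEnumerable<Product> products) =>
        products
            .SelectMany(p => p.ProductSpecification)
            .GroupBy(s => s.Name)
            .ToDictionary(
                g => g.Key,
                g => g.GroupBy(s => s.Value).ToDictionary(v => v.Key, v => v.Count()));

    // terms(name) > terms(value) on a plain object mapping:
    // each product is one document with a flat array of names and a flat array of values,
    // so a name bucket counts every value the product has, whichever name it belonged to.
    public static Dictionary<string, Dictionary<string, int>> Flattened(IEnumerable<Product> products) =>
        products
            .SelectMany(p => p.ProductSpecification.Select(s => s.Name).Distinct()
                .Select(name => (Name: name,
                                 Values: p.ProductSpecification.Select(s => s.Value).Distinct())))
            .GroupBy(x => x.Name)
            .ToDictionary(
                g => g.Key,
                g => g.SelectMany(x => x.Values).GroupBy(v => v).ToDictionary(v => v.Key, v => v.Count()));
}
```

In the real query, the same two-level bucket shape comes back through response.Aggs.Nested("specifications").Terms("groups"), read as in the answer above.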

Related

Retrieving list of documents from collection by id in nested list

I have documents like this:
[
// 1
{
"_id": ObjectId("573f3944a75c951d4d6aa65e"),
"Source": "IGN",
"Family": [
{
"Countries": [
{
"uid": 17,
"name": "Japan"
}
]
}
]
},
// 2
{
"_id": ObjectId("573f3d41a75c951d4d6aa65f"),
"Source": "VG",
"Family": [
{
"Countries": [
{
"uid": 17,
"name": "USA"
}
]
}
]
},
// 3
{
"_id": ObjectId("573f4367a75c951d4d6aa660"),
"Source": "NRK",
"Family": [
{
"Countries": [
{
"uid": 17,
"name": "Germany"
}
]
}
]
},
// 4
{
"_id": ObjectId("573f4571a75c951d4d6aa661"),
"Source": "VG",
"Family": [
{
"Countries": [
{
"uid": 10,
"name": "France"
}
]
}
]
},
// 5
{
"_id": ObjectId("573f468da75c951d4d6aa662"),
"Source": "IGN",
"Family": [
{
"Countries": [
{
"uid": 14,
"name": "England"
}
]
}
]
}
]
I want to return only the documents where 'Family.Countries.uid' equals 17,
so that in the end I have:
[
{
"_id": ObjectId("573f3944a75c951d4d6aa65e"),
"Source": "IGN",
"Family": [
{
"Countries": [
{
"uid": 17,
"name": "Japan"
}
]
}
]
},
{
"_id": ObjectId("573f3d41a75c951d4d6aa65f"),
"Source": "VG",
"Family": [
{
"Countries": [
{
"uid": 17,
"name": "USA"
}
]
}
]
},
{
"_id": ObjectId("573f4367a75c951d4d6aa660"),
"Source": "NRK",
"Family": [
{
"Countries": [
{
"uid": 17,
"name": "Germany"
}
]
}
]
}
]
How can I do this with the official C# MongoDB driver?
I tried this:
public async Task<List<Example>> getLinkedCountry(string porduitId)
{
    var filter = Builders<Example>.Filter.AnyIn("Family.Countries.uid", porduitId);
    var cursor = await _certificats.FindAsync(filter);
    var docs = await cursor.ToListAsync();
    return docs;
}
Unfortunately, I think my filter is wrong.
Is there a way to find all the documents by accessing the nested list by id and retrieving it?
Solution 1
Use ElemMatch instead of AnyIn.
var filter = Builders<Example>.Filter.ElemMatch(
x => x.Family,
y => y.Countries.Any(z => z.uid == porduitId));
Solution 2
If you are not confident with the MongoDB .NET Driver syntax, you can convert the query to a BsonDocument via MongoDB Compass (the Export to Language feature).
var filter = new BsonDocument("Family.Countries.uid", porduitId);
Just to expand on @Yong Shun's answer:
if you only want to return the list of nested documents rather than the whole document, you have a few options.
Using a projection
var filter = Builders<Example>.Filter.ElemMatch(
x => x.Family,
y => y.Countries.Any(z => z.uid == porduitId));
var project = Builders<Example>.Projection.ElemMatch(
    x => x.Family,
    y => y.Countries.Any(z => z.uid == porduitId)
);
var examples = await collection.Find(filter).Project<Example>(project).ToListAsync();
Using the aggregate pipeline
var filter = Builders<Example>.Filter.ElemMatch(
x => x.Family,
y => y.Countries.Any(z => z.uid == porduitId));
var project = Builders<Example>.Projection.Expression(
    x => x.Family.SelectMany(f => f.Countries).Where(c => c.uid == porduitId)
);
var result = await collection
    .Aggregate()
    .Match(filter)
    .Project(project)
    .ToListAsync(); // result holds one collection of matching countries per document
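One detail in the question's code is worth checking. The sample documents store uid as a number (17), while porduitId is a string, and MongoDB does not treat the string "17" as equal to the number 17. Convert the id to an int before it goes into any of the filters above.

The sketch below is a plain LINQ-to-Objects illustration, not driver code, of what the ElemMatch filter and the pipeline projection select. The Example/Family/Country classes are assumptions modelled on the sample documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the document classes in the question.
public class Country { public int uid { get; set; } public string name { get; set; } = ""; }
public class Family { public List<Country> Countries { get; set; } = new(); }
public class Example
{
    public string Id { get; set; } = "";
    public string Source { get; set; } = "";
    public List<Family> Family { get; set; } = new();
}

public static class ElemMatchSketch
{
    // Same predicate as the ElemMatch filter: keep a document when any Family entry
    // has any Country whose uid matches. uid is compared as an int, as stored.
    public static List<Example> Filter(IEnumerable<Example> docs, int uid) =>
        docs.Where(d => d.Family.Any(f => f.Countries.Any(c => c.uid == uid))).ToList();

    // Same shape as the aggregate-pipeline projection: only the matching countries per document.
    public static List<List<Country>> MatchingCountries(IEnumerable<Example> docs, int uid) =>
        Filter(docs, uid)
            .Select(d => d.Family.SelectMany(f => f.Countries).Where(c => c.uid == uid).ToList())
            .ToList();
}
```

With the five sample documents, Filter(docs, 17) keeps the IGN, VG and NRK documents, matching the expected output in the question.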

Elasticsearch filter group by using nest c#

I am using Elasticsearch to get the products grouped by category and perform aggregations on the result.
If I use categoryid (numeric) as the field it returns results, but when I try to use the category name it returns Unsuccessful (400).
Please see the code snippet below.
I am getting the document count. Can I get the document data from the same request?
ISearchResponse<Products> results;
results = _client.Search<Products>(s => s
//.Size(int.MaxValue)
.Query(q => q
.Bool(b => b
.Should(
bs => bs.Prefix(p => p.cat_name, "heli"),
bs => bs.Prefix(p => p.pr_name, "heli")
)
)
)
.Aggregations(a => a
.Terms("catname", t => t
.Field(f => f.categoryid)
.Size(int.MaxValue)
.Aggregations(agg => agg
.Max("maxprice", av => av.Field(f2 => f2.price))
.Average("avgprice", av => av.Field(f3 => f3.price))
.Max("maxweidht", av => av.Field(f2 => f2.weight))
.Average("avgweight", av => av.Field(f3 => f3.weight))
)
)
)
);
Mapping:
{
"product_catalog": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"@version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"cat_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"categoryid": {
"type": "long"
},
"createdon": {
"type": "date"
},
"fulldescription": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"height": {
"type": "float"
},
"length": {
"type": "float"
},
"pr_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"productid": {
"type": "long"
},
"shortdescription": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"sku": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedon": {
"type": "date"
},
"weight": {
"type": "float"
},
"width": {
"type": "float"
}
}
}
}
}
Can anyone explain how to use the category name for grouping?
cat_name is a text field, so by default you can't use it in aggregations: fielddata is disabled on text fields for performance reasons, which is most likely why the request fails with a 400.
Your mapping also indexes a keyword sub-field for cat_name, so you can aggregate on that instead.
Just change this part of your terms aggregation, .Field(f => f.categoryid), to
.Field(f => f.cat_name.Suffix("keyword")) and you should be good.
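For the category question, the plain C# sketch below (not NEST) shows why the .keyword sub-field gives the buckets you want. Analyze is a crude, hypothetical stand-in for the standard analyzer: it lowercases and splits on non-alphanumerics.

- **Text field with fielddata:** even if fielddata were enabled on cat_name, a terms aggregation would bucket by individual tokens such as "helicopter".
- **Keyword sub-field:** cat_name.keyword buckets by the exact stored value.
- **Long names:** because of "ignore_above": 256, a category name longer than 256 characters is not indexed into cat_name.keyword, so it would not appear in those buckets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordVsTextSketch
{
    // Hypothetical, simplified analyzer: lowercase and split on anything that is not a letter or digit.
    // The real standard analyzer's tokenization rules are more involved.
    static IEnumerable<string> Analyze(string text) =>
        new string(text.ToLowerInvariant().Select(ch => char.IsLetterOrDigit(ch) ? ch : ' ').ToArray())
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // terms on cat_name (text + fielddata): one bucket per token; a document counts once per distinct token.
    public static Dictionary<string, int> TextBuckets(IEnumerable<string> catNames) =>
        catNames.SelectMany(n => Analyze(n).Distinct())
                .GroupBy(t => t)
                .ToDictionary(g => g.Key, g => g.Count());

    // terms on cat_name.keyword: one bucket per exact, untokenized value.
    public static Dictionary<string, int> KeywordBuckets(IEnumerable<string> catNames) =>
        catNames.GroupBy(n => n).ToDictionary(g => g.Key, g => g.Count());
}
```

For names like "Helicopter Parts" and "Helicopter Kits", the text version merges both into a "helicopter" bucket, while the keyword version keeps two separate categories.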

Group, calculation and order of ElasticSearch data

I have a lot of data stored in the following format (I simplified the data to explain the problem).
What I need is:
group all the data by "Action Id" field
calculate the difference between max and min values of "Created Time" for each group (from the previous action)
order the results by the calculated field ("Action duration" - difference between max and min)
I use NEST (C#) to query Elasticsearch. If you can help me with a native Elasticsearch query, that would also be very helpful; I'll translate it to C#.
Thank you.
Assuming your mapping looks like this:
PUT /index
{
"mappings": {
"doc": {
"properties": {
"ActionId": {
"type": "text",
"fielddata": true
},
"CreatedDate":{
"type": "date"
},
"SubActionName":{
"type": "text",
"fielddata": true
}
}
}
}
}
your Elasticsearch query should look like this:
GET index/_search
{
"size": 0,
"aggs": {
"actions": {
"terms": {
"field": "ActionId"
},
"aggs": {
"date_created": {
"date_histogram": {
"field": "CreatedDate",
"interval": "hour"
},
"aggs": {
"the_max": {
"max": {
"field": "CreatedDate"
}
},
"the_min": {
"min": {
"field": "CreatedDate"
}
},
"diff_max_min": {
"bucket_script": {
"buckets_path": {
"max": "the_max",
"min": "the_min"
},
"script": "params.max - params.min"
}
}
}
}
}
}
}
}
You can read more about pipeline aggregations in the Elasticsearch reference documentation.
Hope that helps.
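To make the requested computation concrete, here is an in-memory LINQ version (plain C#, not a query) of "group by ActionId, take max minus min of CreatedDate, order by that duration". The ActionEvent record is hypothetical.

A few notes on the query above, offered as an untested suggestion:

- **Per-hour results:** it nests the max/min inside an hourly date_histogram, so it yields one difference per hour within each action.
- **One duration per action:** put the_max, the_min and the bucket_script directly under the actions terms aggregation instead.
- **Ordering:** as far as I know, a terms aggregation cannot be ordered by a pipeline result, so sort the buckets with a bucket_sort pipeline aggregation.
- **Units:** max and min on a date field return epoch milliseconds, so params.max - params.min is a duration in milliseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one stored record (field names follow the mapping above).
public record ActionEvent(string ActionId, DateTime CreatedDate, string SubActionName);

public static class ActionDurationSketch
{
    // Group by ActionId, take max(CreatedDate) - min(CreatedDate) per group,
    // and order by that duration, longest first.
    public static List<(string ActionId, TimeSpan Duration)> Durations(IEnumerable<ActionEvent> events) =>
        events
            .GroupBy(e => e.ActionId)
            .Select(g => (ActionId: g.Key, Duration: g.Max(e => e.CreatedDate) - g.Min(e => e.CreatedDate)))
            .OrderByDescending(x => x.Duration)
            .ToList();
}
```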

How to search a content of a document attached in elasticsearch index

I have created the index in Elasticsearch as follows:
this.client.CreateIndex("documents", c => c.Mappings(mp => mp.Map<DocUpload>
(m => m.Properties(ps => ps.Attachment
(a => a.Name(o => o.Document)
.TitleField(t => t.Name(x => x.Title).TermVector(TermVectorOption.WithPositionsOffsets))
)))));
The attachment is base64-encoded before indexing. I am not able to search the content inside any of the documents. Does base64 encoding cause a problem? Can anyone please help?
The browser response looks like this:
{
"documents": {
"aliases": {},
"mappings": {
"indexdocument": {
"properties": {
"document": {
"type": "attachment",
"fields": {
"content": {
"type": "string"
},
"author": {
"type": "string"
},
"title": {
"type": "string",
"term_vector": "with_positions_offsets"
},
"name": {
"type": "string"
},
"date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"keywords": {
"type": "string"
},
"content_type": {
"type": "string"
},
"content_length": {
"type": "integer"
},
"language": {
"type": "string"
}
}
},
"documentType": {
"type": "string"
},
"id": {
"type": "long"
},
"lastModifiedDate": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"location": {
"type": "string"
},
"title": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1465193502636",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "5kCRvhmsQAGyndkswLhLrg",
"version": {
"created": "2030399"
}
}
},
"warmers": {}
}
}
I found the solution by adding an analyser.
var fullNameFilters = new List<string> { "lowercase", "snowball" };
client.CreateIndex("mydocs", c => c
.Settings(st => st
.Analysis(anl => anl
.Analyzers(h => h
.Custom("full", ff => ff
.Filters(fullNameFilters)
.Tokenizer("standard"))
)
.TokenFilters(ba => ba
.Snowball("snowball", sn => sn
.Language(SnowballLanguage.English)))
))
.Mappings(mp => mp
.Map<IndexDocument>(ms => ms
.AutoMap()
.Properties(ps => ps
.Nested<Attachment>(n => n
.Name(sc => sc.File)
.AutoMap()
))
.Properties(at => at
.Attachment(a => a.Name(o => o.File)
.FileField(fl=>fl.Analyzer("full"))
.TitleField(t => t.Name(x => x.Title)
.Analyzer("full")
.TermVector(TermVectorOption.WithPositionsOffsets)
)))
))
);

JToken get a specific value

I have the following JToken output. How can I retrieve the 'value' here from TenantID, which should be 1 in this case?
{[
{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}
]}
This is my current code:
JToken value;
if (usr.Profile.TryGetValue("tenantid", out value))
{
JObject inner = value["value"].Value<JObject>(); //not working with null error
User.TenantID = (string)value;
}
User.obj = usr.Profile;
EDIT - please find the complete JToken output below:
{[
{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}
]}
Count: 1
First (Newtonsoft.Json.Linq.JContainer): {{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}}
First (Newtonsoft.Json.Linq.JToken): {{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}}
HasValues: true
IsReadOnly: false
Last (Newtonsoft.Json.Linq.JContainer): {{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}}
Last (Newtonsoft.Json.Linq.JToken): {{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}}
Next: (null)
Parent: {"tenantid": [
{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}
]}
Path: "tenantid"
Previous: (null)
Static members:
Non-public members:
IEnumerator:
Root: {{
"name": "Rx Sidhu",
"given_name": "Rx",
"family_name": "Sidhu",
"locale": "en_US",
"emails": [
"ranxdeep@xxx.com",
"ranxdeep@xxx.com",
"ranxdeep@hx.com"
],
"nickname": "ranxdeep@xxx.com",
"email": "ranxdeep@xxx.com",
"picture": "https://apis.live.net/v5.0/f1xxxxxx/picture",
"roles": [
"Account Admin",
"Admin"
],
"tenantid": [
{
"value": 1,
"metadata": {
"userType": 0,
"flags": 8,
"type": {
"type": "INT4",
"name": "Int",
"id": 56
},
"colName": "TenantID"
}
}
],
"email_verified": true,
"clientID": "wxxxvC8",
"updated_at": "2015-07-15T16:26:30.526Z",
"user_id": "windowslive|f1axxxac",
"identities": [
{
"access_token": "EwBwAq1",
"provider": "windowslive",
"user_id": "f1aexxxxxac",
"connection": "windowslive",
"isSocial": true
}
],
"created_at": "2015-07-01T06:08:21.358Z"
}}
Type: 2
I need to check whether tenantid actually exists and then get its value; otherwise return null or 0.
You should be able to do:
JObject jObject = JObject.Parse(...);
JToken value = jObject.SelectToken("tenantid[0].value");
Parse your object; its contents are then exposed, and you can use the SelectToken method with a path to find that specific value.
To build it out a bit, you could potentially do:
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static JToken FindToken<T>(string key, T value)
{
    // Serialize the object to JSON text, re-parse it, then look up the path.
    string serialized = JsonConvert.SerializeObject(value);
    var jObject = JObject.Parse(serialized);
    // SelectToken returns null when nothing matches the path.
    return jObject.SelectToken(key);
}
I still can't get the overall picture of your JSON, but have a look at this; it may help you.
var str = @"{
""x"": [{
""value"": 1,
""metadata"": {
""userType"": 0,
""flags"": 8,
""type"": {
""type"":""INT4"",
""name"":""Int"",
""id"": 56
},
""colName"":""TenantID""
}
}
]
}";
var parentJObject = JObject.Parse(str);
var xJArray = (JArray)parentJObject["x"];
// first item in JArray which is the object of interest
// look for the appropriate index of the JObject of your data
var firstJTokenInxJArray = (JObject)xJArray[0];
Console.WriteLine(firstJTokenInxJArray["value"].ToString());
I got what I was looking for:
JToken value;
if (usr.Profile.TryGetValue("tenantid", out value))
{
User.TenantID = (int)value [0] ["value"];
}
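The helper below builds on the SelectToken suggestion and the final (int)value[0]["value"] cast. It also covers the "does tenantid exist?" requirement.

TenantLookup and GetTenantId are hypothetical names. The helper assumes usr.Profile is a Newtonsoft JObject, which its TryGetValue call suggests. It relies on SelectToken returning null, rather than throwing, when the path does not match (the default behaviour).

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class TenantLookup
{
    // Returns the integer "value" of the first element of the "tenantid" array,
    // or null when the property, the first element, or an integer value is missing.
    public static int? GetTenantId(JObject profile)
    {
        // SelectToken returns null if "tenantid" is absent, empty, or not shaped as expected.
        JToken token = profile.SelectToken("tenantid[0].value");
        if (token == null || token.Type != JTokenType.Integer)
            return null;
        return token.Value<int>();
    }
}
```

Usage, assuming User.TenantID is an int: User.TenantID = TenantLookup.GetTenantId(usr.Profile) ?? 0;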
