ElasticSearch NEST, edge n-gram with Fuzziness - C#

I am using Elasticsearch.Net and NEST v7.10.0.
I have these settings and mappings for Elasticsearch:
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keyword"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 50,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"properties": {
"MatchName": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
},
"CompetitionName": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
}
}
}
}
I have indexed 3 documents with these values:
{
"_source": {
"CompetitionName": "Premiership",
"MatchName": "Dundee Utd - St Johnstone"
}
},
{
"_source": {
"CompetitionName": "2nd Div, Vastra Gotaland UOF",
"MatchName": "IF Limhamn Bunkeflo - FC Rosengaard 1917"
}
},
{
"_source": {
"CompetitionName": "Bundesliga",
"MatchName": "Hertha Berlin - Eintracht Frankfurt"
}
}
I am searching both fields with Fuzziness.Auto for the string "bunde".
I want this search to return all three documents.
But the query below returns nothing.
string value = "bunde";
BoolQuery boolQuery = new BoolQuery
{
Should = new List<QueryContainer>
{
new QueryContainer(new FuzzyQuery
{
Field = Infer.Field<EventHistoryDoc>(path:eventHistoryDoc => eventHistoryDoc.MatchName),
Value = value,
Fuzziness = Fuzziness.Auto,
}),
new QueryContainer(new FuzzyQuery
{
Field = Infer.Field<EventHistoryDoc>(path:eventHistoryDoc => eventHistoryDoc.CompetitionName),
Value = value,
Fuzziness = Fuzziness.Auto,
})
}
};
ISearchRequest searchRequest = new SearchRequest
{
Query = new QueryContainer(boolQuery),
};
var json = _elasticClient.RequestResponseSerializer.SerializeToString(searchRequest);
ISearchResponse<EventHistoryDoc> searchResponse = await _elasticClient.SearchAsync<EventHistoryDoc>(searchRequest);
If I search for the string "bundes", I get only one document:
{
"_source": {
"CompetitionName": "Bundesliga",
"MatchName": "Hertha Berlin - Eintracht Frankfurt"
}
}
Any idea what changes I should make to the settings, mapping, or query so that all the documents above are returned?

I am not familiar with the Elasticsearch NEST syntax, but in JSON you can achieve the result as follows:
Adding a working example with index mapping, search query, and search result
(For now, I have removed keyword_analyzer and edge_ngram_search_analyzer from the index mapping, since you only want to return all the documents using edge n-grams combined with fuzziness.)
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 50,
"token_chars": [
"letter",
"digit"
]
}
}
},
"max_ngram_diff": 50
},
"mappings": {
"properties": {
"CompetitionName": {
"type": "text",
"analyzer": "my_analyzer"
},
"MatchName": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
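To see which terms this mapping actually indexes, here is a rough in-memory sketch of my_tokenizer in C#. It is an approximation written for illustration (the class name is hypothetical, and it is not Lucene's implementation): it splits the input on characters outside token_chars (letters and digits) and emits every prefix of 2 to 50 characters. Note that my_analyzer has no lowercase filter, so the grams keep their original case.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical approximation of the "my_tokenizer" edge_ngram tokenizer
// (min_gram 2, max_gram 50, token_chars letter + digit). No lowercasing,
// because my_analyzer does not declare a lowercase filter.
public static class EdgeNGramSketch
{
    public static List<string> Tokenize(string text, int minGram = 2, int maxGram = 50)
    {
        var grams = new List<string>();
        var word = new StringBuilder();

        // Characters outside token_chars (spaces, '-', ',') end the current word.
        // The trailing space flushes the last word.
        foreach (char c in text + " ")
        {
            if (char.IsLetterOrDigit(c))
            {
                word.Append(c);
                continue;
            }
            AddPrefixes(word.ToString(), minGram, maxGram, grams);
            word.Clear();
        }
        return grams;
    }

    // Emit every prefix of the word whose length is between minGram and maxGram.
    private static void AddPrefixes(string word, int minGram, int maxGram, List<string> grams)
    {
        for (int len = minGram; len <= Math.Min(maxGram, word.Length); len++)
        {
            grams.Add(word.Substring(0, len));
        }
    }
}
```

For example, Tokenize("Bundesliga") yields Bu, Bun, Bund, Bunde and so on up to Bundesliga, while single-character words such as the "-" separator or a one-letter word produce no grams at all because of min_gram 2.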
Search Query:
{
"query": {
"multi_match": {
"query": "bunde",
"fuzziness": "AUTO"
}
}
}
Search Result:
"hits": [
{
"_index": "64968421",
"_type": "_doc",
"_id": "1",
"_score": 2.483365,
"_source": {
"CompetitionName": "Premiership",
"MatchName": "Dundee Utd - St Johnstone"
}
},
{
"_index": "64968421",
"_type": "_doc",
"_id": "3",
"_score": 2.4444416,
"_source": {
"CompetitionName": "Bundesliga",
"MatchName": "Hertha Berlin - Eintracht Frankfurt"
}
},
{
"_index": "64968421",
"_type": "_doc",
"_id": "2",
"_score": 0.6104546,
"_source": {
"CompetitionName": "2nd Div, Vastra Gotaland UOF",
"MatchName": "IF Limhamn Bunkeflo - FC Rosengaard 1917"
}
}
]
The index mapping provided in the question is also correct: with that same mapping, the multi_match query shown above for bunde returns all three documents, which is the expected result.
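A possible explanation for both outcomes, sketched in C# under stated assumptions: Elasticsearch's fuzziness AUTO allows no edits for terms of length 0 to 2, one edit for length 3 to 5, and two edits for longer terms. The helper names below are hypothetical, and the sketch uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition of two adjacent characters as a single edit.

```csharp
using System;

// Hypothetical helpers approximating how fuzziness "AUTO" decides whether
// a query term matches an indexed term. Plain Levenshtein distance only;
// Elasticsearch's default fuzzy_transpositions behaviour is ignored here.
public static class FuzzyAutoSketch
{
    // AUTO (same as AUTO:3,6): length 0-2 exact, 3-5 one edit, 6+ two edits.
    public static int AllowedEdits(string queryTerm) =>
        queryTerm.Length < 3 ? 0 : queryTerm.Length < 6 ? 1 : 2;

    // Dynamic-programming Levenshtein distance (insert, delete, substitute).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // previous row now lives in "prev"
        }
        return prev[b.Length];
    }

    // Case-sensitive, like comparing against the terms stored in the index.
    public static bool Matches(string queryTerm, string indexedTerm) =>
        Levenshtein(queryTerm, indexedTerm) <= AllowedEdits(queryTerm);
}
```

With the standard analyzer, the FuzzyQuery in the question compares "bunde" with whole lowercased words such as dundee (2 edits), bunkeflo (4) and bundesliga (5), none of which fit the single edit AUTO allows for a 5-character term. With the edge n-gram mapping above, the multi_match query also runs "bunde" through my_analyzer (bu, bun, bund, bunde), and case-preserving grams such as Dunde, Bunde and Bun are each one edit away from one of those query grams.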

Related

Performing an update on objects from an array by using object own values

Given a collection with documents like the following:
{
"_id": {
"$oid": "6139b08ee5a3119445892f94"
},
"type": "bike",
"specs": [
{
"name": "giant",
"models": [
{
"name": "v1",
"categories": [
{
"name": "v11",
"price": 0
},
{
"name": "v12",
"price": 0.1
},
{
"name": "v13",
"price": 0.1
}
]
},
{
"name": "v2",
"categories": [
{
"name": "v21",
"price": 1
},
{
"name": "v22",
"price": 0.1
}
]
}
]
},
{
"name": "sputnik",
"models": [
{
"name": "s1",
"categories": [
{
"name": "s11",
"price": 20
},
{
"name": "s12",
"price": 0.9
},
{
"name": "s13",
"price": 1.1
}
]
},
{
"name": "s2",
"categories": [
{
"name": "s31",
"price": 1
},
{
"name": "s32",
"price": 0.1
}
]
}
]
}
]
}
I want to edit each item in models by adding a new subdocument with the following structure:
"valid": {
"isValid": true,
"currentName": [value from name field]
}
and remove the name field.
I have almost achieved this using an aggregation:
var dineInOptionsDefinition =
BsonDocument.Parse("{ isValid: true, currentName: '$specs.name'}");
var addNewFieldsDefinition = new BsonDocument
{
{ "$addFields", new BsonDocument
{
{
"specs.models.valid", dineInOptionsDefinition
}
}
}
};
var t1 = collection
.Aggregate()
.Match(filterDefinition)
.AppendStage<BsonDocument>(addNewFieldsDefinition)
.Merge(collection);
The problem is the value of currentName: the result is an array of all the name field values combined. Any idea how to get the value of the name field from the current subdocument instead?

ElasticSearch Exact match on multiple fields

I'm new to Elasticsearch. I'm trying to get an exact match on every field of an object in an Elasticsearch index. For example, I have two objects:
{
"_index": "sql",
"_type": "_doc",
"_id": "mpovsH",
"_score": 1.0,
"_source": {
"entityId": 1,
"user": "userfirst",
"descr": "testfirst",
}
},
{
"_index": "sql",
"_type": "_doc",
"_id": "mpovsH",
"_score": 1.0,
"_source": {
"entityId": 2,
"user": "usersecond",
"descr": "testsecond",
}
}
I want to search for the string "userfirst" across all fields of the object and get only the first one in the response. I tried:
var searchResponse = client.SearchAsync<MyObject>(s => s
.Source(sf => sf)
.Query(q => q
.MultiMatch(a => a
.Query(queryValue)))).Result;
where queryValue is "userfirst", but I get both objects in the results. How can I change this? Also, if possible, I would prefer not to list every single field to search, because my object is much bigger.
EDIT: I managed to get only one result with this query:
var searchResponse = client.SearchAsync<TendersElasticSearch>(s => s
.Source(sf => sf)
.Query(qn => qn
.MatchPhrasePrefix(ma => ma
.Field(x => x.User)
.Query(queryValue)))).Result;
But this query only searches the user field. I would like to search all fields of every object. Any tips?
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"users": {
"type": "nested"
}
}
}
}
Index Data:
{
"users": [
{
"entityId": 1,
"user": "userfirst",
"descr": "testfirst"
},
{
"entityId": 2,
"user": "usersecond",
"descr": "testsecond"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "users",
"query": {
"bool": {
"must": [
{ "match": { "users.user": "userfirst" }}
]
}
},
"inner_hits":{}
}
}
}
Search Query using Multi match:
{
"query": {
"nested": {
"path": "users",
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "userfirst",
"fields": [
"users.user"
]
}
}
]
}
},
"inner_hits": {}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64061575",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "users",
"offset": 0
},
"_score": 0.6931471,
"_source": {
"entityId": 1,
"user": "userfirst",
"descr": "testfirst"
}
}
]
C# Query:
var searchResponse = client.SearchAsync<MyObject>(s => s
    .Query(q => q
        .Nested(n => n
            .Path("users")
            .Query(nq => nq
                .Bool(b => b
                    .Must(m => m
                        .MultiMatch(mm => mm
                            .Fields(f => f.Field("users.user"))
                            .Query(queryValue)
                        )
                    )
                )
            )
            .InnerHits(ih => ih)
        )
    )
).Result;

How to add sorting for a *Text Field to ElasticSearch Query

NOTE: An answer was found, see the end of this post and the comments for more information.
ElasticSearch (ver 6) is my kryptonite. I have been trying to add sorting to a query for far too long and only get errors in response:
No mapping
Fielddata is disabled on text fields by default. Set fielddata=true ...
When I try to set "fielddata=true", I get more errors:
[field_sort] unknown field [fielddata], parser not found
I've tried ignore_unmapped and get the same "unknown field" error.
The ES documentation is so generic that it is unhelpful, and nothing here on SO has helped. These are the posts I have read after running into various errors:
https://stackoverflow.com/a/17051944/10358406
unknown field [dest], parser not found- error coming while reindexing
Elasticsearch: Expected field name but got START_OBJECT
how to set fielddata=true in kibana
So now...
Here is my current query that I use through Postman. I use Nest in my C# project to build this up. I know how to build queries if I can get the json structure right in Postman:
{
"from": 0,
"size": 50,
"aggs": {
"specs": {
"nested": {
"path": "specs"
},
"aggs": {
"names": {
"terms": {
"field": "specs.name",
"size": 10
},
"aggs": {
"specValues": {
"terms": {
"field": "specs.value",
"size": 10
}
}
}
}
}
}
},
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"Id": {
"value": 1
}
}
}
],
"must_not": [
{
"terms": {
"partId": []
}
}
],
"should": [
{
"terms": {
"picCode": [
"b02"
]
}
},
{
"terms": {
"partId": []
}
}
]
}
}
}
}
}
The results I get back are like this:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 281,
"max_score": 1,
"hits": [
{
"_index": "09812734",
"_type": "part",
"_id": "1234:1",
"_score": 1,
"_source": {
"id": "1234:1",
"partId": 1234,
"mcfCostCenterId": 1,
"oemCode": "ABC",
"oemDescription": "Blah blah blah",
"oemPartCode": "123456",
"picCode": "B02",
"description": "other blah blah",
"isServiceable": false,
"marketingDescription": "this thing does stuff",
"salesVolume": 0,
"searchSortOrder": 0,
"catalogSortOrders": [],
"specs": [
{
"name": "Color",
"value": "NA"
},
{
"name": "Diameter",
"value": "7.0000"
},
{
"name": "OtherSpec",
"value": "Q"
},
{
"name": "LastSpec",
"value": "FuBar"
}
]
}
}
]
},
"aggregations": {
"specs": {
"doc_count": 18,
"names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Length",
"doc_count": 4,
"specValues": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "2",
"doc_count": 1
},
{
"key": "3",
"doc_count": 1
},
{
"key": "8",
"doc_count": 1
}
]
}
}
]
}
}
}
}
I need to sort on description or salesVolume in the results.
We have another query that does sorting, so I used the same code for this query, but I get the above errors. That code builds the JSON as follows:
{
"from": 0,
"size": 50,
"sort": [
{
"foo.bar": {
"missing": "_last",
"order": "desc",
"nested_filter": {
"term": {
"foo.bar": {
"value": "frankenstein"
}
}
},
"nested_path": "_source"
}
}
],
"aggs": {
"specs": {
"nested": {
"path": "specs"
...
When I try to edit that to fit this search, I just get errors and no results.
EDIT:
So, sorting on "salesVolume" is easy because it isn't a text field.
{
"from": 0,
"size": 50,
"sort": [
{
"salesVolume": {
"missing": "_last",
"order": "desc"
}
}
],
"aggs": {
...
This sets up the sort order without any problem. So the crux is the text field: I just can't figure out where to set fielddata to true without getting an error.
EDIT 2:
In my DTO I have the field already set up with the keyword attribute.
using Nest; // [Text] and [Keyword] are NEST mapping attributes

public class MyClass
{
    public string Id { get; set; }

    [Text]
    public string ThisCode { get; set; }

    [Keyword]
    public string MySortableTextField { get; set; }

    // ... and so forth
}
The search still fails with an error.
ANSWER:
^^^ The Keyword attribute worked. I was not placing it in the right project: we have a separate project that builds all our db indexes, and it was still using the Text attribute. Once reindexing was done, I was able to apply the sort order without errors.
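The Keyword attribute works because a keyword field is indexed as one unanalyzed term per value, so each document has a single value to sort on. A text field is analyzed into several terms, and even with fielddata enabled an ascending sort has to pick one of them; Elasticsearch's default sort mode for ascending order is min, the lowest term. The toy C# sketch below (hypothetical helper names, and only a crude stand-in for the standard analyzer) shows how differently the two orderings can come out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustration only: sorting on a whole keyword value versus sorting on
// analyzed text terms, where each document contributes several terms and
// an ascending sort picks the lowest one (sort mode "min").
public static class SortSketch
{
    // Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    public static IEnumerable<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{N}]+")
             .Where(t => t.Length > 0);

    // Keyword field: the whole string is one term, compared ordinally.
    public static List<string> SortByKeyword(IEnumerable<string> values) =>
        values.OrderBy(v => v, StringComparer.Ordinal).ToList();

    // Text field with fielddata: each document is keyed by its lowest analyzed term.
    public static List<string> SortByMinTextTerm(IEnumerable<string> values) =>
        values.OrderBy(v => Analyze(v).OrderBy(t => t, StringComparer.Ordinal).FirstOrDefault() ?? "",
                       StringComparer.Ordinal)
              .ToList();
}
```

With the sample descriptions "Zinc plated bolt", "Alpha widget" and "Brass washer", the keyword ordering is Alpha, Brass, Zinc, but the min-term ordering puts "Zinc plated bolt" second because its lowest term is "bolt". That is usually not what a user expects when sorting by description.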

Elasticsearch NEST 5.x Koelnerphonetic not matching

UPDATE
I changed the approach of the question.
I'm trying to apply phonetic search with Koelner phonetics, together with ngrams.
Index configuration I'm using:
{
"testnew": {
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "testnew",
"creation_date": "1489672932033",
"analysis": {
"filter": {
"koelnerPhonetik": {
"replace": "false",
"type": "phonetic",
"encoder": "koelnerphonetik"
}
},
"analyzer": {
"koelnerPhonetik": {
"type": "custom",
"tokenizer": "koelnerPhonetik"
},
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer"
}
},
"tokenizer": {
"koelnerPhonetik": {
"type": "standard"
},
"ngram_tokenizer": {
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
],
"min_gram": "2",
"type": "ngram",
"max_gram": "20"
}
}
},
...
}
}
}
}
}
I've got one document that looks like this:
{
"_index": "testnew",
"_type": "person",
"_id": "3",
"_score": 1,
"_source": {
"name": "Can",
"fields": {
"phonetic": {
"type": "string",
"analyzer": "koelnerPhonetik"
}
}
}
}
It is mapped like this:
GET testnew/person/_mapping
"name": {
"type": "text",
"analyzer": "koelnerPhonetik"
}
Why can't I find 'Can' by searching for 'Kan' with this query?
GET testnew/person/_search
{
"query": {
"match": {
"name.phonetic": {
"query": "Kan"
}
}
}
}
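The koelnerphonetik encoder maps a word to a digit code so that names that sound alike share the same code. As a rough illustration, here is a simplified Kölner Phonetik encoder written from the published rule table; it is an assumption for illustration only and is not the phonetic plugin's code, so edge cases may differ. Under these rules both "Can" and "Kan" encode to 46. Keep in mind that the encoding only takes effect when the phonetic token filter is listed in the filter chain of the analyzer applied to the queried field.

```csharp
using System;
using System.Text;

// Simplified Kölner Phonetik (Cologne phonetics) following the published
// rule table. Hypothetical helper, NOT the Elasticsearch plugin implementation.
public static class KoelnerPhonetikSketch
{
    public static string Encode(string word)
    {
        string s = word.ToUpperInvariant();
        var raw = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            char prev = i > 0 ? s[i - 1] : '\0';
            char next = i + 1 < s.Length ? s[i + 1] : '\0';
            raw.Append(CodeFor(s[i], prev, next, i == 0));
        }

        // Collapse runs of identical digits, then drop every '0' except a leading one.
        var result = new StringBuilder();
        for (int i = 0; i < raw.Length; i++)
        {
            if (i > 0 && raw[i] == raw[i - 1]) continue;
            if (i > 0 && raw[i] == '0') continue;
            result.Append(raw[i]);
        }
        return result.ToString();
    }

    private static string CodeFor(char c, char prev, char next, bool first)
    {
        switch (c)
        {
            case 'A': case 'E': case 'I': case 'J': case 'O': case 'U': case 'Y':
            case 'Ä': case 'Ö': case 'Ü':
                return "0";
            case 'H':
                return "";                                  // H is ignored
            case 'B':
                return "1";
            case 'P':
                return next == 'H' ? "3" : "1";
            case 'D': case 'T':
                return "CSZ".IndexOf(next) >= 0 ? "8" : "2";
            case 'F': case 'V': case 'W':
                return "3";
            case 'G': case 'K': case 'Q':
                return "4";
            case 'C':
                if (first) return "AHKLOQRUX".IndexOf(next) >= 0 ? "4" : "8";
                if ("SZ".IndexOf(prev) >= 0) return "8";
                return "AHKOQUX".IndexOf(next) >= 0 ? "4" : "8";
            case 'X':
                return "CKQ".IndexOf(prev) >= 0 ? "8" : "48";
            case 'L':
                return "5";
            case 'M': case 'N':
                return "6";
            case 'R':
                return "7";
            case 'S': case 'Z': case 'ß':
                return "8";
            default:
                return "";                                  // digits, punctuation, etc. are skipped
        }
    }
}
```

For example, Encode("Wikipedia") returns 3412 and Encode("Müller-Lüdenscheidt") returns 65752682, matching the usual textbook examples of the rule table.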

Using C# Nest API to get nested json data is not retrieving data

I have the following JSON coming from Elasticsearch:
{
"_index": "data-2016-01-14",
"_type": "type-data",
"_id": "AVJBBNG-TE8FYIA1rf1p",
"_score": 1,
"_source": {
"#message": {
"timestamp": 1452789770326461200,
"eventID": 1452789770326461200,
"eventName": "New",
"Price": "38.34",
"Qty": 100,
"statistic_LatencyValue_ns": 1142470,
"statistic_LatencyViolation": false,
"statistic_LossViolation": false
},
"#timestamp": "2016-01-14T16:42:50.326Z",
"#fields": {
"timestamp": "1452789770326"
}
},
"fields": {
"#timestamp": [
1452789770326
]
}
}
I'm using NEST to try to get the eventName data. I created the class and marked the property:
public class ElasticTest
{
[ElasticProperty(Type = FieldType.Nested)]
public string eventName { get; set; }
}
But the following query returns 0 results. What am I doing wrong?
var result = client.Search<CorvilTest>(s => s
.From(0)
.Size(10000)
.Query(x => x
.Term(e => e.eventName,"New"))
);
var r = result.Documents;
Mapping definition:
{
"data-2016-01-14": {
"mappings": {
"type-data": {
"properties": {
"#fields": {
"properties": {
"timestamp": {
"type": "string"
}
}
},
"#message": {
"properties": {
"OrderQty": {
"type": "long"
},
"Price": {
"type": "string"
},
"eventID": {
"type": "long"
},
"eventName": {
"type": "string"
},
"statistic_LatencyValue_ns": {
"type": "long"
},
"statistic_LatencyViolation": {
"type": "boolean"
},
"statistic_LossViolation": {
"type": "boolean"
},
"timestamp": {
"type": "long"
}
}
},
"#timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
I see that the field #message.eventName uses the standard analyzer, which means its value is lowercased and split at word boundaries before indexing. Hence the term "new" is indexed, not "New". You need to keep this in mind when using a Term query, because the value of a term query is not analyzed. Another thing is that eventName is not of nested type; it is a property of the #message object. So the code below should work for you.
var result = client.Search<CorvilTest>(s => s
.From(0)
.Size(10000)
.Query(x => x
.Term(e => e.Message.EventName, "new"))); // Notice I've used "new" and not "New"
var r = result.Documents;
For the above code to work, the CorvilTest class should be defined roughly like this:
public class CorvilTest
{
[ElasticProperty(Name = "#message")]
public Message Message { get; set; }
/* Other properties if any */
}
public class Message
{
[ElasticProperty(Name = "eventName")]
public string EventName { get; set; }
}
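The following toy C# model (hypothetical names, and only a crude stand-in for the standard analyzer) illustrates the point about the Term query: the indexed terms are lowercased, a term query compares its value to them unchanged, and a match query would run its input through the same analyzer first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy model of term vs. match queries against a field analyzed with
// something like the standard analyzer (lowercase + split on non-alphanumerics).
public static class TermQuerySketch
{
    // Terms that would end up in the index for the given field value.
    public static HashSet<string> AnalyzeStandard(string text) =>
        new HashSet<string>(
            Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{N}]+")
                 .Where(t => t.Length > 0));

    // Term query: the value is NOT analyzed, so the comparison is exact and case-sensitive.
    public static bool TermQuery(HashSet<string> indexedTerms, string value) =>
        indexedTerms.Contains(value);

    // Match query: the input goes through the same analyzer before lookup.
    public static bool MatchQuery(HashSet<string> indexedTerms, string text) =>
        AnalyzeStandard(text).Any(indexedTerms.Contains);
}
```

For the document above, AnalyzeStandard("New") stores the single term "new", so TermQuery(terms, "New") is false, TermQuery(terms, "new") is true, and MatchQuery(terms, "New") is also true.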
