ElasticSearch Exact match on multiple fields - c#

I'm new to Elasticsearch, and I'm trying to get an exact match on every field of an object in an Elasticsearch index. For example, I have two objects:
{
"_index": "sql",
"_type": "_doc",
"_id": "mpovsH",
"_score": 1.0,
"_source": {
"entityId": 1,
"user": "userfirst",
"descr": "testfirst",
}
},
{
"_index": "sql",
"_type": "_doc",
"_id": "mpovsH",
"_score": 1.0,
"_source": {
"entityId": 2,
"user": "usersecond",
"descr": "testsecond",
}
}
I want to search for the string "userfirst" across all fields of the object and get only the first one in the response. I tried:
var searchResponse = client.SearchAsync<MyObject>(s => s
.Source(sf => sf)
.Query(q => q
.MultiMatch(a => a
.Query(queryValue)))).Result;
Here queryValue is "userfirst", but I get both objects in the results. How can I change it? Also, I would rather not list every single field to search, because my real object is much bigger.
EDIT: I managed to get only one result with this query:
var searchResponse = client.SearchAsync<TendersElasticSearch>(s => s
.Source(sf => sf)
.Query(qn => qn
.MatchPhrasePrefix(ma => ma
.Field(x => x.User)
.Query(queryValue)))).Result;
But this query only matches on the user field. I would like to search all fields of every object. Any tips?

Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"users": {
"type": "nested"
}
}
}
}
Index Data:
{
"users": [
{
"entityId": 1,
"user": "userfirst",
"descr": "testfirst"
},
{
"entityId": 2,
"user": "usersecond",
"descr": "testsecond"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "users",
"query": {
"bool": {
"must": [
{ "match": { "users.user": "userfirst" }}
]
}
},
"inner_hits":{}
}
}
}
Search Query using Multi match:
{
"query": {
"nested": {
"path": "users",
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "userfirst",
"fields": [
"users.user"
]
}
}
]
}
},
"inner_hits": {}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64061575",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "users",
"offset": 0
},
"_score": 0.6931471,
"_source": {
"entityId": 1,
"user": "userfirst",
"descr": "testfirst"
}
}
]

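C# Query:
// Nested query on "users" wrapping a bool/must multi_match, mirroring the JSON above
var searchResponse = client.SearchAsync<MyObject>(s => s
    .Query(q => q
        .Nested(n => n
            .Path("users")
            .Query(nq => nq
                .Bool(b => b
                    .Must(m => m
                        .MultiMatch(mm => mm
                            .Fields(f => f.Field("users.user"))
                            .Query(queryValue)))))
            .InnerHits(ih => ih)))).Result;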
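The question also asked how to avoid listing every field. One possible variation of the multi-match query above is a wildcard field pattern. This is only a sketch: the "users.*" pattern and the "lenient" flag are assumptions added here, not part of the original answer. "lenient" is included because "entityId" is numeric, and a text query against a numeric field would otherwise raise a format error.

```json
{
  "query": {
    "nested": {
      "path": "users",
      "query": {
        "multi_match": {
          "query": "userfirst",
          "fields": ["users.*"],
          "lenient": true
        }
      },
      "inner_hits": {}
    }
  }
}
```

Matching still goes through each field's analyzer, so whether this behaves as a strict exact match depends on how the fields are mapped.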

Related

ElasticSearch Nest, edge n gram with Fuzziness

I am using Elasticsearch.Net and NEST v7.10.0.
I have these settings and mappings for elastic search.
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keyword"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 50,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"properties": {
"MatchName": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
},
"CompetitionName": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
}
}
}
}
I have indexed 3 documents with values
{
"_source": {
"CompetitionName": "Premiership",
"MatchName": "Dundee Utd - St Johnstone",
}
},
{
"_source": {
"CompetitionName": "2nd Div, Vastra Gotaland UOF",
"MatchName": "IF Limhamn Bunkeflo - FC Rosengaard 1917",
}
},
{
"_source": {
"CompetitionName": "Bundesliga",
"MatchName": "Hertha Berlin - Eintracht Frankfurt",
}
}
I am searching both fields for the string "bunde" with Fuzziness.Auto.
I want this search to return all of the documents above.
But the query below returns nothing.
string value = "bunde";
BoolQuery boolQuery = new BoolQuery
{
Should = new List<QueryContainer>
{
new QueryContainer(new FuzzyQuery
{
Field = Infer.Field<EventHistoryDoc>(path:eventHistoryDoc => eventHistoryDoc.MatchName),
Value = value,
Fuzziness = Fuzziness.Auto,
}),
new QueryContainer(new FuzzyQuery
{
Field = Infer.Field<EventHistoryDoc>(path:eventHistoryDoc => eventHistoryDoc.CompetitionName),
Value = value,
Fuzziness = Fuzziness.Auto,
})
}
};
ISearchRequest searchRequest = new SearchRequest
{
Query = new QueryContainer(boolQuery),
};
var json = _elasticClient.RequestResponseSerializer.SerializeToString(searchRequest);
ISearchResponse<EventHistoryDoc> searchResponse = await _elasticClient.SearchAsync<EventHistoryDoc>(searchRequest);
If I search for the string "bundes", I get only one document:
{
"_source": {
"CompetitionName": "Bundesliga",
"MatchName": "Hertha Berlin - Eintracht Frankfurt",
}
}
Any idea what I should change in the settings, mapping, or query so that the response contains all of the documents above?
I am not familiar with the NEST syntax, but in JSON you can achieve your result as follows.
Here is a working example with index mapping, search query, and search result.
(For now, I have removed the keyword_analyzer and edge_ngram_search_analyzer from the index mapping, as you just wanted to return all the documents using edge n-grams along with fuzziness.)
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 50,
"token_chars": [
"letter",
"digit"
]
}
}
},
"max_ngram_diff": 50
},
"mappings": {
"properties": {
"CompetitionName": {
"type": "text",
"analyzer": "my_analyzer"
},
"MatchName": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Search Query:
{
"query": {
"multi_match": {
"query": "bunde",
"fuzziness": "AUTO"
}
}
}
Search Result:
"hits": [
{
"_index": "64968421",
"_type": "_doc",
"_id": "1",
"_score": 2.483365,
"_source": {
"CompetitionName": "Premiership",
"MatchName": "Dundee Utd - St Johnstone"
}
},
{
"_index": "64968421",
"_type": "_doc",
"_id": "3",
"_score": 2.4444416,
"_source": {
"CompetitionName": "Bundesliga",
"MatchName": "Hertha Berlin - Eintracht Frankfurt"
}
},
{
"_index": "64968421",
"_type": "_doc",
"_id": "2",
"_score": 0.6104546,
"_source": {
"CompetitionName": "2nd Div, Vastra Gotaland UOF",
"MatchName": "IF Limhamn Bunkeflo - FC Rosengaard 1917"
}
}
]
The index mapping provided in the question is also correct. When using that same index mapping and searching for bunde with the multi-match query shown above, all three documents are returned (which is the expected result).
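When keeping the question's original mapping, the same search can also name the edge n-gram sub-fields explicitly rather than relying on the default field expansion. The following is a sketch only: the field names come from the mapping in the question, and it has not been run against that index.

```json
{
  "query": {
    "multi_match": {
      "query": "bunde",
      "fields": [
        "MatchName.edgengram",
        "CompetitionName.edgengram"
      ],
      "fuzziness": "AUTO"
    }
  }
}
```

Under that mapping, the indexed prefixes "bunde" (from "Bundesliga"), "dunde" (from "Dundee") and "bunke" (from "Bunkeflo") are each within one edit of the search term. That would explain why all three documents can match.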

How to distinct a field (which has several duplicates in search result) in ElasticSearch by C#?

I am trying to get search results from an Elasticsearch server. However, I only have GET permissions, so I cannot put an index mapping on the server or change any Elasticsearch configuration.
.../testindex/doc/_search?q=country_code:99&size=5000&pretty
{
"took": 58,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.2876821,
"hits": [ // Above result numbers may not be correct!
{
"_index": "testindex",
"_type": "doc",
"_id": "3",
"_score": 0.2876821,
"_source": {
"name": "George",
"lastname": "Bush",
"owner_code": "555",
"country_code": "99"
}
},
{
"_index": "testindex",
"_type": "doc",
"_id": "2",
"_score": 0.2876821,
"_source": {
"name": "John",
"lastname": "Smith",
"owner_code": "444",
"country_code": "99"
}
},
{
"_index": "testindex",
"_type": "doc",
"_id": "1",
"_score": 0.28582606,
"_source": {
"name": "Mike",
"lastname": "Brown",
"owner_code": "555",
"country_code": "99"
}
}
]
}
}
I tried creating new indices, but that failed because I don't have permission to do so.
I also tried the code below:
[JsonObject]
class CustomDTO
{
[JsonProperty("name")]
public string name { get; set; }
[JsonProperty("lastname")]
public string lastname { get; set; }
[JsonProperty("owner_code")]
public string owner_code { get; set; }
[JsonProperty("country_code")]
public string country_code { get; set; }
}
// ElasticSearch v6.x
var result = _client.Search<CustomDTO>(s => s
.Index("testindex")
.Type("doc")
.From(0)
.Size(5000)
.Aggregations(a => a
.Terms("my_terms_agg", t => t
.Field("owner_code.keyword")
.Size(2000)
)
)
.Query(q => q.QueryString(qs => qs
.Field(f => f.country_code)
.Query("99"))));
Expected result (after distinct on "owner_code"):
[
{
"name":"Mike",
"lastname":"Brown",
"owner_code":"555",
"country_code":"99"
},
{
"name":"John",
"lastname":"Smith",
"owner_code":"444",
"country_code":"99"
}
]
But it returns no buckets or documents (Count: [0]).
This is the pseudo-SQL equivalent of what I need from Elasticsearch:
select * distinct "owner_code" from testindex where country_code = "99"
So I should get these results: Mike-Brown-555-99 and John-Smith-444-99 (see the data above), ignoring any other results that have the same "owner_code". It doesn't matter whether the name and/or lastname fields differ. I want "owner_code" to be distinct while filtering on "country_code". This is very easy in SQL, but I am quite confused about how to do it in Elasticsearch (v6.x).
Thanks in advance!
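One approach that needs only search permissions is field collapsing, which keeps a single top hit per distinct value of a field. The request below is a rough sketch. It assumes that an "owner_code.keyword" sub-field exists with doc values, which the empty aggregation result above may call into question, and it uses a simple match query in place of the query string.

```json
{
  "size": 5000,
  "query": {
    "match": { "country_code": "99" }
  },
  "collapse": {
    "field": "owner_code.keyword"
  }
}
```

Only the best-ranked document of each "owner_code" group is returned. Which of two documents sharing code "555" comes back therefore depends on scoring or on any sort added to the request.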

With Elasticsearch .NET and NEST 6.x : How to MultiGet documents from multiples indices

var ids = new Dictionary<string, List<string>>();
ids["Topic"] = new List<string> {"KL7KJ2QBWD77yvpxyjvd", "374wJ2QBWD77yvpxDjpX", "B7499GMBWD77yvpxFzgW"};
ids["Prod"] = new List<string>();
ids["Account"] = new List<string>();
I made this fluent NEST query:
var results = client.MultiGet(m =>
m.Index("topics").GetMany<Topic>(ids["Topic"])
.Index("prods").GetMany<Prod>(ids["Prod"])
.Index("accounts").GetMany<Account>(ids["Account"]));
It generates the request below. The request uses only the last index that was set, "accounts", which is not what I need:
http://localhost:9200/accounts/_mget?pretty=true
{
"docs": [
{
"_type": "Topic",
"_id": "KL7KJ2QBWD77yvpxyjvd"
},
{
"_type": "Topic",
"_id": "374wJ2QBWD77yvpxDjpX"
},
{
"_type": "Topic",
"_id": "B7499GMBWD77yvpxFzgW"
}
]
}
# Response:
{
"docs" : [
{
"_index" : "accounts",
"_type" : "Topic",
"_id" : "KL7KJ2QBWD77yvpxyjvd",
"found" : false
},
{
"_index" : "accounts",
"_type" : "Topic",
"_id" : "374wJ2QBWD77yvpxDjpX",
"found" : false
},
{
"_index" : "accounts",
"_type" : "Topic",
"_id" : "B7499GMBWD77yvpxFzgW",
"found" : false
}
]
}
In fact I would like to create in fluent NEST the following (valid) Elasticsearch query request (with no specific index) :
http://localhost:9200/_mget?pretty=true
{
"docs": [
{
"_index": "topics",
"_type": "Topic",
"_id": "KL7KJ2QBWD77yvpxyjvd"
},
{
"_index": "topics",
"_type": "Topic",
"_id": "374wJ2QBWD77yvpxDjpX"
},
{
"_index": "topics",
"_type": "Topic",
"_id": "B7499GMBWD77yvpxFzgW"
}
]
}
Is there a way, in fluent NEST, to specify the index/type for each list of ids?
And of course, if I add specific ids to ids["Prod"] and ids["Account"], these should be generated properly as well, for instance:
http://localhost:9200/_mget?pretty=true
{
"docs": [
{
"_index": "topics",
"_type": "Topic",
"_id": "KL7KJ2QBWD77yvpxyjvd"
},
{
"_index": "prods",
"_type": "Prod",
"_id": "xxxxx"
},
{
"_index": "accounts",
"_type": "Account",
"_id": "yyyyy"
}
]
}
Each GetMany<T>(...) takes a second argument, a delegate that further describes the call, including supplying an index name:
var ids = new Dictionary<string, List<string>>
{
{ "Topic", new List<string> { "topic1", "topic2", "topic3" } },
{ "Prod", new List<string> { "prod1", "prod2", "prod3" } },
{ "Account", new List<string> { "account1", "account2", "account3" } }
};
var multiGetResponse = client.MultiGet(m => m
.GetMany<Topic>(ids["Topic"], (op, id) => op
.Index("topics")
)
.GetMany<Prod>(ids["Prod"], (op, id) => op
.Index("prods")
)
.GetMany<Account>(ids["Account"], (op, id) => op
.Index("accounts")
)
);
which results in a request like
POST http://localhost:9200/_mget
{
"docs": [
{
"_index": "topics",
"_type": "topic",
"_id": "topic1"
},
{
"_index": "topics",
"_type": "topic",
"_id": "topic2"
},
{
"_index": "topics",
"_type": "topic",
"_id": "topic3"
},
{
"_index": "prods",
"_type": "prod",
"_id": "prod1"
},
{
"_index": "prods",
"_type": "prod",
"_id": "prod2"
},
{
"_index": "prods",
"_type": "prod",
"_id": "prod3"
},
{
"_index": "accounts",
"_type": "account",
"_id": "account1"
},
{
"_index": "accounts",
"_type": "account",
"_id": "account2"
},
{
"_index": "accounts",
"_type": "account",
"_id": "account3"
}
]
}
The "_type" names have been inferred from the CLR POCO names by lowercasing them. This can be changed by overriding the type name inferrer on Connection Settings.

Elastic .Nest Equivalent of field_value_factor

I need to revise a method that builds a SearchDescriptor using .Nest so that product search results score higher for items having a contract price (value of zero).
I captured the serialized version of the query and added "field_value_factor" to return the results in the desired order. I have not determined how to achieve this in the .Nest query statement.
Can someone recommend how to revise the .NEST client statements to produce the same search descriptor?
Thank you
Below is the query we want to achieve where you will see field_value_factor at the bottom:
{
"from": 0,
"size": 3000,
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"priceStatus": {
"order": "asc"
}
},
{
"unitPrice": {
"order": "asc"
}
}
],
"aggs": {
"PriceStatus": {
"terms": {
"field": "priceStatus",
"size": 5
}
},
"VendorName": {
"terms": {
"field": "vendorName",
"size": 5
}
},
"CatalogName": {
"terms": {
"field": "catalogName",
"size": 5
}
},
"ManufacturerName": {
"terms": {
"field": "manufacturerName",
"size": 5
}
},
"IsGreen": {
"terms": {
"field": "isGreen",
"size": 5
}
},
"IsValuePack": {
"terms": {
"field": "isValuePack",
"size": 5
}
},
"UnitOfMeasure": {
"terms": {
"field": "unitOfMeasure",
"size": 5
}
},
"Attributes": {
"nested": {
"path": "attributes"
},
"aggs": {
"TheAttributeName": {
"terms": {
"field": "attributes.name",
"size": 10
},
"aggs": {
"TheAttributeValue": {
"terms": {
"field": "attributes.value",
"size": 5
}
}
}
}
}
}
},
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"type": "phrase",
"query": "pen",
"slop": 3,
"boost": 16.0,
"fields": [
"itemNumber*^4",
"shortDescription*^4",
"subCategory1Name*^1.5",
"subCategory2Name*^2.0",
"categoryName*^0.9",
"longDescription*^0.6",
"catalogName*^0.30",
"manufactureName*^0.20",
"vendorName*^0.15",
"upcCode*^0.10"
]
}
},
{
"multi_match": {
"query": "pen",
"boost": 15.0,
"minimum_should_match": "75%",
"fields": [
"itemNumber*^4",
"shortDescription*^4",
"subCategory1Name*^1.5",
"subCategory2Name*^2.0",
"categoryName*^0.9",
"longDescription*^0.6",
"catalogName*^0.30",
"manufactureName*^0.20",
"vendorName*^0.15",
"upcCode*^0.10"
]
}
},
{
"multi_match": {
"query": "pen",
"fuzziness": 1.0,
"slop": 2,
"minimum_should_match": "75%",
"fields": [
"itemNumber*^4",
"shortDescription*^4",
"subCategory1Name*^1.5",
"subCategory2Name*^2.0",
"categoryName*^0.9",
"longDescription*^0.6",
"catalogName*^0.30",
"manufactureName*^0.20",
"vendorName*^0.15",
"upcCode*^0.10"
]
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"catalogId": [
"fbb3dd2c-f81c-4ff3-bd5b-9c2cffc51540"
]
}
}
]
}
},
"field_value_factor": {
"field": "priceStatus",
"factor": -1,
"modifier": "none"
}
}
}
}
Below is the current method that builds the SearchDescriptor:
private SearchDescriptor<SearchItem> BuildSearchDescriptor(
string searchTerm,
IList<Guid> catalogIds,
int from,
int size,
string index,
string preference,
int attrSize,
int valueSize,
Dictionary<string, string[]> filterProps,
Dictionary<string, string[]> filterAttrs,
Guid? categoryId)
{
var searchDescriptor = new SearchDescriptor<SearchItem>()
.From(from)
.Size(size)
.Query(q =>
q.Filtered(fd => BuildFilterTerms(fd, filterProps, filterAttrs, catalogIds, categoryId)
.Query(iq => BuildQueryContainer(iq, searchTerm))
)
)
.Index(index)
.Preference(preference)
.Aggregations(agg => BuildAggregationDescriptor(agg, attrSize, valueSize, catalogIds.Count))
.Sort(sort => sort.OnField("_score").Descending())
.SortAscending(p=> p.PriceStatus)
.SortAscending(p => p.UnitPrice);
// Debug the raw string that will post to the ES servers i.e. use this in postman
//var str = System.Text.Encoding.UTF8.GetString(client.Serializer.Serialize(searchDescriptor));
return searchDescriptor;
}
Your JSON query isn't valid; field_value_factor is a function of a function_score query. In NEST 1.x, this would look like
var response = client.Search<Document>(x => x
.Query(q => q
.FunctionScore(fs => fs
.Functions(fu => fu
.FieldValueFactor(fvf => fvf
.Field(f => f.PriceStatus)
.Factor(-1)
.Modifier(FieldValueFactorModifier.None)
)
)
)
)
);
public class Document
{
public string Title { get; set; }
public int PriceStatus { get; set; }
}
which produces the query
{
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "PriceStatus",
"factor": -1.0,
"modifier": "none"
}
}
]
}
}
}

Elasticsearch NEST 5.x Koelnerphonetic not matching

UPDATE
I changed the approach of the question
I'm trying to apply phonetic search with koelner phonetics and also ngram is used.
Index configuration I'm using:
{
"testnew": {
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "testnew",
"creation_date": "1489672932033",
"analysis": {
"filter": {
"koelnerPhonetik": {
"replace": "false",
"type": "phonetic",
"encoder": "koelnerphonetik"
}
},
"analyzer": {
"koelnerPhonetik": {
"type": "custom",
"tokenizer": "koelnerPhonetik"
},
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer"
}
},
"tokenizer": {
"koelnerPhonetik": {
"type": "standard"
},
"ngram_tokenizer": {
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
],
"min_gram": "2",
"type": "ngram",
"max_gram": "20"
}
}
},
...
}
}
}
}
}
I've got one document that looks like this:
{
"_index": "testnew",
"_type": "person",
"_id": "3",
"_score": 1,
"_source": {
"name": "Can",
"fields": {
"phonetic": {
"type": "string",
"analyzer": "koelnerPhonetik"
}
}
}
It is mapped like this:
GET testnew/person/_mapping
"name": {
"type": "text",
"analyzer": "koelnerPhonetik"
}
Why can't I find 'Can' by searching for 'Kan' with this query?
GET testnew/person/_search
{
"query": {
"match": {
"name.phonetic": {
"query": "Kan"
}
}
}
}
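Two details in the configuration above may be relevant. The koelnerPhonetik analyzer names a tokenizer but no token filter, so the phonetic filter is never applied. The mapping shown also has no name.phonetic sub-field, which is what the query targets. A possible index definition that wires both together is sketched below. It assumes the analysis-phonetic plugin is installed, and the "lowercase" filter and the multi-field layout are additions made here for illustration.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "koelnerPhonetik": {
          "type": "phonetic",
          "encoder": "koelnerphonetik",
          "replace": false
        }
      },
      "analyzer": {
        "koelnerPhonetik": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "koelnerPhonetik"]
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "phonetic": {
              "type": "text",
              "analyzer": "koelnerPhonetik"
            }
          }
        }
      }
    }
  }
}
```

With a definition along these lines, the match query on name.phonetic analyzes "Kan" with the same phonetic filter used at index time. Existing documents would need to be reindexed before the new sub-field is searchable.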
