Must match one field and Can contain other fields in Elastic Search - c#

I need to figure out how to search items across 2 fields. We have a bunch of properties, and if someone selects a property to search from, I will be sending that as a parameter. I tried to create those queries in Kibana but couldn't combine them. Here is a sample of what I need to combine but am not sure how to.
This will need to match the site field
{
"query": {
"match": {
"site": {
"query": "Some name here",
"type": "phrase"
}
}
}
}
This will try to look in multiple fields (content, description, title, etc)
{
"query": {
"match": {
"content": {
"query": "diode",
"type": "phrase"
}
}
}
}
How would I combine both of those queries? (I am using NEST to query.) Any ideas?
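One possible way to combine the two, offered as a sketch rather than a tested answer: wrap both conditions in a bool query, with a phrase match on site and a multi_match over the other fields. The helper name build_site_search is hypothetical, and the field names description and title are taken from the "content, description, title, etc" list above. It also assumes a version where the phrase form of match is written as match_phrase. The snippet only builds the request body as JSON (plain Python, no client library), which can be pasted into Kibana.

```python
import json


def build_site_search(site_name, term, fields):
    """Build a search body: the site phrase must match AND the term
    must appear as a phrase in at least one of the given fields."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Condition 1: the document belongs to the selected site.
                    {"match_phrase": {"site": site_name}},
                    # Condition 2: the search term occurs in any listed field.
                    {
                        "multi_match": {
                            "query": term,
                            "type": "phrase",
                            "fields": list(fields),
                        }
                    },
                ]
            }
        }
    }


body = build_site_search("Some name here", "diode", ["content", "description", "title"])
print(json.dumps(body, indent=2))
```

If the second condition should only influence ranking instead of being required, moving it from must to should would be the usual adjustment.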

Related

Elastic - updating multiple documents in a single request

I need to update several thousand items every several minutes in Elastic and unfortunately reindexing is not an option for me. From my research the best way to update an item is using _update_by_query - I have had success updating single documents like so -
{
"query": {
"match": {
"itemId": {
"query": "1"
}
}
},
"script": {
"source": "ctx._source.field = params.updateValue",
"lang": "painless",
"params": {
"updateValue": "test"
}
}
}
var response = await Client.UpdateByQueryAsync<dynamic>(q => q
.Index("masterproducts")
.Query(_ => x.MatchQuery)
.Script(s => s.Source(x.Script).Lang("painless").Params(x.Params))
.Conflicts(Elasticsearch.Net.Conflicts.Proceed)
);
Although this works, it is extremely inefficient, as it generates thousands of requests. Is there a way to update multiple documents with matching IDs in a single request? I have already tried the Multi Search API, which it seems cannot be used for this purpose. Any help would be appreciated!
If possible, try to generalize your query.
Instead of targeting a single itemId, perhaps try using a terms query:
{
"query": {
"terms": {
"itemId": [
"1", "2", ...
]
}
},
"script": {
...
}
}
From the looks of it, your (seemingly simplified) script sets the same value regardless of the document ID / itemId. So that's that.
If the script does indeed set different values based on the doc IDs / itemIds, you could make the params multi-value:
"params": {
"updateValue1": "test1",
"updateValue2": "test2",
...
}
and then dynamically access them:
...
def value_to_set = params['updateValue' + ctx._source['itemId']];
...
so the target doc is updated with the corresponding value.
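To make the idea concrete, here is a rough sketch of how such a request body could be assembled from a map of itemId to new value. It is plain Python that only builds the JSON; the helper name build_bulk_update and the target field name field are placeholders taken from the snippets above, not anything the client library provides.

```python
import json


def build_bulk_update(values_by_item_id, target_field="field"):
    """Build one _update_by_query body covering many itemIds.

    values_by_item_id maps each itemId (as a string) to the value that
    should be written into target_field for that document.
    """
    # One script parameter per itemId: updateValue1, updateValue2, ...
    params = {
        f"updateValue{item_id}": value
        for item_id, value in values_by_item_id.items()
    }
    return {
        # A single terms query selects every document to update.
        "query": {"terms": {"itemId": list(values_by_item_id)}},
        "script": {
            "lang": "painless",
            # Each document looks up its own value by its itemId.
            "source": (
                f"ctx._source.{target_field} = "
                "params['updateValue' + ctx._source['itemId']]"
            ),
            "params": params,
        },
    }


body = build_bulk_update({"1": "test1", "2": "test2"})
print(json.dumps(body, indent=2))
```

With several thousand ids per batch it may be worth splitting the map into a few chunks, since very large terms lists and params maps make for heavy requests.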

Elasticsearch proper way to escape spaces, ? doesn't work in all scenarios

I'm trying to get searching with spaces to work properly in elasticsearch but having a ton of trouble getting it to behave the same way as it does on another field.
I have two fields, Name and Addresses.First().Line1 that I want to be able to search and preserve spaces in the search. For instance, searching for Bob Smi* would return Bob Smith but not just Bob.
This is working for my Name field by doing a query string search with the space replaced with ?. I'm also doing a wildcard so my final query is *bob?smi*.
However, when I try to also search by line1, I get no results. E.g. *4800* returns a record with line1 like 4800 Street, but when I do the same transformation with 4800 street to get *4800?street*, I get no results.
Below is my query
{
"from": 0,
"size": 50,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*4800?Street*",
"fields": [
"name",
"addresses.line1"
]
}
}
]
}
}
}
returns no result.
Why would *bob?smi* return a result with name Bob Smith, but *4800?street* not return the result with line1 4800 street?
Below is how both fields are set up in the index:
.Text(smd => smd.Name(c => c.Name).Analyzer(ElasticIndexCreator.SortAnalyzer).Fielddata())
.Nested<Address>(nomd => nomd.Name(p => p.PrimaryAddress).Properties(MapAddressProperties))
//from MapAddressProperties()
.Text(smd2 => smd2.Name(x => x.Line1).Analyzer(ElasticIndexCreator.SortAnalyzer).Fielddata())
Mappings in elastic:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
"addresses": {
"line1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
}
Is there some other, better way to escape a space in an elasticsearch querystring? I've also tried \\ and \\\\ (in C# evaluates to \\) instead of the ? to no avail.
Finally found the correct setup after tons of time experimenting. The configuration that worked for me was as follows:
Use Text with Field Data in the columns
Search using QueryString with wildcard placeholders, replacing spaces with ? e.g. bob smith is entered, query elastic with *bob?smith*
Use Nested queries for child objects. Oddly, addresses.line1 will return data when searching for, say, 4800 but not when trying to do *4800?street*. Using a nested query allows this to function properly.
From what I hear, having to use field data is very memory intensive, and having to use wildcards is very time intensive, so this is probably not an optimal solution but it's the only one I've found. If there is another better way to do this, please enlighten me.
Example queries in C# using Nest:
var query = Query<Student>.QueryString(qs => qs
.Fields(f => f
.Field(c => c.Name)
//.Field(c => c.PrimaryAddress.Line1) //this doesn't work
)
.Query(testCase.Term)
);
query |= Query<Student>.Nested(n => n
.Path(p => p.Addresses)
.Query(q => q
.QueryString(qs => qs
.Fields(f => f.Field(c => c.Addresses.First().Line1))
.Query(testCase.Term)
)
)
);
Example Mapping:
.Map<Student>(s => s.Properties(p => p
.Text(t => t.Name(z => z.Name).Fielddata())
.Nested<StudentAddress>(n => n
.Name(ap => ap.Addresses)
.Properties(ap => ap.Text(t => t.Name(z => z.Line1).Fielddata())
)
)))
Try using addresses.line1.keyword (that is, the keyword multi-field that you defined for addresses.line1) in a term-level wildcard query:
{
"query": {
"wildcard": {
"addresses.line1.keyword": {
"wildcard": "*4800 street*"
}
}
}
}
Per the Elasticsearch documentation on full-text wildcard searches, if you search against addresses.line1 (whose type is text, so full-text search rules apply), the search will be performed against each term analyzed out of the field, that is, once against 4800 and again against street, neither of which would match your *4800?street* wildcard. The addresses.line1.keyword multi-field contains the original 4800 street value, and should match your query pattern using a term-level wildcard query.
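The difference can be illustrated outside Elasticsearch with a small Python analogy. This is only a sketch: the analyze function is a crude stand-in (lowercase, split on whitespace) for whatever analyzer the field really uses, and fnmatch is used merely because its * and ? behave like the wildcard characters here.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Crude stand-in for a standard-style analyzer: lowercase and
    split on whitespace. The real analyzer may behave differently."""
    return text.lower().split()


def text_field_matches(value, pattern):
    """A text field is matched token by token, never as a whole string."""
    return any(fnmatchcase(token, pattern) for token in analyze(value))


def keyword_field_matches(value, pattern):
    """A keyword field keeps the original value as one single term."""
    return fnmatchcase(value, pattern)


line1 = "4800 street"
print(text_field_matches(line1, "*4800*"))            # True: token "4800" matches
print(text_field_matches(line1, "*4800?street*"))     # False: no single token spans the space
print(keyword_field_matches(line1, "*4800?street*"))  # True: the whole value is one term
```

One thing to keep in mind: a keyword field without a normalizer stores the value exactly as indexed, so the casing in the pattern has to match the stored value (a pattern with Street will not match a stored street).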
By the way, a minor nit: the mapping type definition itself seems incomplete for the addresses field. You said it is:
"addresses": {
"line1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
}
But IMHO it should instead be:
"addresses": {
"properties": {
"line1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}

How to merge the results of two queries to different indices in Elasticsearch?

I'm searching an index main-kittens for docs of type Kitty. Now, I want to run an experiment. For some of the users, I want to search experiment-kittens instead. The type is the same (Kitty), and all the fields have the same values as in the main index, but while the field Bio is always empty in the main index, in the experimental one it stores huge strings.
Now, the problem is that I can't store that Bio for all kittens due to memory/disk limitations. So experiment-kittens has only the most recent kittens (say, from the last month).
I want the search to be left intact for most users (i.e., always use the main index). For the selected ones, I want to merge the results. The logic should be:
search userquery + date_created less than 1 month ago in experiment-kittens
search userquery + date_created more than 1 month ago in main-kittens
The results should be sorted by date_created, and there are too many of them to sort in my app.
Is there a way to ask elastic to execute two different queries on two indices and merge the results?
(I'm also sure there could be more optimal solutions to the problem, please tell me if you have some).
You can search across multiple indices with a single Elasticsearch request by separating the index names with a comma. Then you can use the missing filter to differentiate between the two indices (one having Bio field and the other not). Then you can use the range filter to filter based on the value of date_created field. Finally you can use the sort API to sort based on the values of date_created field.
Putting all of these together, the Elasticsearch query that you need is as follows:
POST main-kittens,experiment-kittens/Kitty/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"missing": {
"field": "Bio"
}
},
{
"range": {
"date_created": {
"to": "now-1M"
}
}
}
]
}
},
{
"bool": {
"must_not": [
{
"missing": {
"field": "Bio"
}
}
],
"must": [
{
"range": {
"date_created": {
"from": "now-1M"
}
}
}
]
}
}
]
}
}
}
},
"sort": [
{
"date_created": {
"order": "desc"
}
}
]
}
You can replace "match_all": {} with any custom query that you may have.
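Note that the filtered query and the missing filter were removed in Elasticsearch 5.0, so on a newer cluster the same logic has to be spelled with bool, exists and must_not. A rough sketch of that translation follows, as plain Python that only builds the request body; the helper name build_kitten_query is made up for illustration, and gte/lt are used so that the two date ranges do not overlap at the boundary.

```python
import json


def build_kitten_query(user_query):
    """Combine a user query with the 'old docs from main, recent docs
    from experiment' rule, using bool/exists instead of filtered/missing."""
    # Docs without Bio (main index), created more than a month ago.
    older_without_bio = {
        "bool": {
            "must_not": [{"exists": {"field": "Bio"}}],
            "filter": [{"range": {"date_created": {"lt": "now-1M"}}}],
        }
    }
    # Docs with Bio (experiment index), created within the last month.
    recent_with_bio = {
        "bool": {
            "filter": [
                {"exists": {"field": "Bio"}},
                {"range": {"date_created": {"gte": "now-1M"}}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [
                    {
                        "bool": {
                            "should": [older_without_bio, recent_with_bio],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        },
        "sort": [{"date_created": {"order": "desc"}}],
    }


body = build_kitten_query({"match_all": {}})
print(json.dumps(body, indent=2))
```

On recent versions the request would go to main-kittens,experiment-kittens/_search, without the Kitty type segment in the path.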

Nest - ElasticSearch.Net - How do I split the FacetFilter hit count returned by a search property

I have a NEST query:
var descriptor = new SearchDescriptor<SomePoco>()
.TrackScores()
.From(request.Page == 1 ? 0 : (request.Page - 1) * request.PageSize)
.Size(request.PageSize)
.MatchAll()
.FacetFilter("some_name", a => new FilterContainer(new AndFilter { Filters = CreatePocoSearchFilter(request) }))
.SortDescending("_score");
var results = _client.Search<SomePoco>(x => descriptor);
The FacetFilter is returning the total number of HITS from my query. I want to split these hits out using a property on the search request. So, in the search request I have a list of ints. I want to know how many hits were returned for each int in that list.
I hope this makes sense.
I've tried adding a FacetTerm, this gives me the total number of hits for every value of the int query value instead of just the ones that pertain to the search. I understand the query, filter stage, and have tried to change the descriptor accordingly with no luck.
Thanks.
There are several ways to do this. My suggestion would be to use a filtered query, and then use a Terms aggregation or facet (facets are deprecated so I recommend moving away from those) on the results.
With an Aggregation:
POST /_search
{
"query": {
"filtered": {
"query": { "match_all": {}},
"filter": {
"terms": {
"<FIELD_NAME>": [1, 2, 3, 42]
}
}
}
},
"aggs": {
"countOfInts": {
"terms": {
"field": "<FIELD_NAME>",
"size": 10
}
}
}
}
With a Facet:
POST /_search
{
"query": {
"filtered": {
"query": { "match_all": {}},
"filter": {
"terms": {
"<FIELD_NAME>": [1, 2, 3, 42]
}
}
}
},
"facets": {
"countOfInts": {
"terms": {
"field": "<FIELD_NAME>",
"size": 10
}
}
}
}
You could also do the same thing by doing a plain query with match_all and then do the filter inside the facet or aggregation. The way I listed it above will perform a little bit better because it will reduce the working set before building the agg/facet.
I did not include the code for NEST because depending on the version of the dlls you are using the format can be somewhat different.
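As a language-neutral illustration of the aggregation variant, here is a small Python sketch that builds the body and then turns the returned buckets into a count per requested int. It assumes a newer Elasticsearch where filtered is written as a bool query with a filter clause; the helper names and the sample response are made up for the example, and only the aggregations/buckets/key/doc_count shape comes from the terms aggregation response.

```python
def build_counts_query(field_name, values, size=10):
    """Filter hits to the requested ints and count hits per int."""
    return {
        "query": {"bool": {"filter": [{"terms": {field_name: list(values)}}]}},
        "aggs": {
            "countOfInts": {"terms": {"field": field_name, "size": size}}
        },
    }


def bucket_counts(response, values):
    """Map every requested value to its doc_count (0 if no bucket came back)."""
    buckets = response["aggregations"]["countOfInts"]["buckets"]
    found = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    return {value: found.get(value, 0) for value in values}


requested = [1, 2, 3, 42]
body = build_counts_query("<FIELD_NAME>", requested)

# Hypothetical response fragment, only for demonstrating bucket_counts.
sample_response = {
    "aggregations": {
        "countOfInts": {
            "buckets": [
                {"key": 1, "doc_count": 7},
                {"key": 42, "doc_count": 3},
            ]
        }
    }
}
counts = bucket_counts(sample_response, requested)
print(counts)
```

Looking up only the requested values also guards against extra buckets that can show up when the field holds more than one int per document.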

Get entity by its parent entity's value in Elasticsearch

Imagine we have this simple entity in ES
user {
username: "ultrauser"
name: "Greg"
address: [
{ city: "London" }, {city: "Prague" }
]
}
I need a query which will return all addresses for user "ultrauser".
I'm using NEST, and for now I select the user where username="ultrauser" and then read only the addresses field myself. So ES returns the whole user entity, which contains the addresses too.
But is it possible to ask ES something like "Give me all addresses which belong to user 'ultrauser'"?
I need ES to return a list of address entities, not a user entity containing addresses. It is simple when you go from root to leaves (get user.address.city), but how can I select from leaves to root easily?
Importantly, we cannot use parent-child or nested document features, for other reasons.
Thanks for all your ideas.
You should probably read this article: http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ You're trying to apply RDBMS concepts to elasticsearch and that is usually a bad idea. Really, even if you are storing objects, they are still stored flat in elasticsearch behind the scenes.
I think this query will get you to where you want to be though, if I'm understanding you correctly:
{
"query": {
"bool": {
"must": [
{
"term": {
"username": "some matched item"
}
},
{
"filtered": {
"filter": {
"exists": { "field" : "address" }
}
}
}
]
}
},
"fields": [
"address"
]
}
Does it matter if you extract the addresses or if you ask elasticsearch to do it for you? Sometimes you don't want to send all that data over the wire if not needed and that might be your reason.
This will still return something like this:
hits: [
{
_index: indexname
_type: typename
_id: id
_score: 1.4142135
fields: {
address: [
{someaddress_object}
]
}
}, ...
So you will still need to loop through the results anyway when you get them back, just the result size is smaller.
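That loop is short. As a sketch in plain Python, assuming the response has the hits / fields / address shape shown above (the function name and the sample data are made up for illustration):

```python
def extract_addresses(response):
    """Collect the address objects from every hit into one flat list."""
    addresses = []
    for hit in response.get("hits", {}).get("hits", []):
        # Each hit carries only the requested field, not the whole user.
        addresses.extend(hit.get("fields", {}).get("address", []))
    return addresses


# Hypothetical response, shaped like the output sketched above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "fields": {"address": [{"city": "London"}, {"city": "Prague"}]},
            }
        ]
    }
}
print(extract_addresses(sample_response))
```

Hits that lack the address field are simply skipped, so the function returns an empty list when nothing matches.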
