Get entity by its parent entity's value in Elasticsearch - c#

Imagine we have this simple entity in ES
user {
username: "ultrauser"
name: "Greg"
address: [
{ city: "London" }, {city: "Prague" }
]
}
I need a query that returns all addresses for the user "ultrauser".
I'm using NEST, and for now I select the user where username="ultrauser" and then read only the address field myself. So ES returns the whole user entity, which also contains the addresses.
But is it possible to make ES run a query like "Give me all addresses that belong to user 'ultrauser'"?
I need ES to return a list of address entities, not a user entity containing addresses. It is simple when you go from root to leaves (get user.address.city), but how can I easily select from leaves to root?
Importantly, we cannot use the parent-child or nested document features, for other reasons.
Thanks for all your ideas.

You should probably read this article: http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/ You're trying to apply RDBMS concepts to elasticsearch and that is usually a bad idea. Really, even if you are storing objects, they are still stored flat in elasticsearch behind the scenes.
I think this query will get you to where you want to be though, if I'm understanding you correctly:
{
"query": {
"bool": {
"must": [
{
"term": {
"username": "some matched item"
}
},
{
"filtered": {
"filter": {
"exists": { "field" : "address" }
}
}
}
]
}
},
"fields": [
"address"
]
}
Does it matter if you extract the addresses or if you ask elasticsearch to do it for you? Sometimes you don't want to send all that data over the wire if not needed and that might be your reason.
This will still return something like this:
hits: [
{
_index: indexname
_type: typename
_id: id
_score: 1.4142135
fields: {
address: [
{someaddress_object}
]
}
}, ...
So you will still need to loop through the results anyway when you get them back, just the result size is smaller.
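The loop over the results could look roughly like this. It is only a sketch in plain JavaScript, assuming the response shape shown above (each hit carries fields.address as an array); collectAddresses and sampleHits are made-up names for illustration, not part of any client library.

```javascript
// Collect every address object from the hits into one flat list.
// Assumes each hit looks like { fields: { address: [...] } } as shown above.
function collectAddresses(hits) {
  const addresses = [];
  for (const hit of hits) {
    // A hit without an address field contributes nothing.
    const found = (hit.fields && hit.fields.address) || [];
    addresses.push(...found);
  }
  return addresses;
}

// Hypothetical sample data, shaped like the response above.
const sampleHits = [
  { _id: "1", fields: { address: [{ city: "London" }, { city: "Prague" }] } },
  { _id: "2", fields: {} },
];

console.log(collectAddresses(sampleHits));
```

The same flattening works in any client language; the point is only that the per-hit wrapper has to be stripped on the client side.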

Related

Elastic - updating multiple documents in a single request

I need to update several thousand items every several minutes in Elastic and unfortunately reindexing is not an option for me. From my research the best way to update an item is using _update_by_query - I have had success updating single documents like so -
{
"query": {
"match": {
"itemId": {
"query": "1"
}
}
},
"script": {
"source": "ctx._source.field = params.updateValue",
"lang": "painless",
"params": {
"updateValue": "test"
}
}
}
var response = await Client.UpdateByQueryAsync<dynamic>(q => q
.Index("masterproducts")
.Query(q => x.MatchQuery)
.Script(s => s.Source(x.Script).Lang("painless").Params(x.Params))
.Conflicts(Elasticsearch.Net.Conflicts.Proceed)
);
Although this works, it is extremely inefficient because it generates thousands of requests. Is there a way to update multiple documents with a matching ID in a single request? I have already tried the Multi Search API, which it seems cannot be used for this purpose. Any help would be appreciated!

If possible, try to generalize your query.
Instead of targeting a single itemId, perhaps try using a terms query:
{
"query": {
"terms": {
"itemId": [
"1", "2", ...
]
}
},
"script": {
...
}
}
From the looks of it, your (seemingly simplified) script sets the same value regardless of the document ID / itemId, so a single request with a terms query is enough.
If the script does indeed set different values based on the doc IDs / itemIds, you could make the params multi-value:
"params": {
"updateValue1": "test1",
"updateValue2": "test2",
...
}
and then dynamically access them:
...
def value_to_set = params['updateValue' + ctx._source['itemId']];
...
so the target doc is updated with the corresponding value.
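Putting the two ideas together, the request body could be assembled like this. This is a sketch only: buildUpdateByQueryBody is a hypothetical helper, the target field name "field" is taken from the question, and it assumes itemId concatenates cleanly onto the "updateValue" prefix inside the script.

```javascript
// Build a single _update_by_query body covering many items.
// `updates` maps itemId -> value to set (illustrative input shape).
function buildUpdateByQueryBody(updates) {
  const itemIds = Object.keys(updates);
  const params = {};
  for (const id of itemIds) {
    // One param per item, named updateValue<itemId>.
    params["updateValue" + id] = updates[id];
  }
  return {
    query: { terms: { itemId: itemIds } },
    script: {
      lang: "painless",
      // Look up the value that belongs to the document being updated.
      source: "ctx._source.field = params['updateValue' + ctx._source['itemId']]",
      params: params,
    },
  };
}

const body = buildUpdateByQueryBody({ "1": "test1", "2": "test2" });
console.log(JSON.stringify(body, null, 2));
```

With several thousand items the params map gets large, so it may be worth sending the updates in a few batches rather than one huge request.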

C# MongoDB Driver: Can't find the way to run complex query for AnyIn filter in MongoDB

I have a document like this:
{
"id": "xxxxxxxxxxxx",
"groupsAuthorized": [
"USA/California/SF",
"France/IDF/Paris"
]
}
And I have a user with a list of authorized groups, for example:
"groups": [
"France/IDF",
"USA/NY/NYC"
]
What I'm trying to achieve is to retrieve all documents in the database that the user is authorized to retrieve. Essentially, I want to check whether any string in the document's "groupsAuthorized" list contains one of the strings in the user's "groups" list as a substring.
Using the following values:
my document:
{
"id": "xxxxxxxxxxxx",
"groupsAuthorized": [
"USA/California/SF",
"France/IDF/Paris"
]
}
my user permissions:
"groups": [
"France/IDF",
"USA/NY/NYC"
]
The user should be able to retrieve this document because the string "France/IDF" is contained in the string "France/IDF/Paris". However, if the values were like this:
my document:
{
"id": "xxxxxxxxxxxx",
"groupsAuthorized": [
"USA/California/SF",
"France/IDF"
]
}
my user permissions:
"groups": [
"France/IDF/Paris",
"USA/NY/NYC"
]
it should not work, because my user is only authorized to view documents from France/IDF/Paris and USA/NY/NYC, and none of the strings in the authorizedGroups of my document contains those sequences.
I've tried to use a standard LINQ query to achieve this, which is fairly simple:
var userAuthorizedGroups = new List<string> { "France/IDF/Paris", "USA/NY/NYC" };
var results = collection.AsQueryable()
.Where(entity => userAuthorizedGroups
.Any(userGroup => entity.authorizedGroups
.Any(entityAuthorizedGroup => entityAuthorizedGroup.Contains(userGroup))));
But I'm getting the famous unsupported filter exception that a lot of people seem to run into. I've tried different options found on the internet, like the following:
var userAuthorizedGroups = new List<string> { "France/IDF/Paris", "USA/NY/NYC" };
var filter = Builders<PartitionedEntity<Passport>>.Filter.AnyIn(i => i.authorizedGroups, userAuthorizedGroups);
var results = (await collection.FindAsync(filter)).ToList();
return results;
But the problem is that this only checks whether an element of one array is equal to an element of the other array. It will not work for a case like "France/IDF", which should match "France/IDF/Paris" because the string "France/IDF" is contained in the string "France/IDF/Paris" in my document.
I'm a bit clueless about how to achieve this using the MongoDB C# driver. I'm starting to think that I should just pull all documents to the client and do the filtering manually, but that would be quite messy.
Does anyone have an idea on this subject?

"I'm starting to think that I should just pull all documents to client and do the filtering manually but that would be quite messy"
don't do it :)
A good place to start is the list of LINQ operators supported by the MongoDB .NET driver. As you can see, .Contains() isn't mentioned there, which means you can't use it and you'll get an error at runtime, but it does not mean that there's no way to do what you're trying to achieve.
The operator closest to contains that you can use is $indexOfBytes. It takes the string to search in and the substring to look for, and returns -1 if there's no match and the position of the substring otherwise. Also, since you need to match an array against another array, you need two pairs of $map and $anyElementTrue to do exactly what .NET's .Any does.
Your query (MongoDB client) can look like this:
db.collection.find({
$expr: {
$anyElementTrue: {
$map: {
input: "$groupsAuthorized",
as: "group",
in: {
$anyElementTrue: {
$map: {
input: ["France/IDF/Paris", "USA/NY/NYC"],
as: "userGroup",
in: { $ne: [ -1, { $indexOfBytes: [ "$$group", "$$userGroup" ] } ] }
}
}
}
}
}
}
})
You can run the same query from .NET using the BsonDocument class, whose Parse method takes a JSON string and converts it into a query document:
var query = BsonDocument.Parse(@"{
$expr: {
$anyElementTrue:
{
$map:
{
input: '$groupsAuthorized',
as: 'group',
in: {
$anyElementTrue:
{
$map:
{
input: ['France/IDF/Paris', 'USA/NY/NYC'],
as: 'userGroup',
in: { $ne: [-1, { $indexOfBytes: ['$$group', '$$userGroup'] } ] }
}
}
}
}
}
}
}");
var result = col.Find(query).ToList();
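To sanity-check the server-side query against sample data, the rule from the question (a document matches when at least one user group is a substring of at least one of the document's groups) can be written in a few lines of plain JavaScript. This is only a local sketch; isAuthorized is a made-up name and nothing here talks to MongoDB.

```javascript
// True when some user group occurs as a substring of some document group.
// The nested some() calls mirror the two $map / $anyElementTrue pairs.
function isAuthorized(documentGroups, userGroups) {
  return documentGroups.some((docGroup) =>
    userGroups.some((userGroup) => docGroup.indexOf(userGroup) !== -1)
  );
}

// First example from the question: "France/IDF" is inside "France/IDF/Paris".
const firstCase = isAuthorized(
  ["USA/California/SF", "France/IDF/Paris"],
  ["France/IDF", "USA/NY/NYC"]
);

// Second example: no user group is contained in any document group.
const secondCase = isAuthorized(
  ["USA/California/SF", "France/IDF"],
  ["France/IDF/Paris", "USA/NY/NYC"]
);

console.log(firstCase, secondCase); // true false
```

Running both examples from the question through the real query and comparing against this function is a quick way to confirm the argument order of $indexOfBytes.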

Must match one field and Can contain other fields in Elastic Search

I need to figure out how to search items using two fields. We have a bunch of properties, and if someone selects a property to search within, I will send it as a parameter. I tried to create those queries in Kibana but couldn't combine them. Here is a sample of what I need to combine but am not sure how to.
This one needs to match the site field:
{
"query": {
"match": {
"site": {
"query": "Some name here",
"type": "phrase"
}
}
}
}
This one will look in multiple fields (content, description, title, etc.):
{
"query": {
"match": {
"content": {
"query": "diode",
"type": "phrase"
}
}
}
}
How would I combine these two queries (I am using NEST to query)? Any ideas?
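One possible way to combine them, sketched here as a plain JavaScript object builder rather than NEST code, is a bool query with both parts in must: a phrase match on site, plus a multi_match of type phrase over the other fields. The helper name buildSiteSearch is hypothetical, and treating the text search as required (rather than optional) is an assumption about what is wanted.

```javascript
// Build a bool query: the site must match as a phrase, and the text
// must appear as a phrase in at least one of the listed fields.
function buildSiteSearch(site, text, fields) {
  return {
    query: {
      bool: {
        must: [
          { match_phrase: { site: site } },
          { multi_match: { query: text, type: "phrase", fields: fields } },
        ],
      },
    },
  };
}

const siteSearch = buildSiteSearch("Some name here", "diode", [
  "content",
  "description",
  "title",
]);
console.log(JSON.stringify(siteSearch, null, 2));
```

If the text part should only influence scoring instead of filtering, moving the multi_match clause from must to should would be the variation to try.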

How to merge the results of two queries to different indices in Elasticsearch?

I'm searching an index main-kittens for docs of type Kitty. Now, I want to run an experiment. For some of the users, I want to search experiment-kittens instead. The type is the same, Kitty, and all the fields have the same values as in the main index, but while the field Bio is always empty in the main index, in the experimental one it stores huge strings.
Now, the problem is that I can't store that Bio for all kittens due to memory/disk limitations. So experiment-kittens holds only the most recent kittens (say, the last month).
I want the search to be left intact for most users (i.e., always use the main index). For the picked ones, I want to merge the results. The logic should be:
search userquery + date_created < 1 month ago in experiment-kittens
search userquery + date_created > 1 month ago in main-kittens
The results should be sorted by date_created, and there are too many of them to sort in my app.
Is there a way to ask elastic to execute two different queries on two indices and merge the results?
(I'm also sure there could be more optimal solutions to the problem, please tell me if you have some).

You can search across multiple indices with a single Elasticsearch request by separating the index names with a comma. Then you can use the missing filter to differentiate between the two indices (one having the Bio field and the other not). Then you can use the range filter to filter based on the value of the date_created field. Finally, you can use the sort API to sort based on the values of the date_created field.
Putting all of these together, the Elasticsearch query that you need is as follows:
POST main-kittens,experiment-kittens/Kitty/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"missing": {
"field": "Bio"
}
},
{
"range": {
"date_created": {
"to": "now-1M"
}
}
}
]
}
},
{
"bool": {
"must_not": [
{
"missing": {
"field": "Bio"
}
}
],
"must": [
{
"range": {
"date_created": {
"from": "now-1M"
}
}
}
]
}
}
]
}
}
}
},
"sort": [
{
"date_created": {
"order": "desc"
}
}
]
}
You can replace "match_all": {} with any custom query that you may have.
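To make the substitution concrete, here is a small JavaScript sketch that builds the same request body with the user query and the cutoff as parameters. It keeps the filtered/missing syntax used in the query above (which belongs to the Elasticsearch version this answer was written for); buildKittenSearch and its parameters are names invented for illustration.

```javascript
// Build the two-index search body from the answer above.
// userQuery replaces match_all; cutoff is the date boundary between indices.
function buildKittenSearch(userQuery, cutoff = "now-1M") {
  const bioMissing = { missing: { field: "Bio" } };
  return {
    query: {
      filtered: {
        query: userQuery,
        filter: {
          bool: {
            should: [
              // Main index: no Bio, older than the cutoff.
              { bool: { must: [bioMissing, { range: { date_created: { to: cutoff } } }] } },
              // Experiment index: has Bio, newer than the cutoff.
              {
                bool: {
                  must_not: [bioMissing],
                  must: [{ range: { date_created: { from: cutoff } } }],
                },
              },
            ],
          },
        },
      },
    },
    sort: [{ date_created: { order: "desc" } }],
  };
}

const kittenSearch = buildKittenSearch({ match: { name: "fluffy" } });
console.log(JSON.stringify(kittenSearch, null, 2));
```

The match on a name field is just a placeholder for whatever the real user query is.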

Couchbase View equivalent of Select count(doc.id) from doc where doc.productId in (select productId from doc where doc.lastupdated between x and y)

I'm trying to count the number of 'comments' related to a product in a Couchbase bucket. That part is easy for a "full" set of data: it's just a simple map/reduce. Things get tricky when I want to limit it to only products that have had changes within a date range. I can do this as two different views in CB: one that gets the product IDs where dateCreated falls within my range, and one that I pass these IDs to and that calculates my stats. The performance of this approach is horrible, though. The keys for the second query aren't necessarily contiguous, so I can't do a start/end on them. I'm using the .NET 2.2 client for Couchbase 4.x.
I'm open to any options, i.e. a super-awesome-do-it-all-in-one-call view, or following the two-view approach if the client has some capacity for bulk gets against non-contiguous keys in a view (I can't find anything on this topic).
Here's my simplified example schema:
{
"comment": {
"key": "key1",
"title": "yay",
"productId": "product1",
"dateCreated": "2016,11,30"
},
"comment": {
"key": "key2",
"title": "booo",
"productId": "product1",
"dateCreated": "2016,12,30"
}
}

Not sure if this is what you want (and I'm also not sure how to translate this to C#), but say you have two documents with the IDs comment::1 and comment::2, each stored as a Couchbase document in this format:
{
"key": "key2",
"title": "booo",
"productId": "product1",
"dateCreated": "2016,12,30"
}
You can define a view (let's call it comments_by_time)
Map
function (doc, meta) {
if (doc.dateCreated) {
var dateParts = doc.dateCreated.split(",");
dateParts = dateParts.map(Number);
emit(dateParts, doc.productId);
}
}
Reduce
_count
Then, you can use the View Query API to do a startKey and endKey range on your documents.
End point
http://<couchbase>:8092/<bucket>/_design/<view>/_view/comments_by_time
Get count of all comments
?reduce=true
{"rows":[ {"key":null,"value":2} ] }
Get documents before a date
?reduce=false&endkey=[2016,12,1]
{"total_rows":2,"rows":[
{"id":"comment::1","key":[2016,11,30],"value":"product1"}
]
}
Between dates
?reduce=false&startkey=[2016,12,1]&endkey=[2017,1,1]
{"total_rows":2,"rows":[
{"id":"comment::2","key":[2016,12,30],"value":"product1"}
]
}
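To see why the array keys make the range queries work, the view can be imitated locally in plain JavaScript. This is only a simulation under simplifying assumptions: runMap, compareKeys and queryRange are invented helpers, and the comparison handles numeric array keys only, not the full view collation rules.

```javascript
// The two comment documents from above, keyed by document id.
const docs = {
  "comment::1": { key: "key1", title: "yay", productId: "product1", dateCreated: "2016,11,30" },
  "comment::2": { key: "key2", title: "booo", productId: "product1", dateCreated: "2016,12,30" },
};

// Apply the map logic from the view to every document and collect the rows.
function runMap(allDocs) {
  const rows = [];
  for (const [id, doc] of Object.entries(allDocs)) {
    const emit = (key, value) => rows.push({ id, key, value });
    if (doc.dateCreated) {
      const dateParts = doc.dateCreated.split(",").map(Number);
      emit(dateParts, doc.productId);
    }
  }
  return rows;
}

// Compare numeric array keys element by element; a shorter prefix sorts first.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Keep rows whose key lies between startkey and endkey (both inclusive, both optional).
function queryRange(rows, startkey, endkey) {
  return rows
    .filter((r) =>
      (!startkey || compareKeys(r.key, startkey) >= 0) &&
      (!endkey || compareKeys(r.key, endkey) <= 0))
    .sort((x, y) => compareKeys(x.key, y.key));
}

const rows = runMap(docs);
const before = queryRange(rows, null, [2016, 12, 1]);
const between = queryRange(rows, [2016, 12, 1], [2017, 1, 1]);
console.log(before.map((r) => r.id), between.map((r) => r.id));
```

Because [year, month, day] keys sort chronologically, any date range is a contiguous slice of the index, which is what makes startkey/endkey usable here.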
