elasticsearch c# nest advanced search

I have a problem querying a complex data type with the C# NEST API in Elasticsearch. My documents in Elasticsearch look like this:
"hits": [
    {
        "_index": "post",
        "_type": "postmodel",
        "_source": {
            "projectId": "2",
            "language": "en",
            "postDate": "2017-06-11T08:39:32Z",
            "profiles": [
                {
                    "label": "Emotional",
                    "confidence": 1
                }
            ]
        }
    },
    {
        "_index": "post",
        "_type": "postmodel",
        "_source": {
            "projectId": "3",
            "language": "en",
            "postDate": "2017-06-11T08:05:01Z",
            "profiles": [
                {
                    "label": "Fact oriented",
                    "confidence": 0.69
                },
                {
                    "label": "Rational",
                    "confidence": 1
                }
            ]
        }
    },
    ...
Using the C# NEST API, I want to fetch the post models whose projectId is 3 and that have a "Rational" profile. My current code looks like this:
var postModels = await _elasticClient.SearchAsync<PostModel>(s => s
    .Index("post")
    .Query(q =>
    {
        QueryContainer query = new QueryContainer();
        query = query && q.Match(m => m.Field(f => f.ProjectId)
            .Query("3"));
        return query;
    }));
But I don't know how to query "Profiles". I want to extend my query to fetch specific profiles as well. I would be happy if someone could help me with this problem. Thank you in advance.
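One possible way to extend the query above is sketched below. It assumes that "profiles" is mapped as a nested field in the index, which the question does not confirm. If it is a plain object field (the default mapping), a nested query fails, and the inner Match on the label field can be used directly without the Nested wrapper. The PostModel and Profile classes are guessed from the JSON above, and PostSearch.FindAsync is a hypothetical helper name. Running it needs the NEST NuGet package and a reachable cluster.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Nest;

// POCOs guessed from the JSON in the question (assumption).
public class Profile
{
    public string Label { get; set; }
    public double Confidence { get; set; }
}

public class PostModel
{
    public string ProjectId { get; set; }
    public string Language { get; set; }
    public List<Profile> Profiles { get; set; }
}

public static class PostSearch
{
    // Hypothetical helper; "client" is an already configured IElasticClient.
    public static Task<ISearchResponse<PostModel>> FindAsync(
        IElasticClient client, string projectId, string profileLabel)
    {
        return client.SearchAsync<PostModel>(s => s
            .Index("post")
            .Query(q =>
                // Both clauses must match: && combines them into a bool/must query.
                q.Match(m => m.Field(f => f.ProjectId).Query(projectId)) &&
                // Assumes "profiles" has a nested mapping.
                q.Nested(n => n
                    .Path(p => p.Profiles)
                    .Query(nq => nq.Match(m => m
                        .Field(f => f.Profiles.First().Label)
                        .Query(profileLabel))))));
    }
}
```

Calling PostSearch.FindAsync(client, "3", "Rational") would then correspond to the search described in the question. The f.Profiles.First().Label expression is only used by NEST to infer the field path "profiles.label"; it is not executed against data.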

Related

Nest Elasticsearch match_phrase query throws parsing exception

I am using NEST as a client library to interact with Elasticsearch indices.
I am trying to send a match_phrase query using the following code:
var searchResponse = elasticClient.Search<ProductType>(s => s
    .Index(indices)
    .Type(Types.Type(typeof(ProductType)))
    .From(0)
    .Size(5)
    .Query(q =>
        q.MatchPhrase(m => m
            .Field(Infer.Field<ProductType>(ff => ff.Title))
            .Slop(5)
            .Query("my query")
        )
    )
);
It generates the following query:
GET /product/_search
{
"from": 0,
"size": 5,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"match": {
"title": {
"type": "phrase",
"query": "my query",
"slop": 5
}
}
}
}
When I execute the above query, it returns a parsing_exception:
[match] query does not support [type]
I was expecting the above code to generate a query like the following:
GET /product/_search
{
"from": 0,
"size": 5,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"match_phrase": {
"title": {
"query": "my query",
"slop": 5
}
}
}
}
Is there anything wrong with my code, and how can I get rid of this error?

After some investigation, and based on the related "match query does not support type" issue, I found that the server hosting my cluster had been upgraded to Elasticsearch 6.5.0. It turned out that I had to upgrade my NEST NuGet package as well. After the upgrade, it generates the match_phrase query as expected.

ElasticSearch search getting bad results

I am fairly new to Elasticsearch and am having trouble getting search results that I consider good. My objective is to search an index of medications (6 fields) against a phrase that the user enters. It could be one or more words. I've tried a few approaches, but I'll outline the best one I've found so far below. Let me know what I'm doing wrong. I'm guessing that I'm missing something fundamental.
Here is a subset of the fields that I'm working with:
...
"hits": [
    {
        "_index": "indexus2",
        "_type": "Medication",
        "_id": "17471",
        "_score": 8.829264,
        "_source": {
            "SearchContents": " chew chewable oral po tylenol",
            "MedShortDesc": "Tylenol PO Chew",
            "MedLongDesc": "Tylenol Oral Chewable",
            "GenericDesc": "ACETAMINOPHEN ORAL"
            ...
        }
    }
...
The fields that I'm searching against use an edge n-gram analyzer. I'm using the C# NEST library for the indexing:
settings.Analysis.Tokenizers.Add("edgeNGram", new EdgeNGramTokenizer()
{
    MaxGram = 50,
    MinGram = 2,
    TokenChars = new List<string>() { "letter", "digit" }
});
settings.Analysis.Analyzers.Add("edgeNGramAnalyzer", new CustomAnalyzer()
{
    Filter = new string[] { "lowercase" },
    Tokenizer = "edgeNGram"
});
I am using a more_like_this query against the fields in question:
GET indexus2/Medication/_search
{
"query": {
"more_like_this" : {
"fields" : ["MedShortDesc",
"MedLongDesc",
"GenericDesc",
"SearchContents"],
"like_text" : "vicodin",
"min_term_freq" : 1,
"max_query_terms" : 25,
"min_word_len": 2
}
}
}
The problem is that when I search for 'vicodin', I'd expect to see matches on the full word first, but I don't. Here is a subset of the results from this query. Vicodin doesn't show up until the 7th result:
"hits": [
{
"_index": "indexus2",
"_type": "Medication",
"_id": "31192",
"_score": 4.567309,
"_source": {
"SearchContents": " oral po victrelis",
"MedShortDesc": "Victrelis PO",
"MedLongDesc": "Victrelis Oral",
"RepresentativeRoutedGenericDesc": "BOCEPREVIR ORAL",
...
}
}
<5 more similar results>
{
"_index": "indexus2",
"_type": "Medication",
"_id": "26198",
"_score": 2.2836545,
"_source": {
"SearchContents": " (original 5 500 feeding mg strength) tube via vicodin",
"MedShortDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
"MedLongDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
"GenericDesc": "HYDROCODONE BITARTRATE/ACETAMINOPHEN ORAL",
...
}
}
Field Mappings
"OrderableMedLongDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"OrderableMedShortDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"RepresentativeRoutedGenericDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"SearchContents": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
Here is what ES shows for my _settings for analyzers
"analyzer": {
"edgeNGramAnalyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "edgeNGram"
}
},
"tokenizer": {
"edgeNGram": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "50"
}
}

As per the above mapping, edgeNGramAnalyzer is also the search analyzer for these fields. As a result, the search query gets "edge n-grammed" as well. You probably do not want this.
Change the mapping to set only the index_analyzer option to edgeNGramAnalyzer.
The search_analyzer then defaults to standard.
(On Elasticsearch 2.0 and later, index_analyzer no longer exists; keep "analyzer" and set "search_analyzer": "standard" instead.)
Example:
"SearchContents": {
    "type": "string",
    "index_analyzer": "edgeNGramAnalyzer"
},
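To see why analyzing the query with the edge n-gram analyzer widens the match set, here is a small in-memory sketch. It does not call Elasticsearch and ignores scoring. The Analyze function is only a rough stand-in for the custom analyzer from the question (lowercase, split on non-letter/digit characters, edge n-grams of length 2 to 50), and the two sample titles are taken from the results above. It is written as a .NET 6 or later console program with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Edge n-grams of one token: "vicodin" -> "vi", "vic", "vico", ..., "vicodin".
IEnumerable<string> EdgeNGrams(string token, int minGram = 2, int maxGram = 50)
{
    for (int len = minGram; len <= Math.Min(maxGram, token.Length); len++)
        yield return token.Substring(0, len);
}

// Rough stand-in for edgeNGramAnalyzer: lowercase, split on anything that is
// not a letter or digit, then emit edge n-grams for every token.
HashSet<string> Analyze(string text)
{
    var cleaned = new string(text.ToLowerInvariant()
        .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
        .ToArray());
    return cleaned
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .SelectMany(token => EdgeNGrams(token))
        .ToHashSet();
}

// Index-time terms for two of the documents from the question.
var indexed = new Dictionary<string, HashSet<string>>
{
    ["Victrelis PO"] = Analyze("Victrelis PO"),
    ["Vicodin 5 mg-500 mg"] = Analyze("Vicodin 5 mg-500 mg"),
};

// A document matches if it shares at least one term with the query.
List<string> Matches(HashSet<string> queryTerms) =>
    indexed.Where(doc => doc.Value.Overlaps(queryTerms))
           .Select(doc => doc.Key)
           .OrderBy(name => name, StringComparer.Ordinal)
           .ToList();

// Search analyzer = edge n-gram: the query expands to vi, vic, vico, ...
var ngramQueryTerms = Analyze("vicodin");
// Search analyzer = standard: the query stays the single term "vicodin".
var plainQueryTerms = new HashSet<string> { "vicodin" };

Console.WriteLine(string.Join(", ", Matches(ngramQueryTerms))); // Vicodin 5 mg-500 mg, Victrelis PO
Console.WriteLine(string.Join(", ", Matches(plainQueryTerms))); // Vicodin 5 mg-500 mg
```

With the n-grammed query, "Victrelis" matches through the shared prefixes "vi" and "vic". With the plain query term, only documents whose indexed n-grams contain the full word "vicodin" match, while prefix queries such as "vic" would still find both.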

Getting distinct values using NEST ElasticSearch client

I'm building a product search engine with Elasticsearch in my .NET application, using the NEST client, and there is one thing I'm having trouble with: getting a distinct set of values.
I'm searching for products, of which there are many thousands, but of course I can only return 10 or 20 at a time to the user, and paging works fine for this. But besides this primary result, I want to show my users a list of the brands found within the complete search, so that they can use them for filtering.
I have read that I should use a terms aggregation for this, but I couldn't get anything better than the code below. It still doesn't really give me what I want, because it splits values like "20th Century Fox" into 3 separate values.
var brandResults = client.Search<Product>(s => s
    .Query(query)
    .Aggregations(a => a
        .Terms("my_terms_agg", t => t.Field(p => p.BrandName).Size(250))
    )
);
var agg = brandResults.Aggs.Terms("my_terms_agg");
Is this even the right approach, or should I use something totally different? And how can I get the correct, complete values (not split by spaces)? I guess that is what you get when you ask for a list of 'terms'.
What I'm looking for is what you would get from this in MS SQL:
SELECT DISTINCT BrandName FROM [Table To Search] WHERE [Where clause without paging]

You are correct that what you want is a terms aggregation. The problem you're running into is that ES analyzes the "BrandName" field and splits it into separate terms, and the aggregation returns those terms. This is the expected default behavior of a string field in ES.
What I recommend is changing BrandName into a multi-field. This lets you search on the individual parts and also run a terms aggregation on the not_analyzed (i.e. full "20th Century Fox") term.
Here is the documentation from ES:
https://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
[UPDATE]
If you are using ES version 1.4 or newer, the syntax for multi-fields is a little different:
https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields
Here is a full working sample that illustrates the point in ES 1.4.4. Note that the mapping specifies a "not_analyzed" version of the field.
PUT hilden1
PUT hilden1/type1/_mapping
{
"properties": {
"brandName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
POST hilden1/type1
{
"brandName": "foo"
}
POST hilden1/type1
{
"brandName": "bar"
}
POST hilden1/type1
{
"brandName": "20th Century Fox"
}
POST hilden1/type1
{
"brandName": "20th Century Fox"
}
POST hilden1/type1
{
"brandName": "foo bar"
}
GET hilden1/type1/_search
{
"size": 0,
"aggs": {
"analyzed_field": {
"terms": {
"field": "brandName",
"size": 10
}
},
"non_analyzed_field": {
"terms": {
"field": "brandName.raw",
"size": 10
}
}
}
}
Results of the last query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"non_analyzed_field": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "20th Century Fox",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 1
},
{
"key": "foo",
"doc_count": 1
},
{
"key": "foo bar",
"doc_count": 1
}
]
},
"analyzed_field": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "20th",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 2
},
{
"key": "century",
"doc_count": 2
},
{
"key": "foo",
"doc_count": 2
},
{
"key": "fox",
"doc_count": 2
}
]
}
}
}
Notice that the not_analyzed field keeps "20th Century Fox" and "foo bar" together, whereas the analyzed field breaks them up into lowercase terms.
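The bucket counts above can be reproduced without a cluster. The sketch below is an in-memory illustration only, not NEST code: it approximates the analyzed field by lowercasing and splitting on spaces, and counts each term once per document, which is how doc_count works. It is written as a .NET 6 or later console program with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The five brandName values indexed in the sample above.
var brandNames = new[] { "foo", "bar", "20th Century Fox", "20th Century Fox", "foo bar" };

// Analyzed field: every document contributes each of its lowercase tokens once.
Dictionary<string, int> analyzedBuckets = brandNames
    .SelectMany(name => name.ToLowerInvariant()
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Distinct())
    .GroupBy(term => term)
    .ToDictionary(g => g.Key, g => g.Count());

// not_analyzed field: the whole value is indexed as a single term.
Dictionary<string, int> rawBuckets = brandNames
    .GroupBy(name => name)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var bucket in analyzedBuckets.OrderBy(kv => kv.Key, StringComparer.Ordinal))
    Console.WriteLine($"analyzed      {bucket.Key}: {bucket.Value}");
foreach (var bucket in rawBuckets.OrderBy(kv => kv.Key, StringComparer.Ordinal))
    Console.WriteLine($"not_analyzed  {bucket.Key}: {bucket.Value}");
```

The analyzed buckets come out as 20th, bar, century, foo and fox with a count of 2 each, and the not_analyzed buckets as "20th Century Fox" (2), "bar" (1), "foo" (1) and "foo bar" (1), matching the response above.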

I had a similar issue. I was displaying search results and wanted to show counts for each category and subcategory.
You're right to use aggregations. I also had the issue with strings being tokenised (i.e. "20th Century Fox" being split); this happens because the fields are analysed. In my case, I added the following mapping, which tells ES not to analyse those fields:
"category": {
"type": "nested",
"properties": {
"CategoryNameAndSlug": {
"type": "string",
"index": "not_analyzed"
},
"SubCategoryNameAndSlug": {
"type": "string",
"index": "not_analyzed"
}
}
}
As jhilden suggested, if you use this field for more than one purpose (e.g. search and aggregation), you can set it up as a multi-field. One version is analysed and used for searching, while the other is left unanalysed and used for aggregation.

Linq query to d3.js chart

I am looking for a good way to feed a d3.js bubble chart with data from my MVC application. For example the standard bubble chart expects nested data in the form:
{
"name": "flare",
"children": [
{
"name": "analytics",
"children": [
{
"name": "cluster",
"children": [
{
"name": "CNN",
"size": 3938
}
]
},
{
"name": "graph",
"children": [
{
"name": "MTV",
"size": 3534
}
]
}
]
}
]
}
What I have on the server side is this LINQ query against a SQL database:
var results = from a in db.Durations
              where a.Category == "watch"
              group a by a.Description
              into g
              select new
              {
                  name = g.Key,
                  size = g.Select(d => new { d.Begin, d.End })
                          .Sum(d => SqlFunctions.DateDiff("hh", d.Begin, d.End))
              };
return Json(results, JsonRequestBehavior.AllowGet);
The query result, serialized as JSON, looks like this:
[{"name":"CNN","size":1950},{"name":"MTV","size":1680}]
I'm stuck on finding a good way to get the correct format and create the nested structure from my query results. The options I can think of:
1. server-side, using anonymous types
2. server-side, adjusting the LINQ query
3. client-side, using d3.js nest
4. use a simpler bubble model, since for my purpose the nested structure with children is not really needed
5. something totally different and much, much cooler than 1-4
Thank you for any input.

Replace your return statement with the following one.
return Json(new
{
    name = "Sites",
    children = results
},
JsonRequestBehavior.AllowGet);
That will give you the following:
{
"name": "Sites",
"children": [
{
"name": "CNN",
"size": 1950
},
{
"name": "MTV",
"size": 1680
}
]
}
To serve as an example, suppose each website had an additional string Type property, with values such as "News" or "Music". Then you could do the following.
return Json(new
{
    name = "Sites",
    children = results.GroupBy(site => site.Type).Select(group => new
    {
        name = group.Key,
        children = group
    })
},
JsonRequestBehavior.AllowGet);
This would give you something like the following.
{
"name": "Sites",
"children": [
{
"name": "News",
"children": [
{
"name": "CNN",
"size": 1950
},
{
"name": "The Verge",
"size": 1600
}
]
},
{
"name": "Music",
"children": [
{
"name": "MTV",
"size": 1680
},
{
"name": "Pandora",
"size": 2000
}
]
}
]
}
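The grouping step can be tried outside MVC. The sketch below is a standalone approximation: it uses System.Text.Json in place of the controller's Json(...) helper, and the flat rows, including the type values, are hypothetical stand-ins for the query results. It also projects each child to name and size only, since passing the group through unchanged would serialize the type property on every leaf as well. It is written as a .NET 6 or later console program with top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical flat rows standing in for the LINQ query results plus a type column.
var results = new[]
{
    new { name = "CNN", size = 1950, type = "News" },
    new { name = "The Verge", size = 1600, type = "News" },
    new { name = "MTV", size = 1680, type = "Music" },
    new { name = "Pandora", size = 2000, type = "Music" },
};

// Build the name/children tree that the d3.js bubble chart expects.
var tree = new
{
    name = "Sites",
    children = results
        .GroupBy(site => site.type)
        .Select(g => new
        {
            name = g.Key,
            // Keep only the fields the chart needs on the leaves.
            children = g.Select(site => new { site.name, site.size }).ToArray(),
        })
        .ToArray(),
};

string json = JsonSerializer.Serialize(tree);
Console.WriteLine(json);
```

GroupBy yields the groups in order of first appearance, so "News" comes before "Music" here, as in the JSON shown above.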

MongoDB remove a subdocument document from a subdocument

I use the 10gen C# driver for MongoDB and I would like to remove a subdocument from a subdocument. I don't know how to do it.
Here's an example of what my document looks like:
{
    "_id": "binary_stuff",
    "Name": "MyApplication",
    "Settings": [
        {
            "_id": "binary_stuff",
            "Key": "ImportDirectory",
            "Value": "C:\data",
            "Overrides": [
                {
                    "_id": "binary_stuff",
                    "Name": "PathDirectory",
                    "Value": "C:\anotherData"
                }
            ]
        }
    ]
}
I want to delete the Override whose Name is PathDirectory. Here's the query I wrote, but it doesn't work, and I get no error.
var query = Query.And(
    Query.EQ("_id", applicationId),
    Query.EQ("Settings.Key", "ImportDirectory"),
    Query.EQ("Settings.$.Overrides.Name", "PathDirectory"));
Run(database => database.Applications().Remove(query));
Thanks for any help.
John

Remove deletes whole documents that match the query, not items inside an array. To delete an item from an array, you should use the $pull update operator:
var query = Query.And(
    Query.EQ("_id", applicationId),
    Query.EQ("Settings.Key", "ImportDirectory"));
var update = Update.Pull("Settings.$.Overrides", new BsonDocument()
{
    { "Name", "PathDirectory" }
});
database.Applications().Update(query, update);
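To make the effect of this update concrete, here is an in-memory sketch using plain C# lists. It does not use the MongoDB driver. It only mimics what the query plus the positional "$" and $pull do to one document, and the second override and the second setting are hypothetical additions for illustration. It is written as a .NET 6 or later console program with top-level statements.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the "Settings" array of the document (ids omitted).
var settings = new List<(string Key, List<(string Name, string Value)> Overrides)>
{
    ("ImportDirectory", new List<(string Name, string Value)>
    {
        ("PathDirectory", @"C:\anotherData"),
        ("OtherOverride", @"C:\other"),      // hypothetical extra override
    }),
    ("ExportDirectory", new List<(string Name, string Value)>  // hypothetical extra setting
    {
        ("PathDirectory", @"C:\export"),
    }),
};

// Query part: Settings.Key == "ImportDirectory". The positional "$" stands for
// the first element of Settings that satisfied this condition.
int position = settings.FindIndex(s => s.Key == "ImportDirectory");

// Update part: $pull on Settings.$.Overrides removes every override in that
// one setting whose Name is "PathDirectory".
int removed = position >= 0
    ? settings[position].Overrides.RemoveAll(o => o.Name == "PathDirectory")
    : 0;

Console.WriteLine($"removed {removed} override(s)");
Console.WriteLine($"ImportDirectory now has {settings[0].Overrides.Count} override(s)");
Console.WriteLine($"ExportDirectory still has {settings[1].Overrides.Count} override(s)");
```

Only the overrides of the matched setting are touched; the override with the same name under the other setting stays in place.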
