I am fairly new to Elasticsearch and am having trouble getting search results that I consider good. My objective is to search an index of medications (6 fields) against a phrase that the user enters. It could be one or more words. I've tried a few approaches, but I'll outline the best one I've found so far below. Let me know what I'm doing wrong. I'm guessing that I'm missing something fundamental.
Here is a subset of the fields that I'm working with
...
"hits": [
{
"_index": "indexus2",
"_type": "Medication",
"_id": "17471",
"_score": 8.829264,
"_source": {
"SearchContents": " chew chewable oral po tylenol",
"MedShortDesc": "Tylenol PO Chew",
"MedLongDesc": "Tylenol Oral Chewable",
"GenericDesc": "ACETAMINOPHEN ORAL",
...
}
}
...
The fields that I'm searching against use an edge n-gram analyzer. I'm using the C# NEST library for the indexing:
settings.Analysis.Tokenizers.Add("edgeNGram", new EdgeNGramTokenizer()
{
MaxGram = 50,
MinGram = 2,
TokenChars = new List<string>() { "letter", "digit" }
});
settings.Analysis.Analyzers.Add("edgeNGramAnalyzer", new CustomAnalyzer()
{
Filter = new string[] { "lowercase" },
Tokenizer = "edgeNGram"
});
I am using a more_like_this query against the fields in question
GET indexus2/Medication/_search
{
"query": {
"more_like_this" : {
"fields" : ["MedShortDesc",
"MedLongDesc",
"GenericDesc",
"SearchContents"],
"like_text" : "vicodin",
"min_term_freq" : 1,
"max_query_terms" : 25,
"min_word_len": 2
}
}
}
The problem is that for this search for 'vicodin', I'd expect to see matches with the full word first, but I don't. Here is a subset of the results from this query. Vicodin doesn't show up until the 7th result.
"hits": [
{
"_index": "indexus2",
"_type": "Medication",
"_id": "31192",
"_score": 4.567309,
"_source": {
"SearchContents": " oral po victrelis",
"MedShortDesc": "Victrelis PO",
"MedLongDesc": "Victrelis Oral",
"RepresentativeRoutedGenericDesc": "BOCEPREVIR ORAL",
...
}
}
<5 more similar results>
{
"_index": "indexus2",
"_type": "Medication",
"_id": "26198",
"_score": 2.2836545,
"_source": {
"SearchContents": " (original 5 500 feeding mg strength) tube via vicodin",
"MedShortDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
"MedLongDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube",
"GenericDesc": "HYDROCODONE BITARTRATE/ACETAMINOPHEN ORAL",
...
}
}
Field Mappings
"OrderableMedLongDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"OrderableMedShortDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"RepresentativeRoutedGenericDesc": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
"SearchContents": {
"type": "string",
"analyzer": "edgeNGramAnalyzer"
},
Here is what ES shows for my _settings for analyzers
"analyzer": {
"edgeNGramAnalyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "edgeNGram"
}
},
"tokenizer": {
"edgeNGram": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "50"
}
}
As per the above mapping, edgeNGramAnalyzer is also the search analyzer for these fields, so the search query gets "edge n-grammed" as well. The query "vicodin" is broken into "vi", "vic", "vico" and so on, and the short grams match documents such as "Victrelis". You probably do not want this.
Change the mapping to set only the index_analyzer option to edgeNGramAnalyzer.
The search_analyzer would then default to standard.
Example:
"SearchContents": {
"type": "string",
"index_analyzer": "edgeNGramAnalyzer"
},
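To see why the query-side analyzer matters, here is a minimal Python sketch that only approximates the tokenizer configured in the question (split on non-alphanumeric characters, lowercase, emit prefixes of 2 to 50 characters). The helper `edge_ngrams` is hypothetical and is not part of Elasticsearch or NEST; it just mimics the grams so the overlap can be inspected.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=50):
    """Approximate an edge n-gram analyzer: lowercase, split on
    non-alphanumerics, then emit every prefix of each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for n in range(min_gram, min(len(word), max_gram) + 1):
            grams.add(word[:n])
    return grams

# Terms stored in the index for two of the documents from the question.
indexed_victrelis = edge_ngrams("Victrelis PO")
indexed_vicodin = edge_ngrams("Vicodin 5 mg-500 mg")

# The same user query, analyzed two different ways.
query_ngrammed = edge_ngrams("vicodin")   # edge n-gram analyzer at search time
query_standard = {"vicodin"}              # a standard-style analyzer keeps the word whole

overlap_ngrammed = query_ngrammed & indexed_victrelis
overlap_standard = query_standard & indexed_victrelis

print(sorted(overlap_ngrammed))
print(sorted(overlap_standard))
print("vicodin" in indexed_vicodin)
```

With the n-grammed query, "Victrelis" shares the short grams "vi" and "vic" with the query, so it can match and even outrank longer Vicodin descriptions. With the whole-word query, only documents that actually contain a "vicodin" gram match.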
I'm trying to find the shortest path, along with the relationships of the nodes on that path, using the query below.
MATCH p = shortestPath((p1:Person { name: 'Kevin Bacon' })-[*..15]-
(p2:Person { name: 'Meg Ryan' }))
UNWIND nodes(p) as n
MATCH (n)-[*]->(q)
RETURN n, q
However, I want to return the result as a JSON object in the format below, in C#. I understand we have to use APOC, but I can't work out how to proceed.
{
"results": [
{
"data": [
{
"graph": {
"nodes": [
{
"id": "1",
"labels": ["James"],
"properties": {
"ShortName": "jammy",
"Type": "Person",
"Age": 34
}
},
{
"id": "2",
"labels": ["Brad"],
"properties": {
"name": "Brad",
"PlaceOfBirth": "California",
"Type": "Person",
"description": "Nice actor"
}
},
{
"id": "3",
"labels": ["Titanic"],
"properties": {
"movieName": "Titanic",
"Type": "Movie",
"description": "Tragedy"
}
}
],
"relationships": [
{
"id": "4",
"type": "ACTED_IN",
"startNode": "1",
"endNode": "3",
"properties": {
"from": 1470002400000
}
}
]
}
}
]
}
],
"errors": []
}
You can collect the nodes and relationships separately and add them to the result.
MATCH p = shortestPath((p1:Person { name: 'Kevin Bacon' })-[*..15]-(p2:Person { name: 'Meg Ryan' }))
UNWIND nodes(p) as n
MATCH (n)-[r]->(q)
WITH collect(distinct n) + collect(distinct q) as node_list, collect(distinct r) as rel_list
RETURN {results: {data: {graph: {nodes: node_list, relationships: rel_list}}, error: []}} as output
I wanted the first-level incoming and outgoing relationships of all the nodes on the path. I slightly modified the answer for anyone looking for something similar in the future. Thanks, Jose.
MATCH p = shortestpath((p1:Person { name: 'Kevin Bacon' })-[*..30]-
(p2:Person { name: 'Meg Ryan' }))
UNWIND nodes(p) as n
MATCH (n)<-[r*1]->(q)
WITH collect(distinct n) + collect(distinct q) as node_list,
collect(distinct r) as rel_list
RETURN {results: {data: {graph: {nodes: node_list, relationships:
rel_list}}, error: []}} as output
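On the client side, the collected lists still need to be wrapped in the `results` / `data` / `graph` envelope from the question. The sketch below shows one way to do that reshaping. It is written in Python purely for illustration (the question targets C#), and the function `to_graph_payload` and the sample records are hypothetical. It also removes duplicate nodes, since `collect(distinct n) + collect(distinct q)` can list the same node twice when it appears both as `n` and as `q`.

```python
import json

def to_graph_payload(nodes, relationships):
    """Wrap node and relationship dicts in the results/data/graph envelope,
    keeping only the first occurrence of each node id."""
    seen_ids = set()
    unique_nodes = []
    for node in nodes:
        if node["id"] not in seen_ids:
            seen_ids.add(node["id"])
            unique_nodes.append(node)
    return {
        "results": [
            {"data": [{"graph": {"nodes": unique_nodes,
                                 "relationships": relationships}}]}
        ],
        "errors": [],
    }

# Hypothetical rows, shaped like the target format in the question.
nodes = [
    {"id": "1", "labels": ["Person"], "properties": {"name": "Kevin Bacon"}},
    {"id": "3", "labels": ["Movie"], "properties": {"movieName": "Titanic"}},
    {"id": "1", "labels": ["Person"], "properties": {"name": "Kevin Bacon"}},
]
relationships = [
    {"id": "4", "type": "ACTED_IN", "startNode": "1", "endNode": "3",
     "properties": {}},
]

payload = to_graph_payload(nodes, relationships)
print(json.dumps(payload, indent=2))
```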
"hits" : [
{
"id": 1,
"sampleArrayData": ["x"]
},
{
"id": 2,
"sampleArrayData": ["y"]
},
{
"id": 3,
"sampleArrayData": ["z"]
},
{
"id": 4,
"sampleArrayData": ["x", "y", "z"]
},
{
"id": 5,
"sampleArrayData": ["z", "w"]
}
]
This is sample data from the index.
I want to search this index by the sampleArrayData field, matching one or more values that I supply dynamically.
For example, if I search this index with the parameters ["x","y"], I must get the documents that include "x", "y", or both (the records with ids 1, 2 and 4 in this index).
On older Elasticsearch versions we could do this with a terms query like the one below.
{
"query": {
"bool": {
"must": [
{
"terms": {
"sampleArrayData": [ "x", "y"]
}
}
]
}
}
}
But we cannot use a terms query on "text" or "keyword" fields in current Elasticsearch versions.
How can I search this index dynamically, with an unknown number of parameters, using the C# NEST library?
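The "unknown number of parameters" part can be handled by building the request body from whatever list arrives at runtime. Here is a minimal sketch in Python (not NEST) that produces the same body shape as the query above. The function name `build_terms_query` is hypothetical. Whether the terms query is accepted for a given field depends on the mapping and version, so treat that as something to verify against your own cluster.

```python
import json

def build_terms_query(field, values):
    """Build a bool/must/terms request body for any number of values."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {field: list(values)}}
                ]
            }
        }
    }

# The list can have any length; it is only known at runtime.
user_values = ["x", "y"]
body = build_terms_query("sampleArrayData", user_values)
print(json.dumps(body, indent=2))
```

The same idea carries over to a typed client: collect the values into a list first, then pass the whole list to the terms clause in one call.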
I have a problem querying a complex data type with the C# NEST API in Elasticsearch. My model in Elasticsearch looks like this:
"hits": [
{
"_index": "post",
"_type": "postmodel",
"_source": {
"projectId": "2",
"language": "en",
"postDate": "2017-06-11T08:39:32Z",
"profiles": [
{
"label": "Emotional",
"confidence": 1
}
]
}
},
{
"_index": "post",
"_type": "postmodel",
"_source": {
"projectId": "3",
"language": "en",
"postDate": "2017-06-11T08:05:01Z",
"profiles": [
{
"label": "Fact oriented",
"confidence": 0.69
},
{
"label": "Rational",
"confidence": 1
}
]
}
},
...
Using the C# NEST API, I want to fetch the postmodels that have projectId=3 and a "Rational" profile. My current code looks like this:
var postModels = await _elasticClient.SearchAsync<PostModel>(s => s
.Index("post")
.Query(q =>
{
QueryContainer query = new QueryContainer();
query = query && q.Match(m => m.Field(f => f.ProjectId)
.Query("3"));
return query;
}));
But I don't know how to query "profiles". I want to extend my query to fetch specific profiles as well. I would be happy if someone could help me with this problem. Thank you in advance.
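One possible shape for the combined query, sketched as a plain request body in Python rather than NEST: a bool query with a match on projectId plus a nested query on the profile label. This assumes that `profiles` is mapped as a `nested` type, which the question does not show. If it is a plain object field, a simple match on `profiles.label` inside the bool query would be the closer equivalent. The function name `build_post_query` is hypothetical.

```python
import json

def build_post_query(project_id, profile_label):
    """Build a request body matching posts by project id and profile label.
    Assumes 'profiles' is mapped as a nested field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"projectId": project_id}},
                    {
                        "nested": {
                            "path": "profiles",
                            "query": {
                                "match": {"profiles.label": profile_label}
                            },
                        }
                    },
                ]
            }
        }
    }

body = build_post_query("3", "Rational")
print(json.dumps(body, indent=2))
```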
I'm building a product search engine with Elasticsearch in my .NET application, using the NEST client, and there is one thing I'm having trouble with: getting a distinct set of values.
I'm searching for products, of which there are many thousands, but of course I can only return 10 or 20 at a time to the user, and paging works fine for this. But besides this primary result, I want to show my users a list of the brands found within the complete search, to present these for filtering.
I have read that I should use a terms aggregation for this, but I couldn't get anything better than the code below. It still doesn't really give me what I want, because it splits values like "20th Century Fox" into 3 separate values.
var brandResults = client.Search<Product>(s => s
.Query(query)
.Aggregations(a => a.Terms("my_terms_agg", t => t.Field(p => p.BrandName).Size(250))
)
);
var agg = brandResults.Aggs.Terms("my_terms_agg");
Is this even the right approach, or should I use something totally different? And how can I get the correct, complete values? (Not split by space, but I guess that is what you get when you ask for a list of 'terms'?)
What I'm looking for is what you would get if you did this in MS SQL:
SELECT DISTINCT BrandName FROM [Table To Search] WHERE [Where clause without paging]
You are correct that what you want is a terms aggregation. The problem you're running into is that ES analyzes the "BrandName" field and splits it into separate terms, and those terms are what the aggregation returns. This is the expected default behavior of a string field in ES.
What I recommend is that you change BrandName into a multi-field. This will allow you to search on all the various parts, as well as run a terms aggregation on the "not analyzed" (i.e. the full "20th Century Fox") term.
Here is the documentation from ES.
https://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
[UPDATE]
If you are using ES version 1.4 or newer, the syntax for multi-fields is a little different.
https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields
Here is a full working sample that illustrates the point in ES 1.4.4. Note that the mapping specifies a "not_analyzed" version of the field.
PUT hilden1
PUT hilden1/type1/_mapping
{
"properties": {
"brandName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
POST hilden1/type1
{
"brandName": "foo"
}
POST hilden1/type1
{
"brandName": "bar"
}
POST hilden1/type1
{
"brandName": "20th Century Fox"
}
POST hilden1/type1
{
"brandName": "20th Century Fox"
}
POST hilden1/type1
{
"brandName": "foo bar"
}
GET hilden1/type1/_search
{
"size": 0,
"aggs": {
"analyzed_field": {
"terms": {
"field": "brandName",
"size": 10
}
},
"non_analyzed_field": {
"terms": {
"field": "brandName.raw",
"size": 10
}
}
}
}
Results of the last query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"non_analyzed_field": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "20th Century Fox",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 1
},
{
"key": "foo",
"doc_count": 1
},
{
"key": "foo bar",
"doc_count": 1
}
]
},
"analyzed_field": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "20th",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 2
},
{
"key": "century",
"doc_count": 2
},
{
"key": "foo",
"doc_count": 2
},
{
"key": "fox",
"doc_count": 2
}
]
}
}
}
Notice that the not-analyzed field keeps "20th Century Fox" and "foo bar" together, whereas the analyzed field breaks them up.
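The bucket counts above can be reproduced with a small Python simulation, which may make the difference easier to see. This is only a sketch: it approximates the analyzed field by lowercasing and splitting on whitespace, and counts each term once per document, the way doc_count does. The function names are hypothetical.

```python
from collections import Counter

# The five brandName values indexed in the sample above.
docs = ["foo", "bar", "20th Century Fox", "20th Century Fox", "foo bar"]

def terms_agg_analyzed(values):
    """Count documents per token, approximating an analyzed string field."""
    counts = Counter()
    for value in values:
        # set() so a token counts once per document, like doc_count
        counts.update(set(value.lower().split()))
    return dict(counts)

def terms_agg_raw(values):
    """Count documents per whole value, like a not_analyzed field."""
    return dict(Counter(values))

analyzed = terms_agg_analyzed(docs)
raw = terms_agg_raw(docs)
print(analyzed)
print(raw)
```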
I had a similar issue. I was displaying search results and wanted to show counts on the category and sub category.
You're right to use aggregations. I also had the issue with the strings being tokenised (i.e. 20th century fox being split) - this happens because the fields are analysed. For me, I added the following mappings (i.e. tell ES not to analyse that field):
"category": {
"type": "nested",
"properties": {
"CategoryNameAndSlug": {
"type": "string",
"index": "not_analyzed"
},
"SubCategoryNameAndSlug": {
"type": "string",
"index": "not_analyzed"
}
}
}
As jhilden suggested, if you use this field for more than one purpose (e.g. search and aggregation) you can set it up as a multi-field. That way one version is analysed and used for searching, while the other is left unanalysed and used for aggregation.
I am looking for a good way to feed a d3.js bubble chart with data from my MVC application. For example the standard bubble chart expects nested data in the form:
{
"name": "flare",
"children": [
{
"name": "analytics",
"children": [
{
"name": "cluster",
"children": [
{
"name": "CNN",
"size": 3938
}
]
},
{
"name": "graph",
"children": [
{
"name": "MTV",
"size": 3534
}
]
}
]
}
]
}
What I have on the server side is this linq query to a SQL database:
var results = from a in db.Durations
where a.Category == "watch"
group a by a.Description
into g
select new
{
name = g.Key,
size = g.Select(d => new{d.Begin, d.End}).Sum(d => SqlFunctions.DateDiff("hh", d.Begin, d.End))
};
return Json(results, JsonRequestBehavior.AllowGet);
The query result, serialized as JSON, looks like this:
[{"name":"CNN","size":1950},{"name":"MTV","size":1680}]
I'm stuck on what would be a good way to achieve the correct formatting and create the nested structure from my query results. The options I see:
1. server-side, using anonymous types
2. server-side, adjusting the LINQ query
3. client-side, using d3.js nest
4. use a simpler bubble model since, for my purpose, the nested structure with children is not really needed
5. something totally different and much much cooler than 1-4
Thank you for any input.
Replace your return statement with the following one.
return Json(new
{
name = "Sites",
children = results
},
JsonRequestBehavior.AllowGet);
That will give you the following:
{
"name": "Sites",
"children": [
{
"name": "CNN",
"size": 1950
},
{
"name": "MTV",
"size": 1680
}
]
}
To serve as an example, suppose each website had an additional string Type property, with values such as "News" or "Music". Then you could do the following.
return Json(new
{
name = "Sites",
children = results.GroupBy(site => site.Type).Select(group => new
{
name = group.Key,
children = group
})
},
JsonRequestBehavior.AllowGet);
This would give you something like the following.
{
"name": "Sites",
"children": [
{
"name": "News",
"children": [
{
"name": "CNN",
"size": 1950
},
{
"name": "The Verge",
"size": 1600
}
]
},
{
"name": "Music",
"children": [
{
"name": "MTV",
"size": 1680
},
{
"name": "Pandora",
"size": 2000
}
]
}
]
}
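The grouping step is the same in any language. As a language-neutral illustration, here is a minimal Python sketch of it: flat records are bucketed by a type field, and each bucket becomes a child node with its own children. The `type` values and the two extra sites are the hypothetical ones from the example above, and the function name `nest_by_type` is made up for this sketch.

```python
import json

def nest_by_type(root_name, sites):
    """Group flat site records by 'type' into a name/children tree."""
    groups = {}
    for site in sites:
        # Drop the grouping key from the leaves; d3 only needs name and size.
        leaf = {"name": site["name"], "size": site["size"]}
        groups.setdefault(site["type"], []).append(leaf)
    return {
        "name": root_name,
        "children": [
            {"name": type_name, "children": leaves}
            for type_name, leaves in groups.items()
        ],
    }

# Hypothetical flat rows, as they might come back from the query.
sites = [
    {"name": "CNN", "size": 1950, "type": "News"},
    {"name": "MTV", "size": 1680, "type": "Music"},
    {"name": "The Verge", "size": 1600, "type": "News"},
    {"name": "Pandora", "size": 2000, "type": "Music"},
]

tree = nest_by_type("Sites", sites)
print(json.dumps(tree, indent=2))
```

Groups appear in the order their type is first seen, so the News group precedes the Music group here.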