Will NEST project in Elasticsearch or in the client? - C#

If I have a complex document indexed in Elasticsearch and query it using a DTO, will a projection for the fields required by the DTO be applied in Elasticsearch before the data is sent to the C# client, or will the full source be sent and used on the C# side to hydrate the DTO?
var response = await elasticClient.SearchAsync<TDto>(searchRequest);
Basically, I need to know if I can simply ask for a TDto and not worry about the data volume of the larger ComplexDocument that was indexed, or if I have to specify Source inclusion/exclusion in the searchRequest to get the best performance.

By default, Elasticsearch will send back the full _source document for each search hit. You can specify which fields of _source to include/exclude with source filtering:
var client = new ElasticClient();

var searchResponse = client.Search<ComplexDocument>(s => s
    .Source(sf => sf
        .Includes(i => i
            .Field(f => f.Path)
            .Field(f => f.Content)
        )
        .ExcludeAll()
    )
);

foreach (var source in searchResponse.Documents)
{
    var path = source.Path;
}
which sends
{
  "_source": {
    "excludes": ["*"],
    "includes": ["path", "content"]
  }
}
Or you can ask not to return _source at all
var searchResponse = client.Search<ComplexDocument>(s => s
    .Source(false)
);
With source filtering, the stored _source is read in its entirety on the Elasticsearch side, and the filtering is applied there. This is usually fine, but if _source is a huge document and you only ever want to return a subset of its fields in response to a search, you might decide to use stored fields instead.
As the name implies, stored fields are fields stored separately from _source (by specifying store: true in their mapping) and can be returned in a search response
var searchResponse = client.Search<ComplexDocument>(s => s
    .StoredFields(f => f
        .Field(ff => ff.Path)
    )
);

foreach (var fields in searchResponse.Fields)
{
    var path = fields.ValueOf<ComplexDocument, string>(f => f.Path);
}
Stored fields are returned in a "fields" property on each hit.
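For fields to be returned this way, the mapping has to opt in with store. A minimal sketch of such a mapping (assuming the same ComplexDocument POCO and the NEST 5.x+ Text property type; older versions use String instead):

var mapResponse = client.Map<ComplexDocument>(m => m
    .Properties(p => p
        .Text(t => t
            .Name(n => n.Path)
            .Store(true) // persist this field separately from _source
        )
    )
);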
If I have a complex document indexed in elasticsearch and query it
using a DTO, will a projection for the fields required by the DTO be
applied in elasticsearch, before sending the data to the C# client or
will the full source be sent, and C# will use that to hydrate the DTO?
In summary, Elasticsearch will return the full _source, and NEST will map matching properties in the _source to properties of the DTO. NEST maps camel-cased properties in JSON to the POCO properties by default. If you want to transmit less across the wire, take a look at source filtering. You could probably wrap up the functionality to include only the fields of the DTO in the request as an extension method on SearchDescriptor<TInferDocument>
public class ComplexDocument
{
    public int Id { get; set; }
    public string Path { get; set; }
    public string Content { get; set; }
    public Attachment Attachment { get; set; }
}

public class SimpleDTO
{
    public string Path { get; set; }
}

public static class SearchDescriptorExtensions
{
    public static SearchDescriptor<TInferDocument> SourceIncludesDto<TInferDocument, TDocument>(this SearchDescriptor<TInferDocument> descriptor)
        where TInferDocument : class
        where TDocument : class
    {
        // TODO: cache this :)
        // PropertyInfo[] converts implicitly to NEST's Fields type
        Fields fields = typeof(TDocument).GetProperties();

        return descriptor.Source(s => s
            .Includes(f => f
                .Fields(fields)
            )
        );
    }
}
ISearchResponse<SimpleDTO> searchResponse = client.Search<ComplexDocument, SimpleDTO>(s => s
    .SourceIncludesDto<ComplexDocument, SimpleDTO>()
);
sends
{
  "_source": {
    "includes": ["path"]
  }
}
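On the "TODO: cache this" comment above, a sketch of the same extension method with the reflected properties memoized per DTO type, so reflection runs only once per TDocument (assumes System.Collections.Concurrent):

using System;
using System.Collections.Concurrent;
using Nest;

public static class SearchDescriptorExtensions
{
    // Reflect over each DTO type once; later calls hit the cache.
    private static readonly ConcurrentDictionary<Type, Fields> CachedFields =
        new ConcurrentDictionary<Type, Fields>();

    public static SearchDescriptor<TInferDocument> SourceIncludesDto<TInferDocument, TDocument>(this SearchDescriptor<TInferDocument> descriptor)
        where TInferDocument : class
        where TDocument : class
    {
        // PropertyInfo[] converts implicitly to NEST's Fields type
        Fields fields = CachedFields.GetOrAdd(typeof(TDocument), t => t.GetProperties());

        return descriptor.Source(s => s
            .Includes(f => f
                .Fields(fields)
            )
        );
    }
}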

Related

Use Guid as ID in MongoDb for C#

I'm trying to use the Guid datatype as the Id in my POCO "Parameter". However, while I'm able to write documents to the database, I can't read from it.
This is the import function writing table headers from a CSV file into the database. The first line of the CSV file contains the parameters and the second line the units those parameters are measured in. All other lines contain actual values and are stored in another collection as BsonDocument. The CSV files are dynamic and need to be selectable via combobox, which is why the parameters are written to their own collection.
IMongoCollection<Parameter> parameterCollection = this.MongoDatabase.GetCollection<Parameter>("Parameters");

columnNames.Select((columnName, index) => new Parameter() { Name = columnName, Unit = columnUnits[index] })
    .ToList()
    .ForEach(parameter =>
    {
        parameterCollection.UpdateOne(
            Builders<Parameter>.Filter.Eq("Name", parameter.Name),
            Builders<Parameter>.Update.Set("Unit", parameter.Unit),
            new UpdateOptions()
            {
                IsUpsert = true
            });
    });
This is the Parameter class:
public class Parameter
{
    [BsonId]
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Unit { get; set; }
}
Here's the method trying to read the data from the document:
public List<Parameter> GetParameters()
{
    return this.MongoDatabase.GetCollection<Parameter>("Parameters")
        .Find(Builders<Parameter>.Filter.Empty)
        .ToList();
}
This results in the following error message:
"SystemFormatException: 'An error occurred while deserializing the Id property of class TimeSeriesInterface.DTO.Parameter: Cannot deserialize a 'Guid' from BsonType 'ObjectId'.'
I also tried this attribute: [BsonId(IdGenerator = typeof(GuidGenerator))]
I'm unable to find any help besides those two attributes. They seem to solve it for everybody else, but I still keep getting this error.
I may add that the import and read functions are parts of different classes, each calling their own new MongoClient().GetDatabase(MongoDatabaseRepository.DatabaseName); but when I use ObjectId as the data type I do get the data, so I don't think that's the issue.
Why not use ObjectId as the data type? We have a separate project for database access, and I don't wish to add the MongoDB assembly all over the place just because other projects use the POCOs and would require a reference for that pesky little ObjectId.
EDIT:
This is the mapping used within the constructor after the suggestion by AlexeyBogdan (beforehand it was simply a call to AutoMap()):
public MongoDatabaseRepository(string connectionString)
{
    this.MongoDbClient = new MongoClient();
    this.MongoDatabase = this.MongoDbClient.GetDatabase(MongoDatabaseRepository.DatabaseName);

    BsonClassMap.RegisterClassMap<Parameter>(parameterMap =>
    {
        parameterMap.AutoMap();
        parameterMap.MapIdMember(parameter => parameter.Id);
    });
}
Instead of [BsonId], I recommend using this mapping:
BsonClassMap.RegisterClassMap<Type>(cm =>
{
    cm.AutoMap();
    cm.MapIdMember(c => c.Id);
});
I hope it helps you.
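If the mapping alone doesn't solve it, one variant worth trying is to make the Guid representation explicit on the id member, so the driver never falls back to ObjectId generation on upserts (a sketch; GuidSerializer and BsonType ship with the MongoDB C# driver). Note that any documents already upserted with ObjectId _id values will still fail to deserialize into a Guid and would need to be re-imported:

using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

BsonClassMap.RegisterClassMap<Parameter>(cm =>
{
    cm.AutoMap();
    cm.MapIdMember(c => c.Id)
      .SetSerializer(new GuidSerializer(BsonType.String)); // store the Guid as a string
});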

Using elasticsearch NEST on a remote index

This is probably just me completely overlooking the obvious "missing link". Here goes anyway: I have an elasticsearch end-point http://distribution.virk.dk/cvr-permanent/virksomhed/_search, and I would like to query this end-point. Fairly simple.
As I understand it, NEST gives you the ability to strongly type the interaction with the Elasticsearch index, in much the same way as Visual Studio will create types for an asmx/svc when you add a reference to the respective service.
So my question is: how on earth do I get from knowing the end-point for an elasticsearch index to having types matching the index and making queries on the index? I presume the answer is "Use NEST!", but all the tutorials I've been able to find assume you have a local index that you generate from a C# type, which will then give you a type to use in your queries. But what do you do when it's a "remote" index that you have to build your types from?
Thanks in advance for any answer pointing in the right direction!
UPDATE:
I have retrieved the mappings in the index, which I have reduced to only the field "cvrNummer" in the following:
{
  "cvr-permanent-prod-20170205" : {
    "mappings" : {
      "virksomhed" : {
        "_size" : {
          "enabled" : true
        },
        "properties" : {
          "Vrvirksomhed" : {
            "properties" : {
              "cvrNummer" : {
                "type" : "string"
              }
            }
          }
        }
      }
    }
  }
}
I have then made the following class:
[ElasticsearchType(Name = "virksomhed")]
public class Company
{
[Text(Name = "cvrNummer")]
public string cvrNumber { get; set; }
}
Now, all that I want to do (to begin with) is to search for documents having cvrNummer with a certain value, e.g. "12883404". I have the following code in a simple console application:
var node = new Uri("http://distribution.virk.dk/cvr-
permanent/virksomhed/_search");
var settings = new ConnectionSettings(node).DefaultIndex("defaultindex");
settings.BasicAuthentication("username", "password");
var client = new ElasticClient(settings);
I then try the following very simple request:
var searchResponse = client.Search<Company>(s => s
    .Type<Company>()
    .Query(q => q
        .Match(m => m
            .Field(f => f.cvrNumber)
            .Query("12883404")
        )
    )
);
And I get "400 bad request". What on earth am I doing wrong?
Basically, you create a C# class with the properties you need by hand, then tell NEST to map the results to this class.
using Nest;
using System;

[ElasticsearchType(Name = "Name_Of_The_Mapping_In_Index_Mappings")]
public class MySearchType
{
    [Text(Name = "_id")]
    public string Id { get; set; }

    [Date(Name = "#timestamp")]
    public DateTime Timestamp { get; set; }

    [Number(NumberType.Long, Name = "some_numeric_property_in_the_mapping")]
    public long SomeNumericProperty { get; set; }
}
Then you can type your results to the search type you just defined:
ISearchResponse<MySearchType> response = await _elasticClient.SearchAsync<MySearchType>(s => s
    .Index("Name_Of_The_Index")
    .Type<MySearchType>()
    .Query(q =>
        q.Bool(bo =>
            bo.Filter(
                f => f.Terms(t =>
                    t.Field(searchtype => searchtype.SomeNumericProperty).Terms(request.NumericInput)),
                /* ... */
            )
        )
    )
);

IReadOnlyCollection<MySearchType> result = response.Documents;
This explains how you can retrieve the names needed to create the binding: Get all index and types' names from cluster in ElasticSearch.
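As an aside not covered by the answer above (an assumption based on the URI in the question): ConnectionSettings expects the root of the node, not a full search path; the index and type belong on the request itself. A sketch of that setup:

var node = new Uri("http://distribution.virk.dk");
var settings = new ConnectionSettings(node).DefaultIndex("cvr-permanent");
settings.BasicAuthentication("username", "password");
var client = new ElasticClient(settings);

var searchResponse = client.Search<Company>(s => s
    .Index("cvr-permanent")
    .Type("virksomhed")
    .Query(q => q
        .Match(m => m
            .Field(f => f.cvrNumber)
            .Query("12883404")
        )
    )
);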

How to use the Serialize method from the ElasticClient class from the NEST client in c#?

I've created a successful connection to ES and written my JSON query. Now, I would like to send that query via the Serialize method.
The Serialize method requires two parameters:
1. object and
2. Stream writableStream
My question is about the second one. When I create a stream with the following code line:
Stream wstream;
And use it to initialize my json2 variable with the following code:
var json2 = highLevelclient.Serializer.Serialize(query, wstream).Utf8String();
I get the following error on the wstream variable:
Use of unassigned local variable 'wstream'.
Am I missing something? Is it the way I create the wstream variable that is wrong? Thank you.
EDIT:
There is another issue now. I use Searchblox to index and search my files, which itself calls ES 2.x to do the job. Searchblox uses a "mapping.json" file to initialize a mapping upon the creation of an index. Here's the link to that file.
As @Russ Cam suggested, I created my own class Content with the following code (just like he did with the "questions" index and Question class):
public class Content
{
    public string type { get; set; }
    public Fields fields { get; set; }
}

public class Fields
{
    public Content1 content { get; set; }
    public Autocomplete autocomplete { get; set; }
}

public class Content1
{
    public string type { get; set; }
    public string store { get; set; }
    public string index { get; set; }
    public string analyzer { get; set; }
    public string include_in_all { get; set; }
    public string boost { get; set; }
} // got this with Paste Special -> JSON class
These fields from the Content class (type, store, etc.) come from the mapping.json file attached above.
Now, when I (just like you showed me) execute the following code:
var searchResponse = highLevelclient.Search<Content>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.fields.content)
            .Query("service")
        )
    )
);
All I get as a response on the searchResponse variable is:
Valid NEST response built from a successful low level call on POST: /idx014/content/_search
Audit trail of this API call:
-HealthyResponse: Node: http://localhost:9200/ Took: 00:00:00.7180404
Request:
{"query":{"match":{"fields.content":{"query":"service"}}}}
Response:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
And no documents in searchResponse.Documents.
By contrast, when I search for the "service" query on Searchblox or make an API call to localhost:9200 with the Sense extension for Google Chrome, I get 2 documents (the documents that I was looking for).
In brief, all I want is to be able to:
- get all the documents (no criteria)
- get all the documents within a time range and based upon keywords, such as "service"
What am I doing wrong? I can provide with more information if needed..
Thank you all for your detailed answers.
It's actually much simpler than this with NEST :) The client will serialize your requests for you and send them to Elasticsearch; you don't need to serialize them yourself and then pass them to the client to send to Elasticsearch.
Take a search request for example. Given the following POCO
public class Question
{
    public string Body { get; set; }
}
We can search for questions that contain the phrase "this should never happen" in the body with
var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .InferMappingFor<Question>(m => m
        .IndexName("questions")
    );

var client = new ElasticClient(settings);

var searchResponse = client.Search<Question>(s => s
    .Query(q => q
        .MatchPhrase(m => m
            .Field(f => f.Body)
            .Query("this should never happen")
        )
    )
);

// do something with the response
foreach (var question in searchResponse.Documents)
{
    Console.WriteLine(question.Body);
}
This line
My question is, with the second one. When I create a stream with the following code line:
Stream wstream;
does not create an object. It merely declares it. You need to initialize it with new:
Stream wstream = new MemoryStream(); // doesn't have to be MemoryStream here - check what Serialize expects
Just remember to close it later or use a using statement:
using (Stream stream = new MemoryStream())
{
    // do operations on stream
} // closes automatically here
You just declared wstream but never assigned an actual stream to it. Depending on how the Serialize method works, either:
- you need to create a stream and pass it to the Serialize method, or
- you need to pass the stream parameter with the out prefix
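Putting the pieces together, a sketch of serializing a query to a JSON string with a properly initialized stream (the exact ISerializer signature varies between NEST versions; Utf8String() is an Elasticsearch.Net extension over byte[]):

using System.IO;
using System.Text;

byte[] bytes;
using (var stream = new MemoryStream())
{
    // the client's serializer writes the serialized query into the stream
    highLevelclient.Serializer.Serialize(query, stream);
    bytes = stream.ToArray();
}

// decode the buffer; Elasticsearch.Net's Utf8String() does the equivalent
var json = Encoding.UTF8.GetString(bytes);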

Create index with multi field mapping syntax with NEST 2.x

I just can't seem to get the syntax correct for multi field mapping in NEST 2.0--if that's the correct terminology. Every example I've found for mapping seems to be for NEST 1.x or earlier. I'm new to Elasticsearch and NEST, and I've been reading their documentation, but the NEST documentation hasn't been completely updated for 2.x.
Basically, I don't need to index or store the entire type. Some fields I need for indexing only, some fields I'll need to index and retrieve, and some I don't need for indexing, just for retrieval.
class MyType
{
    // Index this & allow for retrieval.
    int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    string CompanyDescription { get; set; }

    // Nest this.
    List<MyChildType> Locations { get; set; }
}

class MyChildType
{
    // Index this & allow for retrieval.
    string LocationName { get; set; }

    // etc. other properties.
}
I've been able to index the entire object and child as-is using the following as an example:
client.Index(item, i => i.Index(indexName));
However, the actual object is a lot larger than this, and I really don't need most of it. I've found this, which looks like what I think I want to do, but it's for an older version: multi field mapping elasticsearch
I think "mapping" is what I'm going for, but like I said, I'm new to Elasticsearch and NEST and I'm trying to learn the terminology.
Be gentle! :) It's my first time to ask a question on SO. Thanks!
In addition to Colin's and Selçuk's answers, you can also fully control the mapping through the fluent (and object initializer syntax) mapping API. Here's an example based on your requirements
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool);
    var client = new ElasticClient(connectionSettings);

    client.Map<MyType>(m => m
        .Index("index-name")
        .AutoMap()
        .Properties(p => p
            .String(s => s
                .Name(n => n.CompanyName)
                .Fields(f => f
                    .String(ss => ss
                        .Name("raw")
                        .NotAnalyzed()
                    )
                )
            )
            .Date(d => d
                .Name(n => n.CreatedDate)
                .Index(NonStringIndexOption.No)
            )
            .String(s => s
                .Name(n => n.CompanyDescription)
                .Store(false)
            )
            .Nested<MyChildType>(n => n
                .Name(nn => nn.Locations.First())
                .AutoMap()
                .Properties(pp => pp
                    /* properties of MyChildType */
                )
            )
        )
    );
}
public class MyType
{
    // Index this & allow for retrieval.
    public int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    public string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    public DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    public string CompanyDescription { get; set; }

    // Nest this.
    public List<MyChildType> Locations { get; set; }
}

public class MyChildType
{
    // Index this & allow for retrieval.
    public string LocationName { get; set; }

    // etc. other properties.
}
This produces the mapping
{
  "properties": {
    "id": {
      "type": "integer"
    },
    "companyName": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    },
    "createdDate": {
      "type": "date",
      "index": "no"
    },
    "companyDescription": {
      "type": "string",
      "store": false
    },
    "locations": {
      "type": "nested",
      "properties": {
        "locationName": {
          "type": "string"
        }
      }
    }
  }
}
Calling .AutoMap() causes NEST to infer the mapping based on the property types and any attributes applied to them. Then .Properties() overrides any of the inferred mappings. For example:
- CompanyName is mapped as a multi_field with the field companyName analyzed using the standard analyzer and companyName.raw not analyzed. You can reference the latter in your queries using .Field(f => f.CompanyName.Suffix("raw")).
- Locations is mapped as a nested type (automapping by default would infer this as an object type mapping). You can then define any specific mappings for MyChildType using .Properties() inside of the Nested<MyChildType>() call.
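Tying this back to the sorting requirement in the question, a short sketch: sorting on the not_analyzed companyName.raw sub-field orders results by the entire field value rather than by individual tokens.

var searchResponse = client.Search<MyType>(s => s
    .Query(q => q.MatchAll())
    .Sort(so => so
        .Ascending(f => f.CompanyName.Suffix("raw"))
    )
);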
As far as I can see, you don't have any complex types that you are trying to map, so you can easily use NEST attributes to map your objects.
Check this out:
[Nest.ElasticsearchType]
public class MyType
{
    // Index this & allow for retrieval.
    [Nest.Number(Store = true)]
    int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    [Nest.String(Store = true, Index = Nest.FieldIndexOption.Analyzed, TermVector = Nest.TermVectorOption.WithPositionsOffsets)]
    string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    [Nest.Date(Store = true, Index = Nest.NonStringIndexOption.No)]
    DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    [Nest.String(Store = false, Index = Nest.FieldIndexOption.Analyzed)]
    string CompanyDescription { get; set; }

    // Nest this.
    [Nest.Nested(Store = true, IncludeInAll = true)]
    List<MyChildType> Locations { get; set; }
}

[Nest.ElasticsearchType]
public class MyChildType
{
    // Index this & allow for retrieval.
    [Nest.String(Store = true, Index = Nest.FieldIndexOption.Analyzed)]
    string LocationName { get; set; }

    // etc. other properties.
}
After this declaration, to create this mapping in Elasticsearch you need to make a call similar to:
var mappingResponse = elasticClient.Map<MyType>(m => m.AutoMap());
With the AutoMap() call, NEST will read your attributes from your POCO and create a mapping request accordingly.
Also see "Attribute Based Mapping" section from here.
Cheers!
At the time of writing, NEST does not offer a way to map a property in your class to multiple fields in your document mapping using the built-in attributes. However, it does provide the facilities needed to do anything with your mappings that you could do if you wrote the JSON yourself.
Here's a solution I've put together for my own needs. It shouldn't be hard to use it as the starting point for whatever you need to do.
First, here's an example of the mapping I want to generate
{
  "product": {
    "properties": {
      "name": {
        "type": "string",
        "index": "not_analyzed",
        "fields": {
          "standard": {
            "type": "string",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
The product document would then have the name field, which is indexed but not analyzed, and the name.standard field, which uses the standard analyzer.
The C# class that I generate the mapping from looks like this
[ElasticsearchType]
public class Product
{
    [WantsStandardAnalysisField]
    public string Name { get; set; }
}
Note the WantsStandardAnalysisField attribute. That's a custom attribute with no special properties added. Literally just:
public class WantsStandardAnalysisField : Attribute {}
If I were to use AutoMap as-is, my custom attribute would be ignored and I would get a mapping that has the name field, but not name.standard. Luckily, AutoMap accepts an instance of IPropertyVisitor. A base class called NoopPropertyVisitor implements the interface and does nothing at all, so you can subclass it and override only the methods you care about. When you use a property visitor with AutoMap, it will generate a document mapping for you but give you a chance to modify it before it gets sent to Elastic Search. All we need to do is look for properties marked with our custom attribute and add a field to them.
Here's an example that does that:
public class ProductPropertyVisitor : NoopPropertyVisitor
{
    public override void Visit(IStringProperty type, PropertyInfo propertyInfo, ElasticsearchPropertyAttributeBase attribute)
    {
        base.Visit(type, propertyInfo, attribute);

        var wsaf = propertyInfo.GetCustomAttribute<WantsStandardAnalysisField>();
        if (wsaf != null)
        {
            // turn off analysis on the main field...
            type.Index = FieldIndexOption.NotAnalyzed;

            // ...and add a "standard" sub-field that uses the standard analyzer
            type.Fields = new Properties
            {
                {
                    "standard",
                    new StringProperty
                    {
                        Index = FieldIndexOption.Analyzed,
                        Analyzer = "standard"
                    }
                }
            };
        }
    }
}
As you can see, we can do pretty much anything we want with the generated property, including turning off analysis for the main property and adding a new field with its own settings. For fun, you could add a couple properties to the custom attribute allowing you to specify the name of the field you want and the analyzer to use. You could even modify the code to see if the attribute has been added multiple times, letting you add as many fields as you want.
If you were to run this through any method that generates a mapping using AutoMap, such as:
new TypeMappingDescriptor<Product>().AutoMap(new ProductPropertyVisitor())
You'll get the desired multi-field mapping. Now you can customize mappings to your heart's content. Enjoy!
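For completeness, the same visitor can be passed to AutoMap when putting the mapping through the client (a sketch; the "products" index name is an assumption):

var mapResponse = client.Map<Product>(m => m
    .Index("products")
    .AutoMap(new ProductPropertyVisitor())
);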
I think you have at least 2 possibilities to solve your problem:
- On indexing: create something like a metadata model, which is stored just for retrieval. See the _source field to limit the return to this field.
- On searching: specify the fields you want to query: if you don't want to query the CreatedDate, just don't include it in your search.
In my case I am using both of these approaches to get very fast results :-)
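A sketch of the second approach combined with source filtering, using the field names from the question (the Includes syntax follows the NEST version used earlier on this page):

var searchResponse = client.Search<MyType>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.CompanyDescription) // query this field...
            .Query("some search terms")
        )
    )
    .Source(sf => sf
        .Includes(i => i
            .Field(f => f.CompanyName)   // ...but return only these
            .Field(f => f.CreatedDate)
        )
    )
);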

Filter by any subdocument field's value

I'm using MongoDB to store some data. Documents have some mandatory fields and a set of optional ones. There can be any number of optional fields (this is metadata):
class DataItem
{
    public int id { get; set; }
    public string Comment { get; set; }

    [BsonExtraElementsAttribute]
    public BsonDocument Metadata { get; set; }
}
Metadata field names might be different for different documents, so I do not know these names.
I need to query for documents where any field of Metadata contains a particular value.
I tried:
var query = "<some value>";
var res = collection.Find(di => di.Metadata.ContainsValue(BsonValue.Create(query))).ToListAsync();
But this code throws an exception because the ContainsValue() method is not supported there. When I try this:
var res = collection.Find(di => di.Metadata.Values.Contains(BsonValue.Create(query))).ToListAsync();
an empty result set is returned. I think the problem is in [BsonExtraElementsAttribute] but I cannot change it. Is there a way to do so?
