Using Elasticsearch NEST on a remote index - C#

This is probably just me completely overlooking the obvious "missing link". Here goes anyway: I have an Elasticsearch endpoint, http://distribution.virk.dk/cvr-permanent/virksomhed/_search, and I would like to query it. Fairly simple.
As I understand it, NEST lets you strongly type your interaction with an Elasticsearch index, in much the same way Visual Studio creates types for an asmx/svc service when you add a reference to it.
So my question is: how on earth do I get from knowing the endpoint of an Elasticsearch index to having types matching the index and making queries against it? I presume the answer is "Use NEST!", but all the tutorials I've been able to find assume you have a local index that you generate from a C# type, which then gives you a type to use in your queries. But what do you do when it's a "remote" index that you have to build your types from?
Thanks in advance for any answer pointing in the right direction!
UPDATE:
I have retrieved the mappings in the index, which I have reduced to only the field "cvrNummer" in the following:
{
  "cvr-permanent-prod-20170205" : {
    "mappings" : {
      "virksomhed" : {
        "_size" : {
          "enabled" : true
        },
        "properties" : {
          "Vrvirksomhed" : {
            "properties" : {
              "cvrNummer" : {
                "type" : "string"
              }
            }
          }
        }
      }
    }
  }
}
I have then made the following class:
[ElasticsearchType(Name = "virksomhed")]
public class Company
{
    [Text(Name = "cvrNummer")]
    public string cvrNumber { get; set; }
}
Now, all that I want to do (to begin with) is to search for documents having cvrNummer with a certain value, e.g. "12883404". I have the following code in a simple console application:
var node = new Uri("http://distribution.virk.dk/cvr-permanent/virksomhed/_search");
var settings = new ConnectionSettings(node).DefaultIndex("defaultindex");
settings.BasicAuthentication("username", "password");
var client = new ElasticClient(settings);
I then try the following very simple request:
var searchResponse = client.Search<Company>(s => s
    .Type<Company>()
    .Query(q => q
        .Match(m => m
            .Field(f => f.cvrNumber)
            .Query("12883404"))));
And I get "400 bad request". What on earth am I doing wrong?

Basically you create a C# class with the properties you need by hand, then tell NEST to map the results to this class.
using Nest;
using System;

[ElasticsearchType(Name = "Name_Of_The_Mapping_In_Index_Mappings")]
public class MySearchType
{
    [Text(Name = "_id")]
    public string Id { get; set; }

    [Date(Name = "#timestamp")]
    public DateTime Timestamp { get; set; }

    [Number(NumberType.Long, Name = "some_numeric_property_in_the_mapping")]
    public long SomeNumericProperty { get; set; }
}
Then you can type your results to the search type you just defined:
ISearchResponse<MySearchType> response = await _elasticClient.SearchAsync<MySearchType>(s => s
    .Index("Name_Of_The_Index")
    .Type<MySearchType>()
    .Query(q =>
        q.Bool(bo =>
            bo.Filter(
                f => f.Terms(t =>
                    t.Field(searchtype => searchtype.SomeNumericProperty).Terms(request.NumericInput)),
                /* ... */
            )
        )
    )
);
IReadOnlyCollection<MySearchType> result = response.Documents;
This explains how you can retrieve the names needed to create the binding: Get all index and types' names from cluster in ElasticSearch.
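As an aside, one plausible cause of the original "400 bad request" is that the node URI passed to ConnectionSettings includes the index, type, and _search path; NEST appends those segments itself, so the node should normally point at the host root only. A minimal, unverified sketch against the endpoint from the question (index and type names taken from its URL, credentials assumed):
// Point NEST at the host root only; it builds /{index}/{type}/_search itself.
var node = new Uri("http://distribution.virk.dk");
var settings = new ConnectionSettings(node)
    .DefaultIndex("cvr-permanent")
    .BasicAuthentication("username", "password");
var client = new ElasticClient(settings);

var searchResponse = client.Search<Company>(s => s
    .Type("virksomhed")
    .Query(q => q
        .Match(m => m
            .Field(f => f.cvrNumber)
            .Query("12883404"))));
Note that if cvrNummer actually sits nested under Vrvirksomhed in the mapping, the field path would have to reflect that (e.g. "Vrvirksomhed.cvrNummer").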

Related

Will NEST project in elasticsearch or in the client?

If I have a complex document indexed in elasticsearch and query it using a DTO, will a projection for the fields required by the DTO be applied in elasticsearch, before sending the data to the C# client or will the full source be sent, and C# will use that to hydrate the DTO?
var response = await elasticClient.SearchAsync<TDto>(searchRequest);
Basically, I need to know if I can simply ask for a TDto and not worry about data volume of the larger ComplexDocument that was indexed, or if I have to specify Source inclusion/exclusion in the searchRequest to get the best performance.
By default, Elasticsearch will send back the full _source document for each search hit. You can specify which fields of _source to include/exclude with source filtering
var client = new ElasticClient();

var searchResponse = client.Search<ComplexDocument>(s => s
    .Source(sf => sf
        .Includes(i => i
            .Field(f => f.Path)
            .Field(f => f.Content)
        )
        .ExcludeAll()
    )
);

foreach (var source in searchResponse.Documents)
{
    var path = source.Path;
}
which sends
{
  "_source": {
    "excludes": ["*"],
    "includes": ["path", "content"]
  }
}
Or you can ask not to return _source at all
var searchResponse = client.Search<ComplexDocument>(s => s
    .Source(false)
);
With source filtering, the storage field for _source is read completely on the Elasticsearch side, and filtering applied. This is usually fine, but if _source is a huge document, and you only ever want to return a subset of fields in response to a search, you might decide to use stored fields instead.
As the name implies, stored fields are fields stored separately from _source (by specifying store:true in their mapping) and can be returned in a search response:
var searchResponse = client.Search<ComplexDocument>(s => s
    .StoredFields(f => f
        .Field(ff => ff.Path)
    )
);

foreach (var fields in searchResponse.Fields)
{
    var path = fields.ValueOf<ComplexDocument, string>(f => f.Path);
}
Stored fields are returned in a "fields" property on each hit.
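For completeness, a rough sketch (my own illustration, not from the original answer) of how Path could be marked as stored via NEST attribute mapping; Store = true corresponds to store:true in the mapping:
public class ComplexDocument
{
    // Stored separately from _source, so it can be returned via StoredFields
    [Text(Store = true)]
    public string Path { get; set; }
}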
If I have a complex document indexed in elasticsearch and query it using a DTO, will a projection for the fields required by the DTO be applied in elasticsearch, before sending the data to the C# client or will the full source be sent, and C# will use that to hydrate the DTO?
In summary, Elasticsearch will return the full _source and NEST will map matching properties in the _source to properties of the DTO. NEST maps camel case properties in JSON to the POCO properties by default. If you want to transmit less across the wire, take a look at source filtering. You could probably wrap up the functionality to include only fields in the DTO in the request as an extension method to SearchDescriptor<TInferDocument>
public class ComplexDocument
{
    public int Id { get; set; }
    public string Path { get; set; }
    public string Content { get; set; }
    public Attachment Attachment { get; set; }
}

public class SimpleDTO
{
    public string Path { get; set; }
}

public static class SearchDescriptorExtensions
{
    public static SearchDescriptor<TInferDocument> SourceIncludesDto<TInferDocument, TDocument>(this SearchDescriptor<TInferDocument> descriptor)
        where TInferDocument : class
        where TDocument : class
    {
        // TODO: cache this :)
        Fields fields = typeof(TDocument).GetProperties();

        return descriptor.Source(s => s
            .Includes(f => f
                .Fields(fields)
            )
        );
    }
}
ISearchResponse<SimpleDTO> searchResponse = client.Search<ComplexDocument, SimpleDTO>(s => s
    .SourceIncludesDto<ComplexDocument, SimpleDTO>()
);
sends
{
  "_source": {
    "includes": ["path"]
  }
}

Mongo .Net Driver PipelineStageDefinitionBuilder.Project automatically ignores all Id values with a facet

When using PipelineStageDefinitionBuilder to create projection stages for an aggregation pipeline, it appears to always ignore any Id values in the dataset. I'm using the Mongo .NET driver 2.8 in a .NET Core app. Below are the steps for reproduction.
The same projection worked when using the IAggregateFluent syntax on Aggregate(); however, I needed to use the builders for a facet. When running the builder against Aggregate it also works, but within a facet it fails to bind any Id values.
Just empty classes with an Id for testing (Type added to show that normal mapping works):
public class DatabaseModel
{
    public Guid Id { get; set; }
    public string Type { get; set; }
}

public class ProjectionClass
{
    public Guid Id { get; set; }
    public string Type { get; set; }
}
When I create the projection with the code below, it produces a query successfully, but in all the models returned the Id value is set to null. The generated query contains an "_id" : 0 entry, but the same is also produced by the normal aggregation, so I don't think that is related.
var typeFilter = Builders<DatabaseModel>.Filter.Eq(x => x.Type, "Full");

var aggregationPipeline = new EmptyPipelineDefinition<DatabaseModel>()
    .AppendStage(PipelineStageDefinitionBuilder.Match(typeFilter))
    .AppendStage(PipelineStageDefinitionBuilder.Project<DatabaseModel, ProjectionClass>(x => new ProjectionClass
    {
        Id = x.Id,
        Type = x.Type,
    }));

var normalAggregationResult = await db.Aggregate(aggregationPipeline).ToListAsync(); // The Ids appear here

var databaseModelsFacet = AggregateFacet.Create("DatabaseModels", aggregationPipeline);
var facetResult = db.Aggregate().Facet(databaseModelsFacet).SingleOrDefault().Facets;
var projectionModels = facetResult
    .Single(x => x.Name == "DatabaseModels")
    .Output<ProjectionClass>(); // This results in missing Ids (including in nested objects with anything named Id)
Resulting mongo query:
[
    { "$match" : { "Type" : "Full" } },
    { "$project" : { "Id" : "$_id", "Type" : "$Type", "_id" : 0 } }
]
Is there any way to run a projection using the pipeline builders with a facet without the Id being ignored? I have seen examples using similar queries but haven't seen this issue mentioned. It does seem to be facet-related, as it only appears to happen when using one.
Thanks!
UPDATE 6/1/2020: Updated question after finding it only seems to occur with facet
It seems to be a driver issue (or, alternatively, the issue occurs when the structure does not match the fields): Id can't simply be serialized as Id, but if you choose any other property name it works, for example:
[BsonNoId]
public class DatabaseModel
{
    [BsonRepresentation(BsonType.ObjectId)]
    public string Identifier { get; set; }

    public string Type { get; set; }
}

How to write a lambda to get one property based on another property in an object

I'm trying to do what I thought would be a very simple thing using a LINQ lambda; it probably is, but I can't find an example in any tutorial.
I have a simple class with a few properties, and I want to get a list of one of the properties based on the value of another property in that class.
Below is an example of the code, using LINQ query syntax to get the correct results:
public class Client
{
    public int ClientId { get; set; }
    public int ClientWorth { get; set; }
    public string ClientName { get; set; }
}

// ...

List<Client> allClients = this.GetAllClients();

List<string> richClients = (
    from c in allClients
    where c.ClientWorth > 500
    select c.ClientId.ToString()).ToList();
Can someone tell me how to do this using a lambda?
I can do the following:
List<Client> richClients = allClients.Where(x => x.ClientWorth > 500).ToList();
which gives me a list of the matching clients, but I would like to get back a string list with just the client ids.
After filtering by client worth you should project the results, i.e. select only the client id value:
allClients.Where(c => c.ClientWorth > 500).Select(c => c.ClientId.ToString()).ToList()
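Spelled out with the declarations from the question, the full call becomes:
List<string> richClients = allClients
    .Where(c => c.ClientWorth > 500)
    .Select(c => c.ClientId.ToString())
    .ToList();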
Further reading: Enumerable.Select

Create index with multi field mapping syntax with NEST 2.x

I just can't seem to get the syntax correct for multi field mapping in NEST 2.0 (if that's the correct terminology). Every example I've found for mapping seems to be for NEST 1.x or earlier. I'm new to Elasticsearch and NEST, and I've been reading their documentation, but the NEST documentation hasn't been completely updated for 2.x.
Basically, I don't need to index or store the entire type. Some fields I need for indexing only, some fields I'll need to index and retrieve, and some I don't need for indexing, just for retrieval.
public class MyType
{
    // Index this & allow for retrieval.
    public int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    public string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    public DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    public string CompanyDescription { get; set; }

    // Nest this.
    public List<MyChildType> Locations { get; set; }
}

public class MyChildType
{
    // Index this & allow for retrieval.
    public string LocationName { get; set; }

    // etc. other properties.
}
I have been able to index the entire object and child as-is using the following as an example:
client.Index(item, i => i.Index(indexName));
However, the actual object is a lot larger than this, and I really don't need most of it. I've found this, which looks like what I think I want to do, but in an older version: multi field mapping elasticsearch
I think "mapping" is what I'm going for, but like I said, I'm new to Elasticsearch and NEST and I'm trying to learn the terminology.
Be gentle! :) It's my first time asking a question on SO. Thanks!
In addition to Colin's and Selçuk's answers, you can also fully control the mapping through the fluent (and object initializer syntax) mapping API. Here's an example based on your requirements
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool);
    var client = new ElasticClient(connectionSettings);

    client.Map<MyType>(m => m
        .Index("index-name")
        .AutoMap()
        .Properties(p => p
            .String(s => s
                .Name(n => n.CompanyName)
                .Fields(f => f
                    .String(ss => ss
                        .Name("raw")
                        .NotAnalyzed()
                    )
                )
            )
            .Date(d => d
                .Name(n => n.CreatedDate)
                .Index(NonStringIndexOption.No)
            )
            .String(s => s
                .Name(n => n.CompanyDescription)
                .Store(false)
            )
            .Nested<MyChildType>(n => n
                .Name(nn => nn.Locations.First())
                .AutoMap()
                .Properties(pp => pp
                    /* properties of MyChildType */
                )
            )
        )
    );
}
}
public class MyType
{
    // Index this & allow for retrieval.
    public int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    public string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    public DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    public string CompanyDescription { get; set; }

    // Nest this.
    public List<MyChildType> Locations { get; set; }
}

public class MyChildType
{
    // Index this & allow for retrieval.
    public string LocationName { get; set; }

    // etc. other properties.
}
This produces the mapping
{
  "properties": {
    "id": {
      "type": "integer"
    },
    "companyName": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    },
    "createdDate": {
      "type": "date",
      "index": "no"
    },
    "companyDescription": {
      "type": "string",
      "store": false
    },
    "locations": {
      "type": "nested",
      "properties": {
        "locationName": {
          "type": "string"
        }
      }
    }
  }
}
Calling .AutoMap() causes NEST to infer the mapping based on the property types and any attributes applied to them. Then .Properties() overrides any of the inferred mappings. For example
CompanyName is mapped as a multi_field with the field companyName analyzed using the standard analyzer and companyName.raw not analyzed. You can reference the latter in your queries using .Field(f => f.CompanyName.Suffix("raw"))
Locations is mapped as a nested type (automapping by default would infer this as an object type mapping). You can then define any specific mappings for MyChildType using .Properties() inside of the Nested<MyChildType>() call.
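For instance, a short sketch (my own addition, names reused from the mapping above) of sorting on the entire company name via the not_analyzed raw sub-field, which is the sorting requirement from the question:
var searchResponse = client.Search<MyType>(s => s
    .Index("index-name")
    // Sort on the raw sub-field so the whole value is compared, not individual tokens
    .Sort(so => so.Ascending(f => f.CompanyName.Suffix("raw")))
    .Query(q => q.MatchAll())
);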
As far as I can see, you don't have any complex types that you are trying to map, so you can easily use NEST attributes to map your objects.
Check this out:
[Nest.ElasticsearchType]
public class MyType
{
    // Index this & allow for retrieval.
    [Nest.Number(Store = true)]
    public int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    [Nest.String(Store = true, Index = Nest.FieldIndexOption.Analyzed, TermVector = Nest.TermVectorOption.WithPositionsOffsets)]
    public string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    [Nest.Date(Store = true, Index = Nest.NonStringIndexOption.No)]
    public DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    [Nest.String(Store = false, Index = Nest.FieldIndexOption.Analyzed)]
    public string CompanyDescription { get; set; }

    // Nest this.
    [Nest.Nested(Store = true, IncludeInAll = true)]
    public List<MyChildType> Locations { get; set; }
}

[Nest.ElasticsearchType]
public class MyChildType
{
    // Index this & allow for retrieval.
    [Nest.String(Store = true, Index = Nest.FieldIndexOption.Analyzed)]
    public string LocationName { get; set; }

    // etc. other properties.
}
After this declaration, to create this mapping in Elasticsearch you need to make a call similar to:
var mappingResponse = elasticClient.Map<MyType>(m => m.AutoMap());
With the AutoMap() call, NEST will read the attributes from your POCO and create the mapping request accordingly.
Also see "Attribute Based Mapping" section from here.
Cheers!
At the time of writing, NEST does not offer a way to map a property in your class to multiple fields in your document mapping using built-in attributes. However, it does provide the facilities needed to do anything with your mappings that you could do if you wrote the JSON yourself.
Here's a solution I've put together for my own needs. It shouldn't be hard to use it as the starting point for whatever you need to do.
First, here's an example of the mapping I want to generate
{
  "product": {
    "properties": {
      "name": {
        "type": "string",
        "index": "not_analyzed",
        "fields": {
          "standard": {
            "type": "string",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
The product document would then have the name field, which is indexed but not analyzed, and the name.standard field, which uses the standard analyzer.
The C# class that I generate the mapping from looks like this
[ElasticsearchType]
public class Product
{
    [WantsStandardAnalysisField]
    public string Name { get; set; }
}
Note the WantsStandardAnalysisField attribute. That's a custom attribute with no special properties added. Literally just:
public class WantsStandardAnalysisField : Attribute {}
If I were to use AutoMap as-is, my custom attribute would be ignored and I would get a mapping that has the name field, but not name.standard. Luckily, AutoMap accepts an instance of IPropertyVisitor. A base class called NoopPropertyVisitor implements the interface and does nothing at all, so you can subclass it and override only the methods you care about. When you use a property visitor with AutoMap, it will generate a document mapping for you but give you a chance to modify it before it gets sent to Elastic Search. All we need to do is look for properties marked with our custom attribute and add a field to them.
Here's an example that does that:
public class ProductPropertyVisitor : NoopPropertyVisitor
{
    public override void Visit(IStringProperty type, PropertyInfo propertyInfo, ElasticsearchPropertyAttributeBase attribute)
    {
        base.Visit(type, propertyInfo, attribute);

        var wsaf = propertyInfo.GetCustomAttribute<WantsStandardAnalysisField>();
        if (wsaf != null)
        {
            // Turn off analysis on the main field...
            type.Index = FieldIndexOption.NotAnalyzed;

            // ...and add a "standard" sub-field analyzed with the standard analyzer
            type.Fields = new Properties
            {
                {
                    "standard",
                    new StringProperty
                    {
                        Index = FieldIndexOption.Analyzed,
                        Analyzer = "standard"
                    }
                }
            };
        }
    }
}
As you can see, we can do pretty much anything we want with the generated property, including turning off analysis for the main property and adding a new field with its own settings. For fun, you could add a couple properties to the custom attribute allowing you to specify the name of the field you want and the analyzer to use. You could even modify the code to see if the attribute has been added multiple times, letting you add as many fields as you want.
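For instance, a sketch of what such a parameterized attribute might look like (the names here are illustrative, not an existing NEST type):
public class WantsAnalysisFieldAttribute : Attribute
{
    // Name of the sub-field to add, and the analyzer to apply to it
    public string FieldName { get; set; } = "standard";
    public string Analyzer { get; set; } = "standard";
}
The visitor would then read FieldName and Analyzer from the attribute instead of using the hard-coded "standard" values.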
If you were to run this through any method that generates a mapping using AutoMap, such as:
new TypeMappingDescriptor<Product>().AutoMap(new ProductPropertyVisitor())
You'll get the desired multi-field mapping. Now you can customize mappings to your heart's content. Enjoy!
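In practice you would typically hand the visitor to the client's mapping call itself; a small sketch reusing the names above (the index name is hypothetical):
client.Map<Product>(m => m
    .Index("index-name")
    .AutoMap(new ProductPropertyVisitor())
);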
I think you have at least two possibilities to solve your problem:
On indexing: create something like a metadata model that is stored just for retrieval. See the _source field to limit the return to this field.
On searching: specify the fields you want to query: if you don't want to query the CreatedDate, just don't include it in your search.
In my case I am using both of these approaches to get very fast results. :-)
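For what it's worth, a rough sketch of limiting what _source comes back at search time, in NEST 2.x syntax (from memory; Include/Exclude became Includes/Excludes in later NEST versions):
var response = client.Search<MyType>(s => s
    .Source(sf => sf
        .Include(i => i
            .Field(f => f.Id)
            .Field(f => f.CompanyName)
        )
    )
);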

Aggregating By Date in Mongodb

I am writing a piece of functionality in which I am required to group by Date. Here is how I currently do it:
// Assuming this is my sample document in the collection
{
    "_id" : ObjectId("56053d816518fd1b48e062f7"),
    "memberid" : "7992bc31-c3c5-49e5-bc40-0a5ba41af0bd",
    "sourceid" : NumberInt(3888),
    "ispremium" : false,
    "createddate" : {
        "DateTime" : ISODate("2015-09-25T12:26:41.157+0000"),
        "Ticks" : NumberLong(635787808011571008)
    },
    "details" : {
        // a large sub-document
    }
}
Given the member id, start date and end date, I am required to search the collection matching these filters and group the results by date. In other words, the result I need to achieve is a list like (e.g., 12/10/2015 - count is 5, 13/10/2015 - count is 2). StartDate and EndDate are of type DateTime.
C# is the programming language I use, and this is what I currently have:
var builder = Builders<MyCollection>.Filter;
var filter = builder.Eq(d => d.MemberId, memberId)
    & builder.Gte(d => d.CreatedDate, startDate)
    & builder.Lt(d => d.CreatedDate, endDate.AddDays(1));

var group = collection.Aggregate()
    .Match(filter)
    .Group(new BsonDocument { { "_id", "$createddate" }, { "count", new BsonDocument("$sum", 1) } })
    .ToListAsync().Result;
I then deserialize the result to a custom class:
List<CustomAggregateResult> grouped = group
    .Select(g => BsonSerializer.Deserialize<CustomAggregateResult>(g))
    .OrderBy(g => g.Date)
    .ToList();
Where this code falls short, which is obvious, is the grouping part. What would be ideal in my case is to group by date rather than DateTime. I have read the group documentation and some similar threads here, but unfortunately I couldn't get it working. One of my attempts was to do as suggested in the documentation. The raw mongo query for that would be:
db.mycollection.aggregate([
    {
        $group : {
            _id : { month: { $month: "$createddate" }, day: { $dayOfMonth: "$createddate" }, year: { $year: "$createddate" } },
            count: { $sum: 1 }
        }
    }
])
I left out the $match just to get this bit working. The exception I got was "cannot convert from BSON type Object to Date".
In summary, my current code works but groups on the DateTime (instead of just the date), so it ends up with separate counts for one particular day. I am curious whether or not this is achievable; the mongo side is unknown to me, as I haven't figured out how to do this (even as a raw mongo query).
Just some additional information to clarify: I have the following data annotation for the DateTime property in C# (not sure whether this matters):
[BsonDateTimeOptions(Representation = BsonType.Document)]
public DateTime CreatedDate { get; set; }
One solution that comes to mind is to project the "createddate" field on the fly, format it as "Y-m-d", and do the grouping on the projected field.
I tried to add as many details as I can about the work I have done so far, just to make it clearer; I hope this didn't cause any confusion. I'd appreciate any help or suggestion that would help me produce the result I want. Thanks!
I was able to fix it according to @chridam's comment. Thanks again!
I am writing the solution below, just in case someone ever runs into the same problem I did.
I changed my query so that it became:
var group = collection.Aggregate()
    .Match(filter)
    .Group(new BsonDocument
    {
        { "_id", new BsonDocument
            {
                { "month", new BsonDocument("$month", "$createddate.DateTime") },
                { "day", new BsonDocument("$dayOfMonth", "$createddate.DateTime") },
                { "year", new BsonDocument("$year", "$createddate.DateTime") }
            }
        },
        { "count", new BsonDocument("$sum", 1) }
    })
    .ToListAsync().Result;
This gave me a serialized object. Note the "$createddate.DateTime" paths: because the field is stored as a sub-document holding DateTime and Ticks (per the [BsonDateTimeOptions] annotation above), the date operators have to target the inner DateTime value; pointing them at "$createddate" itself is what produced the earlier "cannot convert from BSON type Object to Date" error. I then deserialized the result into the custom class I had:
var grouped = group.Select(g => BsonSerializer.Deserialize<RootObject>(g));
Here is the custom class definition, which could be polished up a bit:
public class Id
{
    public int month { get; set; }
    public int day { get; set; }
    public int year { get; set; }
}

public class RootObject
{
    public Id _id { get; set; }
    public int count { get; set; }
}
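As a small follow-up sketch (my own addition, building on the classes above), the grouped parts can be folded back into dates with counts:
var dailyCounts = grouped
    .Select(g => new
    {
        // Rebuild a DateTime from the grouped year/month/day parts
        Date = new DateTime(g._id.year, g._id.month, g._id.day),
        Count = g.count
    })
    .OrderBy(x => x.Date)
    .ToList();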
I hope this will be useful. Thanks! :)
