I am writing a piece of functionality in which I am required to group by date. Here is how I currently do it:
//Assuming this is my sample document in the collection
{
    "_id" : ObjectId("56053d816518fd1b48e062f7"),
    "memberid" : "7992bc31-c3c5-49e5-bc40-0a5ba41af0bd",
    "sourceid" : NumberInt(3888),
    "ispremium" : false,
    "createddate" : {
        "DateTime" : ISODate("2015-09-25T12:26:41.157+0000"),
        "Ticks" : NumberLong(635787808011571008)
    },
    "details" : {
        // a large sub-document
    }
}
Given a member id, a start date and an end date, I am required to search the collection with these filters and group the results by date. In other words, the result I need is a list like (e.g., 12/10/2015 - count is 5, 13/10/2015 - count is 2). StartDate and EndDate are of type DateTime.
I am using C#, and this is what I have written so far:
var builder = Builders<MyCollection>.Filter;
var filter = builder.Eq(d => d.MemberId, memberId) & builder.Gte(d => d.CreatedDate, startDate) & builder.Lt(d => d.CreatedDate, endDate.AddDays(1));
var group = collection.Aggregate()
.Match(filter)
.Group(new BsonDocument { { "_id", "$createddate" }, { "count", new BsonDocument("$sum", 1) } })
.ToListAsync().Result;
I then deserialize the result into a custom class...
List<CustomAggregateResult> grouped = group.Select(g => BsonSerializer.Deserialize<CustomAggregateResult>(g)).OrderBy(g => g.Date).ToList();
Where this code falls short, which is obvious, is the grouping part. What would be ideal in my case is to group by Date rather than DateTime. I have read the group documentation and some similar threads here, but unfortunately I couldn't get it working. One of my attempts was to do as suggested in the documentation. The raw mongo query for that would be:
db.mycollection.aggregate([
    {
        $group : {
            _id : {
                month: { $month: "$createddate" },
                day: { $dayOfMonth: "$createddate" },
                year: { $year: "$createddate" }
            },
            count: { $sum: 1 }
        }
    }
])
I had left out the $match just to get this bit working. The exception I got was "cannot convert from BSON type Object to Date".
In summary, my current code works, but it groups on the DateTime (instead of just the date), so it ends up with separate counts for one particular day. I am curious whether or not this is achievable. The mongo part is the unknown to me, as I haven't figured out how to do this (even in a raw mongo query).
Just some additional information to clarify: I have the following data annotation for the DateTime object in C# (not sure if this affects anything):
[BsonDateTimeOptions(Representation = BsonType.Document)]
public DateTime CreatedDate { get; set; }
One solution in my mind is to project the "createddate" field on the fly, format it as "Y-m-d", and do the grouping on that projected field.
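If that projection route is viable, I imagine it would look something like the rough sketch below (untested; $dateToString requires MongoDB 3.0+, and the nested "DateTime" path comes from my document representation):
// Untested sketch: project the date part as a "Y-m-d" string,
// then group on the projected field.
var group = collection.Aggregate()
    .Match(filter)
    .Project(new BsonDocument("day", new BsonDocument("$dateToString",
        new BsonDocument { { "format", "%Y-%m-%d" }, { "date", "$createddate.DateTime" } })))
    .Group(new BsonDocument { { "_id", "$day" }, { "count", new BsonDocument("$sum", 1) } })
    .ToListAsync().Result;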
I have tried to include as many details as I can, along with the work I have done so far, just to make things clearer. I hope this doesn't cause any confusion. I'd appreciate any help or suggestion that would help me produce the result I want. Thanks!
I was able to fix it, according to @chridam's comment. Thanks again!
I am writing the solution below, just in case someone ever runs into the same problem I did.
I changed my query so that it became:
var group = collection.Aggregate()
    .Match(filter)
    .Group(new BsonDocument
    {
        { "_id", new BsonDocument
            {
                { "month", new BsonDocument("$month", "$createddate.DateTime") },
                { "day", new BsonDocument("$dayOfMonth", "$createddate.DateTime") },
                { "year", new BsonDocument("$year", "$createddate.DateTime") }
            }
        },
        { "count", new BsonDocument("$sum", 1) }
    })
    .ToListAsync().Result;
This gave me a serialized object. Then I deserialized it into the custom class I had:
var grouped = group.Select(g => BsonSerializer.Deserialize<RootObject>(g));
Here is the custom class definition, polished up a bit:
public class Id
{
    public int month { get; set; }
    public int day { get; set; }
    public int year { get; set; }
}

public class RootObject
{
    public Id _id { get; set; }
    public int count { get; set; }
}
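If you need actual dates on the client afterwards, one option (my own addition, not required for the fix) is to rebuild a DateTime from the grouped parts:
// Rebuild a DateTime from the grouped year/month/day and order the results.
var ordered = grouped
    .Select(g => new { Date = new DateTime(g._id.year, g._id.month, g._id.day), g.count })
    .OrderBy(x => x.Date)
    .ToList();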
I hope this will be useful. Thanks! :)
Related
This is probably just me completely overlooking the obvious "missing link". Here goes anyway: I have an elasticsearch end-point http://distribution.virk.dk/cvr-permanent/virksomhed/_search, and I would like to query on this end-point. Fairly simple.
As I understand it, NEST gives you the ability to strongly type the interaction with the elasticsearch index, in much the same way as Visual Studio will create types for an asmx/svc when you add a reference to the respective service.
So my question is: how on earth do I get from knowing the end-point for an elasticsearch index to having types matching the index and making queries on the index? I presume the answer is: "Use NEST!", but all the tutorials I've been able to find assume you have a local index that you generate from a C# type, which will then give you a type to use in your queries. But what do you do when it's a "remote" index that you have to build your types from?
Thanks in advance for any answer pointing in the right direction!
UPDATE:
I have retrieved the mappings in the index, which I have reduced to only the field "cvrNummer" in the following:
{
    "cvr-permanent-prod-20170205" : {
        "mappings" : {
            "virksomhed" : {
                "_size" : {
                    "enabled" : true
                },
                "properties" : {
                    "Vrvirksomhed" : {
                        "properties" : {
                            "cvrNummer" : {
                                "type" : "string"
                            }
                        }
                    }
                }
            }
        }
    }
}
I have then made the following class:
[ElasticsearchType(Name = "virksomhed")]
public class Company
{
    [Text(Name = "cvrNummer")]
    public string cvrNumber { get; set; }
}
Now, all that I want to do (to begin with) is to search for documents having cvrNummer with a certain value, for example "12883404". I have the following code in a simple console application:
var node = new Uri("http://distribution.virk.dk/cvr-permanent/virksomhed/_search");
var settings = new ConnectionSettings(node).DefaultIndex("defaultindex");
settings.BasicAuthentication("username", "password");
var client = new ElasticClient(settings);
I then try the following very simple request:
var searchResponse = client.Search<Company>(s => s
    .Type<Company>()
    .Query(q => q
        .Match(m => m
            .Field(f => f.cvrNumber)
            .Query("12883404")
        )
    )
);
And I get "400 bad request". What on earth am I doing wrong?
Basically you create a C# class with the properties you need by hand, then tell NEST to map the results to this class.
using Nest;
using System;
[ElasticsearchType(Name = "Name_Of_The_Mapping_In_Index_Mappings")]
public class MySearchType
{
    [Text(Name = "_id")]
    public string Id { get; set; }

    [Date(Name = "#timestamp")]
    public DateTime Timestamp { get; set; }

    [Number(NumberType.Long, Name = "some_numeric_property_in_the_mapping")]
    public long SomeNumericProperty { get; set; }
}
Then you can type your results to the search type you just defined:
ISearchResponse<MySearchType> response = await _elasticClient.SearchAsync<MySearchType>(s => s
    .Index("Name_Of_The_Index")
    .Type<MySearchType>()
    .Query(q =>
        q.Bool(bo =>
            bo.Filter(
                f => f.Terms(t =>
                    t.Field(searchtype => searchtype.SomeNumericProperty).Terms(request.NumericInput)),
                /* ... */
            )
        )
    )
);
IReadOnlyCollection<MySearchType> result = response.Documents;
This explains how you can retrieve the names needed to create the binding: Get all index and types' names from cluster in ElasticSearch.
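If you prefer to inspect the mapping from code rather than hitting the REST endpoint, something like this hedged sketch may work in NEST 2.x (the index and type names here are assumptions based on the question):
// Fetch the mapping for the type so you can mirror its fields in your POCO.
var mappingResponse = client.GetMapping<Company>(g => g
    .Index("cvr-permanent")
    .Type("virksomhed"));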
(I've restated my question here: Creating class instances based on dynamic item lists)
I'm currently working on a program in Visual Studio 2015 with C#.
I have 5 string lists that contain data I wish to serialize to a JSON file.
public List<string> name { get; private set; }
public List<string> userImageURL { get; private set; }
public List<string> nickname { get; private set; }
public List<string> info { get; private set; }
public List<string> available { get; private set; }
An example of the desired JSON file format is the following:
{
    "users" : [
        {
            "name" : "name1",
            "userImageURL" : "userImageURL1",
            "nickname" : "nickname1",
            "info" : "info1",
            "available" : false
        },
        {
            "name" : "name2",
            "userImageURL" : "userImageURL2",
            "nickname" : "nickname2",
            "info" : "info2",
            "available" : false
        },
        {
            "name" : "name3",
            "userImageURL" : "userImageURL3",
            "nickname" : "nickname3",
            "info" : "info3",
            "available" : false
        },
        {
            "name" : "name4",
            "userImageURL" : "userImageURL4",
            "nickname" : "nickname4",
            "info" : "info4",
            "available" : false
        }
    ]
}
I've tried combining the 5 lists into 1 list to serialize, using the following code:
var users = new List<string>(name.Count + userImageURL.Count + nickname.Count + info.Count + available.Count);
users.AddRange(name);
users.AddRange(userImageURL);
users.AddRange(nickname);
users.AddRange(info);
users.AddRange(available);
Then I serialize the list with the following code:
string data = JsonConvert.SerializeObject(users);
File.WriteAllText("data.json", data);
This just creates an array of unorganized values. I wish to know how I can organize them as expressed in the format above.
PS: I'm pretty new to coding, as you can tell. Sorry if I'm not expressing the question correctly or using the right terminology. Also, this is not the original code; the code creates these lists, which I wish to serialize into a JSON file.
PPS: This data is collected using HtmlAgilityPack. I asked a question yesterday about how to parse an HTML file and serialize its data to a JSON file: Using HtmlAgilityPack to get specific data in C# and serialize it to json. As nobody answered, I decided to try and do it myself. The method I used may not be the best, but it is what I could do with the knowledge I have.
I would suggest refactoring your code to start with - instead of having 5 "parallel collections", have a single collection of a new type, User:
public class User
{
    public string Name { get; set; }
    public string ImageUrl { get; set; }
    public string NickName { get; set; }
    public string Info { get; set; }
    public bool Available { get; set; }
}

...

// In your containing type
public List<User> Users { get; set; }
This is likely to make life simpler not just for your JSON, but for the rest of the code too - because you no longer have the possibility of having more nicknames than image URLs, etc. In general, having multiple collections that must be kept in sync with each other is an antipattern. There are times where it's appropriate - typically providing different efficient ways of retrieving the same data - but for something like this it's best avoided.
It is actually what Jon says, but to move from your parallel lists to Jon's single list you need something like (assuming all lists have the same number of elements in the same order):
Users = new List<User>();
for (var i = 0; i < name.Count; i++)
{
    Users.Add(new User
    {
        // available holds strings, so parse each entry into the bool the User type expects
        Available = bool.Parse(available[i]),
        ImageUrl = userImageURL[i],
        Info = info[i],
        Name = name[i],
        NickName = nickname[i]
    });
}
And then serialise the Users list.
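For example, a minimal sketch with Json.NET (which you already use); the anonymous wrapper produces the top-level "users" property from your desired output:
// Wrap the list so the output has a top-level "users" property.
string data = JsonConvert.SerializeObject(new { users = Users }, Formatting.Indented);
File.WriteAllText("data.json", data);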
I just can't seem to get the syntax correct for multi field mapping in NEST 2.0--if that's the correct terminology. Every example I've found for mapping seems to be <= the 1.x version of NEST. I'm new to Elasticsearch and NEST, and I've been reading their documentation, but the NEST documentation hasn't been completely updated for 2.x.
Basically, I don't need to index or store the entire type. Some fields I need for indexing only, some fields I'll need to index and retrieve, and some I don't need for indexing, just for retrieval.
MyType
{
    // Index this & allow for retrieval.
    int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    string CompanyDescription { get; set; }

    // Nest this.
    List<MyChildType> Locations { get; set; }
}

MyChildType
{
    // Index this & allow for retrieval.
    string LocationName { get; set; }

    // etc. other properties.
}
I have been able to index the entire object and child as-is using the following as an example:
client.Index(item, i => i.Index(indexName));
However, the actual object is a lot larger than this, and I really don't need most of it. I've found this, which looks like what I think I want to do, but in an older version: multi field mapping elasticsearch
I think "mapping" is what I'm going for, but like I said, I'm new to Elasticsearch and NEST and I'm trying to learn the terminology.
Be gentle! :) It's my first time to ask a question on SO. Thanks!
In addition to Colin's and Selçuk's answers, you can also fully control the mapping through the fluent (and object initializer syntax) mapping API. Here's an example based on your requirements:
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool);
    var client = new ElasticClient(connectionSettings);

    client.Map<MyType>(m => m
        .Index("index-name")
        .AutoMap()
        .Properties(p => p
            .String(s => s
                .Name(n => n.CompanyName)
                .Fields(f => f
                    .String(ss => ss
                        .Name("raw")
                        .NotAnalyzed()
                    )
                )
            )
            .Date(d => d
                .Name(n => n.CreatedDate)
                .Index(NonStringIndexOption.No)
            )
            .String(s => s
                .Name(n => n.CompanyDescription)
                .Store(false)
            )
            .Nested<MyChildType>(n => n
                .Name(nn => nn.Locations.First())
                .AutoMap()
                .Properties(pp => pp
                    /* properties of MyChildType */
                )
            )
        )
    );
}
public class MyType
{
    // Index this & allow for retrieval.
    public int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    public string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    public DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    public string CompanyDescription { get; set; }

    // Nest this.
    public List<MyChildType> Locations { get; set; }
}

public class MyChildType
{
    // Index this & allow for retrieval.
    public string LocationName { get; set; }

    // etc. other properties.
}
This produces the mapping
{
    "properties": {
        "id": {
            "type": "integer"
        },
        "companyName": {
            "type": "string",
            "fields": {
                "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        },
        "createdDate": {
            "type": "date",
            "index": "no"
        },
        "companyDescription": {
            "type": "string",
            "store": false
        },
        "locations": {
            "type": "nested",
            "properties": {
                "locationName": {
                    "type": "string"
                }
            }
        }
    }
}
Calling .AutoMap() causes NEST to infer the mapping based on the property types and any attributes applied to them. Then .Properties() overrides any of the inferred mappings. For example
CompanyName is mapped as a multi_field with the field companyName analyzed using the standard analyzer and companyName.raw not analyzed. You can reference the latter in your queries using .Field(f => f.CompanyName.Suffix("raw"))
Locations is mapped as a nested type (automapping by default would infer this as an object type mapping). You can then define any specific mappings for MyChildType using .Properties() inside of the Nested<MyChildType>() call.
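For instance, a quick sketch (my example, not part of the mapping above) of sorting on the un-analyzed sub-field:
// Sort on the not_analyzed "raw" sub-field so the whole company name
// is compared, rather than individual tokens.
var response = client.Search<MyType>(s => s
    .Query(q => q.MatchAll())
    .Sort(so => so.Ascending(f => f.CompanyName.Suffix("raw")))
);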
As far as I can see, you don't have any complex types that you are trying to map, so you can easily use NEST attributes to map your objects.
Check this out:
[Nest.ElasticsearchType]
public class MyType
{
    // Index this & allow for retrieval.
    [Nest.Number(Store = true)]
    public int Id { get; set; }

    // Index this & allow for retrieval.
    // **Also**, in my searching & sorting, I need to sort on this **entire** field, not just individual tokens.
    [Nest.String(Store = true, Index = Nest.FieldIndexOption.Analyzed, TermVector = Nest.TermVectorOption.WithPositionsOffsets)]
    public string CompanyName { get; set; }

    // Don't index this for searching, but do store for display.
    [Nest.Date(Store = true, Index = Nest.NonStringIndexOption.No)]
    public DateTime CreatedDate { get; set; }

    // Index this for searching BUT NOT for retrieval/displaying.
    [Nest.String(Store = false, Index = Nest.FieldIndexOption.Analyzed)]
    public string CompanyDescription { get; set; }

    // Nest this.
    [Nest.Nested(Store = true, IncludeInAll = true)]
    public List<MyChildType> Locations { get; set; }
}

[Nest.ElasticsearchType]
public class MyChildType
{
    // Index this & allow for retrieval.
    [Nest.String(Store = true, Index = Nest.FieldIndexOption.Analyzed)]
    public string LocationName { get; set; }

    // etc. other properties.
}
After this declaration, to create this mapping in elasticsearch you need to make a call similar to:
var mappingResponse = elasticClient.Map<MyType>(m => m.AutoMap());
With the AutoMap() call, NEST will read the attributes from your POCO and create a mapping request accordingly.
Also see "Attribute Based Mapping" section from here.
Cheers!
At the time of writing, NEST does not offer a way to map a property in your class to multiple fields in your document mapping using built-in attributes. However, it does provide the facilities needed to do anything with your mappings that you could do if you wrote the JSON yourself.
Here's a solution I've put together for my own needs. It shouldn't be hard to use it as the starting point for whatever you need to do.
First, here's an example of the mapping I want to generate
{
    "product": {
        "properties": {
            "name": {
                "type": "string",
                "index": "not_analyzed",
                "fields": {
                    "standard": {
                        "type": "string",
                        "analyzer": "standard"
                    }
                }
            }
        }
    }
}
The product document would then have the name field, which is indexed but not analyzed, and the name.standard field, which uses the standard analyzer.
The C# class that I generate the mapping from looks like this
[ElasticsearchType]
public class Product
{
    [WantsStandardAnalysisField]
    public string Name { get; set; }
}
Note the WantsStandardAnalysisField attribute. That's a custom attribute with no special properties added. Literally just:
public class WantsStandardAnalysisField : Attribute {}
If I were to use AutoMap as-is, my custom attribute would be ignored and I would get a mapping that has the name field, but not name.standard. Luckily, AutoMap accepts an instance of IPropertyVisitor. A base class called NoopPropertyVisitor implements the interface and does nothing at all, so you can subclass it and override only the methods you care about. When you use a property visitor with AutoMap, it will generate a document mapping for you but give you a chance to modify it before it gets sent to Elastic Search. All we need to do is look for properties marked with our custom attribute and add a field to them.
Here's an example that does that:
public class ProductPropertyVisitor : NoopPropertyVisitor
{
    public override void Visit(IStringProperty type, PropertyInfo propertyInfo, ElasticsearchPropertyAttributeBase attribute)
    {
        base.Visit(type, propertyInfo, attribute);

        var wsaf = propertyInfo.GetCustomAttribute<WantsStandardAnalysisField>();
        if (wsaf != null)
        {
            type.Index = FieldIndexOption.NotAnalyzed;
            type.Fields = new Properties
            {
                {
                    "standard",
                    new StringProperty
                    {
                        Index = FieldIndexOption.Analyzed,
                        Analyzer = "standard"
                    }
                }
            };
        }
    }
}
As you can see, we can do pretty much anything we want with the generated property, including turning off analysis for the main property and adding a new field with its own settings. For fun, you could add a couple properties to the custom attribute allowing you to specify the name of the field you want and the analyzer to use. You could even modify the code to see if the attribute has been added multiple times, letting you add as many fields as you want.
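A sketch of that variation (hypothetical names, not part of the solution above) might look like:
// Hypothetical variation: the attribute carries the sub-field name and
// analyzer instead of hard-coding "standard".
public class WantsAnalysisFieldAttribute : Attribute
{
    public string FieldName { get; set; } = "standard";
    public string Analyzer { get; set; } = "standard";
}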
If you were to run this through any method that generates a mapping using AutoMap, such as:
new TypeMappingDescriptor<Product>().AutoMap(new ProductPropertyVisitor())
You'll get the desired multi-field mapping. Now you can customize mappings to your heart's content. Enjoy!
I think you have at least 2 possibilities to solve your problem:
On indexing: Create something like a metadata model, which is stored just for retrieval. See the _source field to limit the return to this field.
On searching: Specify the fields you want to query: if you don't want to query the CreatedDate, just don't include it in your search.
In my case I am using both of these approaches to get very fast results :-)
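As a rough NEST 2.x sketch of the first approach (my illustration, reusing MyType from the question), source filtering limits what comes back while the query can still hit other fields:
// Only bring back the fields needed for display; the query still
// searches fields excluded from the returned _source.
var response = client.Search<MyType>(s => s
    .Source(sf => sf.Include(f => f.Fields(p => p.Id, p => p.CompanyName)))
    .Query(q => q.Match(m => m.Field(p => p.CompanyDescription).Query("some text")))
);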
My goal is to put a "deleted at" timestamp on a specific object in an array of a document.
If the document looks like this:
{
    "subdoc": [
        {
            "key": 1,
            "value": "abc",
            "isActive": true
        },
        {
            "key": 5,
            "value": "ade",
            "isActive": true
        }
    ]
}
I would like to be able to say "look for the document that has subdoc.key == 5 and subdoc.value == "ade"; set subdoc.isActive to false and set subdoc.deletedAt to the current db timestamp", with a resulting document like this:
{
    "subdoc": [
        {
            "key": 1,
            "value": "abc",
            "isActive": true
        },
        {
            "key": 5,
            "value": "ade",
            "isActive": false,
            "deletedAt": Timestamp(1425911075, 1)
        }
    ]
}
Is this doable?
Update: After further review of the mongo docs, this does seem doable with the "$ (update)" operator. That gets me what I need, but I'm hoping for a less magical-strings way of doing this using the C# driver?
My working find/update looks like this:
// find
{
    "subdoc.key": "2",
    "subdoc.value": "ade"
}

// update
{
    "$currentDate": {
        "subdoc.$.deletedAt": {
            "$type": "timestamp"
        }
    }
}
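For what it's worth, a hedged sketch of what a typed equivalent might look like with the 2.x C# driver (untested; this assumes an IMongoCollection<Parent> from the 2.x API, the -1 array index is the driver's convention for rendering the positional $ operator, and Parent/Foo mirror the classes in the answer below):
// Untested sketch: typed filter + update; subdoc[-1] renders as "subdoc.$".
// Note: with $type "timestamp" the stored value is a BSON timestamp, which
// may not round-trip into a DateTime? property.
var filter = Builders<Parent>.Filter.ElemMatch(
    p => p.subdoc, f => f.key == 5 && f.value == "ade");
var update = Builders<Parent>.Update
    .Set(p => p.subdoc[-1].isActive, false)
    .CurrentDate(p => p.subdoc[-1].deletedAt, UpdateDefinitionCurrentDateType.Timestamp);
await parentCollection.UpdateOneAsync(filter, update);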
Update: I should clarify that this updated time stamp field is used for synchronization by many sometimes-connected mobile clients in a load-balanced environment (multiple web servers, multiple worker processes, and a mongo cluster) with a high transaction volume. That makes it crucial that this time stamp has a single point of truth, is logically sequential in the context of the app, and is as high-precision as possible (fractions of a second); otherwise, records could be missed in a sync.
For the moment, I'm using the above approach to ensure time-stamped values are generated by the mongo database instance, and I'm pretty satisfied with it.
You can use the C# driver to wrap the Mongo entities in C# objects. Then in your code you can use LINQ to query the DB and update your objects as required, and just save them back to the DB to persist your changes.
Below is a small piece of code to query a Parent collection in the test DB.
The C# driver provides an AsQueryable extension that allows us to write our queries directly in LINQ. The driver will then automatically build the required query and execute it against the collection.
The sample below looks for any sub-documents in the subdoc list that have a value of 5 in the key field.
If it finds any, it updates the deletedAt date and then saves the document back to the DB.
var client = new MongoClient();
var database = client.GetServer().GetDatabase("test");
var parentCollection = database.GetCollection<Parent>("Parent");

var parent = parentCollection.AsQueryable().FirstOrDefault(p => p.subdoc.Any(f => f.key == 5));
if (parent != null)
{
    var fooList = parent.subdoc.Where(f => f.key == 5);
    foreach (var foo in fooList)
    {
        foo.deletedAt = DateTime.UtcNow;
    }

    // Save inside the null check so we never pass a null document.
    parentCollection.Save(parent);
}
Below are the two C# entities used to map to the Mongo documents. We can use the [BsonIgnoreIfNull] attribute of the mongo C# driver to only serialize the deletedAt field if it contains a value. We also use a nullable DateTime object in our code to allow nulls to be stored if required.
public class Foo
{
    [BsonId]
    public ObjectId Id { get; set; }
    public int key { get; set; }
    public string value { get; set; }
    public bool isActive { get; set; }

    [BsonIgnoreIfNull]
    public DateTime? deletedAt { get; set; }
}

public class Parent
{
    [BsonId]
    public ObjectId Id { get; set; }
    public List<Foo> subdoc { get; set; }
}
See the most recent update; a combination of the positional and $currentDate operators is serving my purpose.
I use Neo4jClient to work with Neo4j, and I use Cypher for CRUD operations on entities. Here is the code:
_graphClient.Cypher.Merge("(n:Movie { Id:101 })")
    .Set("n.Key = 55,n.DateTime='" + DateTime.UtcNow.ToString() + "'")
    .ExecuteWithoutResults();

_graphClient.Cypher
    .Match("(n:Movie)-[r:RelName]-(m:Movie)")
    .Where((EntityNode n) => n.Id == 20)
    .Return.......
public class EntityNode
{
    public int Id { get; set; }
    public string Key { get; set; }
    public DateTime DateTime { get; set; }
}
ERROR: Neo4j returned a valid response, however Neo4jClient was unable to deserialize into the object structure you supplied. Can't deserialize DateTime.
On the other hand, I have used JsonConvert in different ways, for example:
_graphClient.Cypher.Merge("(n:Movie { Id:101 })")
    .Set("n.Key = 55,n.DateTime=" + JsonConvert.SerializeObject(DateTime.UtcNow))
    .ExecuteWithoutResults();
I still get the error.
Pass it as a proper parameter:
graphClient.Cypher
    .Merge("(n:Movie { Id:101 })")
    .Set("n.Key = {key}, n.DateTime = {time}")
    .WithParams(new
    {
        key = 55,
        time = DateTimeOffset.UtcNow
    })
    .ExecuteWithoutResults();
This way, Neo4jClient will do the serialization for you, and you don't introduce lots of security and performance issues.
This is in the doco here: https://github.com/Readify/Neo4jClient/wiki/cypher#parameters
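When reading the value back, the POCO property needs a type the serializer can round-trip. A sketch based on the question's EntityNode (assuming the node was written with the parameterized Set above):
// DateTimeOffset round-trips through Neo4jClient's JSON serialization,
// matching the DateTimeOffset.UtcNow written via the {time} parameter above.
public class EntityNode
{
    public int Id { get; set; }
    public string Key { get; set; }
    public DateTimeOffset DateTime { get; set; }
}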
I faced the same issue recently; it was because of the date-time value coming from Neo4j.
I had stored the date-time in Neo4j as epoch time, but while retrieving it I used long in the class; because of this it gave me the above error.
Try using string for that date-time property.
Hope this helps you.