Aggregation with MongoDB C# Driver [duplicate] - c#

Is there a query that calculates how many distinct values a field contains in the database?
For example, I have a country field with 8 distinct values (Spain, England, France, etc.).
If someone adds more documents with a new country, I would like the query to return 9.
Is there an easier way than group and count?

MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.
There is a shell db.collection.distinct() helper as well:
> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]
> db.countries.distinct('country').length
4
As noted in the MongoDB documentation:
Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.
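As a rough illustration of what distinct('country').length computes, here is a plain JavaScript sketch that runs in memory rather than against a server. The sample documents and the helper name countDistinct are assumptions made for this example; they are not part of the MongoDB API.

```javascript
// Hypothetical sample documents standing in for the `countries` collection.
const countryDocs = [
  { country: "Spain" },
  { country: "England" },
  { country: "Spain" },
  { country: "France" },
];

// In-memory analogue of db.countries.distinct(field).length.
// Documents that lack the field are skipped.
function countDistinct(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (doc[field] !== undefined) seen.add(doc[field]);
  }
  return seen.size;
}

console.log(countDistinct(countryDocs, "country")); // 3
```

Adding a document with a new country raises the result by one, which is the behaviour the question asks for.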

Here is an example using the aggregation API. To complicate the case, we group by case-insensitive keywords taken from an array property of the document.
db.articles.aggregate([
{
$match: {
keywords: { $not: {$size: 0} }
}
},
{ $unwind: "$keywords" },
{
$group: {
_id: {$toLower: '$keywords'},
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{ $sort : { count : -1} },
{ $limit : 100 }
]);
which gives results such as:
{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
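To make the stages of that pipeline easier to follow, here is an in-memory JavaScript sketch of the same steps. It does not talk to MongoDB; the sample articles and the function name keywordCounts are assumptions for illustration only.

```javascript
// Hypothetical sample documents standing in for the `articles` collection.
const articles = [
  { keywords: ["Cancer", "Obesity"] },
  { keywords: ["cancer", "HIV"] },
  { keywords: [] },
  { keywords: ["obesity", "CANCER"] },
];

// Mirrors $unwind -> $group on $toLower -> $match count >= minCount
// -> $sort by count descending -> $limit.
function keywordCounts(docs, minCount, limit) {
  const counts = new Map();
  for (const doc of docs) {
    for (const keyword of doc.keywords || []) {
      const key = keyword.toLowerCase();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .filter((row) => row.count >= minCount)
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

console.log(keywordCounts(articles, 2, 100));
```

With this sample data, "hiv" is dropped by the minimum-count filter because it appears only once.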

With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator and a $replaceRoot pipeline stage to get the counts.
For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:
db.users.aggregate([
{ "$group": {
"_id": { "$toLower": "$role" },
"count": { "$sum": 1 }
} },
{ "$group": {
"_id": null,
"counts": {
"$push": { "k": "$_id", "v": "$count" }
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Example Output
{
"user" : 67,
"superuser" : 5,
"admin" : 4,
"moderator" : 12
}
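The reshaping done by the last two stages can be sketched in plain JavaScript. This is only an in-memory analogy under assumed sample data (the grouped array below is hypothetical), with Object.fromEntries playing the role of $arrayToObject.

```javascript
// Hypothetical output of the first $group stage: one row per lowercased role.
const grouped = [
  { _id: "user", count: 67 },
  { _id: "admin", count: 4 },
];

// Second $group: push { k, v } pairs into a single array.
const pairs = grouped.map((row) => ({ k: row._id, v: row.count }));

// $arrayToObject + $replaceRoot: turn the pairs into one flat document.
const rolesDoc = Object.fromEntries(pairs.map(({ k, v }) => [k, v]));

console.log(rolesDoc); // { user: 67, admin: 4 }
```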

I wanted a more concise answer and came up with the following, using the documentation on aggregation and $group:
db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])

You can use Mongo Shell Extensions. It's a single .js import that you can append to your $HOME/.mongorc.js, or load programmatically if you're coding in Node.js/io.js.
Sample
For each distinct value of the field, it counts the occurrences in documents, optionally filtered by a query:
> db.users.distinctAndCount('name', {name: /^a/i})
{
"Abagail": 1,
"Abbey": 3,
"Abbie": 1,
...
}
The field parameter could be an array of fields
> db.users.distinctAndCount(['name','job'], {name: /^a/i})
{
"Austin,Educator" : 1,
"Aurelia,Educator" : 1,
"Augustine,Carpenter" : 1,
...
}

To find the distinct values of field_1 in a collection while also applying a WHERE-style condition, pass a query document as the second argument:
db.your_collection_name.distinct('field_1', {WHERE condition here and it should return a document})
So, finding the distinct names in a collection where age > 25 looks like this (append .length to get the number of them):
db.your_collection_name.distinct('names', {'age': {"$gt": 25}})
Hope it helps!

I use this query:
var collection = "countries"; var field = "country";
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})
Output:
country, England: 3536
country, France: 238
country, Australia: 1044
country, Spain: 16
This query first finds all the distinct values and then counts the number of occurrences of each one.

If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:
db.users.aggregate([
{ $group: { _id: '$country' } },
{ $count: 'countOfUniqueCountries' }
]);
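If the same count is needed for several fields, the two stages can be generated by a small helper. The sketch below only builds the pipeline array; the helper name distinctCountPipeline and its default output name are assumptions for illustration, not driver or shell functions.

```javascript
// Hypothetical helper: builds the two-stage pipeline for any field name.
function distinctCountPipeline(field, outputName = "distinctCount") {
  return [
    { $group: { _id: `$${field}` } }, // one document per distinct value
    { $count: outputName },           // count those documents
  ];
}

console.log(JSON.stringify(distinctCountPipeline("country", "countOfUniqueCountries")));
```

In the shell, the returned array would be passed to db.users.aggregate(...). Note that $count emits no document at all when its input is empty, so callers may want to treat a missing result as zero.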

Related

C# MongoDB query: filter based on the last item of array

I have a MongoDB collection like this:
{
_id: "abc",
history:
[
{
status: 1,
reason: "confirmed"
},
{
status: 2,
reason: "accepted"
}
]
},
{
_id: "xyz",
history:
[
{
status: 2,
reason: "accepted"
},
{
status: 10,
reason: "cancelled"
}
]
}
I want to write a query in C# that returns the documents whose last history item has status 2 (accepted). So in my result I should not see "xyz", because its status has changed from 2, but I should see "abc", since its last status is 2. The problem is that getting the last item is not easy with MongoDB's C# driver, or I don't know how to do it.
I tried LINQ's LastOrDefault but got a System.InvalidOperationException: {document}{History}.LastOrDefault().Status is not supported error.
I know there is a workaround: fetch the documents first (load them into memory) and then filter. But that runs on the client side and is slow (it uses a lot of network bandwidth). I want to do the filtering on the server.
Option 1) Find() -> expected to be faster
db.collection.find({
$expr: {
$eq: [
{
$arrayElemAt: [
"$history.status",
-1
]
},
2
]
}
})
Option 2) Aggregation
db.collection.aggregate([
{
"$addFields": {
last: {
$arrayElemAt: [
"$history",
-1
]
}
}
},
{
$match: {
"last.status": 2
}
},
{
$project: {
"history": 1
}
}
])
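Both options test the status of the last array element. As a sanity check of that condition, here is an in-memory JavaScript sketch using the two documents from the question; the helper name lastStatusIs is an assumption for this example, and nothing here runs on the server.

```javascript
// The two documents from the question, as separate objects.
const processes = [
  { _id: "abc", history: [{ status: 1, reason: "confirmed" }, { status: 2, reason: "accepted" }] },
  { _id: "xyz", history: [{ status: 2, reason: "accepted" }, { status: 10, reason: "cancelled" }] },
];

// Analogue of { $eq: [ { $arrayElemAt: ["$history.status", -1] }, 2 ] }.
function lastStatusIs(doc, status) {
  const history = doc.history || [];
  const last = history[history.length - 1]; // undefined for an empty array
  return last !== undefined && last.status === status;
}

console.log(processes.filter((d) => lastStatusIs(d, 2)).map((d) => d._id)); // [ 'abc' ]
```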
I found a workaround: overwrite the history array with an array holding only its last element, then apply the filter as if there were no array. This is possible with the $addFields aggregation stage.
PipelineDefinition<Process, BsonDocument> pipeline = new BsonDocument[]
{
new BsonDocument("$addFields",
new BsonDocument("history",
new BsonDocument ( "$slice",
new BsonArray { "$history", -1 }
)
)
),
new BsonDocument("$match",
new BsonDocument
{
{ "history.status", 2 }
}
)
};
var result = collection.Aggregate(pipeline).ToList();
result will contain the documents whose last history status is 2.

How to check only top 1 in nested mongo array

At the moment my notification documents have an events property, which is an array of events. Each event has a status and a date. When querying notifications, I need to check whether the most recent status is opened.
Valid object, where the most recent event status is opened:
{
"subject" : "Hello there",
"events" : [
{
"status" : "opened",
"date" : 2020-01-02 17:35:31.229Z
},
{
"status" : "clicked",
"date" : 2020-01-01 17:35:31.229Z
},
]
}
Invalid object, where the opened event isn't the most recent:
{
"subject" : "Hello there",
"events" : [
{
"status" : "opened",
"date" : 2020-01-01 17:35:31.229Z
},
{
"status" : "clicked",
"date" : 2020-01-02 17:35:31.229Z
},
]
}
At the moment I have the query that can check if any event has the status opened, but I'm unsure how to query only the top 1 and sorted by the dates of a nested query. Any help would be greatly appreciated.
var filter = Builders<Notification>.Filter.Empty;
filter &= Builders<Notification>.Filter.Regex("events.event", new BsonRegularExpression(searchString, "i"));
var results = await collection.FindSync(filter, findOptions).ToListAsync();
To get only the latest event, you can use $reduce to iterate over the events and compare each one with the latest found so far:
db.collection.aggregate([
{
$addFields: {
latestEvent: {
$reduce: {
input: "$events",
initialValue: {status: null, date: 0},
in: {
$mergeObjects: [
"$$value",
{
$cond: [
{
$gt: [{$toDate: "$$this.date"}, {$toDate: "$$value.date"}]
},
"$$this",
"$$value"
]
}
]
}
}
}
}
}
])
For multiple documents, the following returns only the matching documents:
db.collection.aggregate([{
$addFields: {
lastevent: {
$filter: {
input: '$events',
as: 'element',
cond: {$eq: ['$$element.date',{$max: '$events.date'}]}
}
}
}
}, {
$match: {
'lastevent.status': 'opened'
}
}])
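The idea behind both pipelines is to pick the event with the greatest date and then test its status. A plain JavaScript sketch of that logic, run in memory on data shaped like the question's examples, might look as follows. The subjects "A" and "B", the ISO date strings, and the helper name latestEvent are assumptions for illustration.

```javascript
// Hypothetical notifications modelled on the valid and invalid examples.
const notifications = [
  { subject: "A", events: [
    { status: "opened", date: "2020-01-02T17:35:31.229Z" },
    { status: "clicked", date: "2020-01-01T17:35:31.229Z" },
  ] },
  { subject: "B", events: [
    { status: "opened", date: "2020-01-01T17:35:31.229Z" },
    { status: "clicked", date: "2020-01-02T17:35:31.229Z" },
  ] },
];

// Returns the event with the greatest date, or null when there are none.
function latestEvent(events) {
  return events.reduce(
    (latest, ev) =>
      latest === null || new Date(ev.date) > new Date(latest.date) ? ev : latest,
    null
  );
}

// Keep only notifications whose most recent event is "opened".
const openedLast = notifications.filter((n) => {
  const latest = latestEvent(n.events);
  return latest !== null && latest.status === "opened";
});

console.log(openedLast.map((n) => n.subject)); // [ 'A' ]
```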
I am a fan of not using an axe for everything, even if it is a good one :)
I take it that events being out of order is rare, so we don't need to spend a lot of resources weeding those out up front, as there will be few of them.
So my take is to get all the documents whose first event is opened and use simple .NET iteration to remove the few that are out of order, leaving a tidy and easily maintainable method.
public List<Notification> GetValidSubjectStatusList(IMongoCollection<Notification> mongoCollection){
var builder = Builders<Notification>.Filter;
var filter = builder.Eq(x => x.Events.FirstOrDefault().Status, "opened");
var listOf = mongoCollection.Find(filter).ToList();
var reducedList = new List<Notification>();
foreach(var hit in listOf){
if(hit.Events.Any()
&& hit.Events.First()
.Date.Equals(hit.Events
.OrderByDescending(x => x.Date)
.First().Date // compare date with date, not date with an event object
))
{
reducedList.Add(hit);
}
}
return reducedList;
}

Find MongoDB document and only matching array elements w/ C# driver

I'm trying to return a document whose array is filtered so that it contains only one item. I've seen many similar questions, but none of them deal with dynamic queries. There may be several constraints, so I have to be able to keep adding to the filter.
{
"_id" : ObjectId("6058f722e9e41a3d243258dc"),
"fooName" : "foo1",
"fooCode" : 1,
"bar" : [
{
"barCode" : "123",
"barName" : "Rick's Cafe",
"baz" : [
{
"bazId" : "00",
"bazDescription" : "Ilsa"
},
{
"bazId" : "21",
"bazDescription" : "Victor"
}
]
},
{
"barCode" : "456",
"barName" : "Draco Tavern",
"baz" : [
{
"bazId" : "00",
"bazDescription" : "Rick Shumann"
}
]
}
]
}
This is my attempt. It returns a document whose array contains the barCode, but the array's entire contents are included.
Expression<Func<Foo, bool>> filter = x => x.FooCode == 1;
string barCode = "456";
if (!String.IsNullOrEmpty(barCode))
{
Expression<Func<Foo, bool>> newPred =
x => x.Bar.Any(s => s.BarCode == barCode);
filter = filter.CombineAnd(newPred);
}
var fooQuery =
_fooCollection
.Find(filter);
How do I remove non-matching array elements, but only if an array element was specified?
Unwind to convert the single document into a document per nested-array element in the shape of:
{
"_id" : ObjectId("6058f722e9e41a3d243258dc"),
"fooName" : "foo1",
"fooCode" : 1,
"bar": {
"barCode" : "123",
"barName" : "Rick's Cafe",
...
}
}
Match to find the element you want
Group to recombine the nested-array
So the resulting C# might look like:
var fooQuery = _fooCollection.Aggregate()
.Unwind("bar")
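// Field names are case-sensitive: the documents use "barCode".
.Match(new BsonDocument("bar.barCode", barCode))
// $push collects the matching elements back into a "bar" array.
.Group(BsonDocument.Parse("{ '_id': '$fooCode', 'bar': { '$push': '$bar' } }"));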
You need to use aggregate in MongoDB.
You can split the array elements with unwind, filter with match, select the keys that you want with project and group with common column like id or something.
MongoDB Aggregation docs: https://docs.mongodb.com/manual/aggregation/
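For a single document, the net effect of unwind, match and group is to keep only the matching array elements. The in-memory JavaScript sketch below shows that effect together with the "only if a barCode was specified" condition from the question. It is an analogy, not driver code; the helper name filterBars is an assumption, and the document is a trimmed copy of the question's sample.

```javascript
// Trimmed copy of the sample document from the question.
const foo = {
  fooName: "foo1",
  fooCode: 1,
  bar: [
    { barCode: "123", barName: "Rick's Cafe" },
    { barCode: "456", barName: "Draco Tavern" },
  ],
};

// Keeps only matching `bar` elements, but only when a barCode was supplied.
function filterBars(doc, barCode) {
  if (!barCode) return doc; // no constraint: leave the array untouched
  return { ...doc, bar: doc.bar.filter((b) => b.barCode === barCode) };
}

console.log(filterBars(foo, "456").bar.map((b) => b.barName)); // [ 'Draco Tavern' ]
```

The same branching applies to the server-side version: the unwind, match and group stages would only be appended to the pipeline when a barCode is present.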

MongoDB C# Driver Update Document with Aggregation Pipeline

From MongoDB 4.2 on, it is possible to update documents with an aggregation pipeline.
It means that now it is possible to express "conditional updates based on current field values or updating one field using the value of another field(s)".
For instance:
db.members.update(
{ },
[
{ $set: { status: "Modified", comments: [ "$misc1", "$misc2" ], lastUpdate: "$$NOW" } },
{ $unset: [ "misc1", "misc2" ] }
],
{ multi: true }
)
My question is: how can I do this with the MongoDB C# driver?
IMongoCollection's UpdateMany takes UpdateDefinition<T> as second parameter and PipelineUpdateDefinition is one of the derived classes. There's no expression trees support so far but you can utilize BsonDocument class:
IMongoCollection<BsonDocument> col = ...;
var pipeline = new EmptyPipelineDefinition<BsonDocument>()
.AppendStage("{ $addFields : { " +
"status : 'Modified'," +
"comments: [ '$misc1', '$misc2' ]," +
"lastUpdate: '$$NOW' " +
"} }",
BsonDocumentSerializer.Instance)
.AppendStage("{ $project : { 'misc1':0, 'misc2':0 } }",
BsonDocumentSerializer.Instance);
col.UpdateMany(new BsonDocument(), pipeline);
which executes the following command (trace from the MongoDB driver):
"updates" : [
{
"q" : { },
"u" : [
{ "$addFields" : { "status" : "Modified", "comments" : ["$misc1", "$misc2"], "lastUpdate" : "$$NOW" } },
{ "$project" : { "misc1" : 0, "misc2" : 0 } }],
"multi" : true
}
]
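To see what this pipeline update does to an individual document, here is an in-memory JavaScript sketch of the same transformation. The sample member document and the function name applyPipelineUpdate are assumptions for illustration; the real update runs on the server.

```javascript
// Hypothetical sample document standing in for one entry of `members`.
const member = { _id: 1, status: "Old", misc1: "note A", misc2: "note B" };

// In-memory analogue of the $set / $unset (or $addFields / $project) stages.
function applyPipelineUpdate(doc, now = new Date()) {
  const { misc1, misc2, ...rest } = doc; // drop misc1 and misc2
  return {
    ...rest,
    status: "Modified",
    comments: [misc1, misc2], // built from the old field values
    lastUpdate: now,          // stands in for $$NOW
  };
}

const updatedMember = applyPipelineUpdate(member);
console.log(updatedMember);
```

The point of the pipeline form is visible in the comments field: it is computed from values the document already holds, which a classic update document cannot express.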

how to update mongo db all records in c#

I have a MongoDB collection that stores JSON documents.
By mistake, one element was given the wrong value in all records of the collection.
How can I update that particular element?
My JSON looks like this:
{
status:
{
name:"john",
value: "12345678903333.444"
}
}
Here the value property should be a long field, so the document should become:
{
status:
{
"name":"john",
"value": 1234567890
}
}
The value should be trimmed to the first 10 characters of the existing value.
After updating (following @mickl's answer), converting to Int also produced an error!
You can use the $substr operator with $toDouble to convert the string to a number, and then write the aggregation results back into the same collection using $out (which effectively updates all its documents). Try this in the Mongo shell:
db.col.aggregate([
{
$addFields: {
"status.value": { $toDouble: { $substr: [ "$status.value", 0, 10 ] } }
}
},
{
$out: "col"
}
])
Or in C# code:
var addFieldsBody = "{ $addFields: { \"status.value\": { $toDouble: { $substr: [ \"$status.value\", 0, 10 ] } } } }";
Col.Aggregate()
.AppendStage<BsonDocument>(BsonDocument.Parse(addFieldsBody))
.Out("col");
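The conversion itself is easy to check outside the database. This plain JavaScript sketch applies the same two steps (take the first 10 characters, then convert to a number) to the value from the question; the function name firstTenAsNumber is an assumption for this example.

```javascript
// Analogue of { $toDouble: { $substr: ["$status.value", 0, 10] } }.
function firstTenAsNumber(text) {
  const trimmed = text.substring(0, 10); // first 10 characters
  return Number(trimmed);
}

console.log(firstTenAsNumber("12345678903333.444")); // 1234567890
```

A JavaScript number is a double, like the result of $toDouble. If the field must be stored as a 64-bit integer, $toLong is the corresponding operator, though whether it suits the data here is a judgment for the reader.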
