At the moment my notification documents have an events property, which is an array of events. Each event has a status and a date. When querying notifications, I need to check whether the most recent event's status is "opened".
Valid object, where the most recent event's status is "opened":
{
  "subject" : "Hello there",
  "events" : [
    {
      "status" : "opened",
      "date" : ISODate("2020-01-02T17:35:31.229Z")
    },
    {
      "status" : "clicked",
      "date" : ISODate("2020-01-01T17:35:31.229Z")
    }
  ]
}
Invalid object, where the "opened" status isn't the most recent:
{
  "subject" : "Hello there",
  "events" : [
    {
      "status" : "opened",
      "date" : ISODate("2020-01-01T17:35:31.229Z")
    },
    {
      "status" : "clicked",
      "date" : ISODate("2020-01-02T17:35:31.229Z")
    }
  ]
}
At the moment I have a query that can check whether any event has the status "opened", but I'm unsure how to check only the top event when the nested array is sorted by date. Any help would be greatly appreciated.
var filter = Builders<Notification>.Filter.Empty;
// matches any event whose status contains the search string, case-insensitively
filter &= Builders<Notification>.Filter.Regex("events.status", new BsonRegularExpression(searchString, "i"));
var results = await collection.Find(filter, findOptions).ToListAsync();
In order to get only the latest event, you can use $reduce to iterate over the events and compare each one to the running latest:
db.collection.aggregate([
  {
    $addFields: {
      latestEvent: {
        $reduce: {
          input: "$events",
          initialValue: { status: null, date: 0 },
          in: {
            $mergeObjects: [
              "$$value",
              {
                $cond: [
                  { $gt: [ { $toDate: "$$this.date" }, { $toDate: "$$value.date" } ] },
                  "$$this",
                  "$$value"
                ]
              }
            ]
          }
        }
      }
    }
  }
])
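From there, appending a $match stage on latestEvent.status (e.g. { $match: { 'latestEvent.status': 'opened' } }) keeps only the documents whose most recent event has that status.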
For multiple documents, this returns only the correct documents.
Example:
db.collection.aggregate([{
$addFields: {
lastevent: {
$filter: {
input: '$events',
as: 'element',
cond: {$eq: ['$$element.date',{$max: '$events.date'}]}
}
}
}
}, {
$match: {
'lastevent.status': 'opened'
}
}])
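Since the question uses the C# driver, here is a rough translation of that pipeline: a sketch, assuming a Notification class mapped to these documents, with the stages parsed from shell-style JSON to stay close to the above.
var addFields = BsonDocument.Parse(@"{ $addFields: { lastevent: {
    $filter: {
        input: '$events',
        as: 'element',
        cond: { $eq: ['$$element.date', { $max: '$events.date' }] }
    } } } }");
var match = BsonDocument.Parse("{ $match: { 'lastevent.status': 'opened' } }");

// results come back as BsonDocuments because of the added lastevent field;
// $project it away (or mark Notification with [BsonIgnoreExtraElements]) to map back
var results = await collection.Aggregate()
    .AppendStage<BsonDocument>(addFields)
    .AppendStage<BsonDocument>(match)
    .ToListAsync();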
I am a fan of not using an axe for everything, even if it is a good one :)
I take it that the events being out of order is a rare thing, so we don't need to spend a lot of resources weeding those out up front, as they will be few.
So my take is to get all the opened ones and use simple .NET iteration to remove the few offenders, leaving a nice, orderly, and easily maintainable method.
public List<Notification> GetValidSubjectStatusList(IMongoCollection<Notification> mongoCollection)
{
    var builder = Builders<Notification>.Filter;
    // cheap server-side filter: the first event in the array has status "opened"
    var filter = builder.Eq(x => x.Events.FirstOrDefault().Status, "opened");
    var listOf = mongoCollection.Find(filter).ToList();

    var reducedList = new List<Notification>();
    foreach (var hit in listOf)
    {
        // keep only documents whose first event is also the newest one
        if (hit.Events.Any()
            && hit.Events.First().Date.Equals(hit.Events.Max(x => x.Date)))
        {
            reducedList.Add(hit);
        }
    }
    return reducedList;
}
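For completeness, a hypothetical call site (the collection name "notifications" is assumed):
var opened = GetValidSubjectStatusList(database.GetCollection<Notification>("notifications"));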
I'm trying to return a document, and that document should have its array filtered so that it only contains one item. I've seen many similar questions, but none of them deal with dynamic queries. There may be several constraints, so I have to be able to keep adding to the filter.
{
"_id" : ObjectId("6058f722e9e41a3d243258dc"),
"fooName" : "foo1",
"fooCode" : 1,
"bar" : [
{
"barCode" : "123",
"barName" : "Rick's Cafe",
"baz" : [
{
"bazId" : "00",
"bazDescription" : "Ilsa"
},
{
"bazId" : "21",
"bazDescription" : "Victor"
}
]
},
{
"barCode" : "456",
"barName" : "Draco Tavern",
"baz" : [
{
"bazId" : "00",
"bazDescription" : "Rick Shumann"
}
]
}
]
}
This is my attempt; it returns a document whose array contains the barCode, but the array's entire contents are included.
Expression<Func<Foo, bool>> filter = x => x.FooCode == 1;
string barCode = "456";
if (!String.IsNullOrEmpty(barCode))
{
    Expression<Func<Foo, bool>> newPred =
        x => x.Bar.Any(s => s.BarCode == barCode);
    filter = filter.CombineAnd(newPred); // CombineAnd: a custom expression-combining extension
}
var fooQuery =
_fooCollection
.Find(filter);
How do I remove non-matching array elements, but only if an array element was specified?
Unwind to convert the single document into a document per nested-array element in the shape of:
{
"_id" : ObjectId("6058f722e9e41a3d243258dc"),
"fooName" : "foo1",
"fooCode" : 1,
"bar": {
"barCode" : "123",
"barName" : "Rick's Cafe",
...
}
}
Match to find the element you want
Group to recombine the nested-array
So the resulting C# might look like:
var fooQuery = _fooCollection.Aggregate()
    .Unwind("bar")
    .Match(BsonDocument.Parse("{ 'bar.barCode': '" + barCode + "' }"))
    .Group(BsonDocument.Parse("{ _id: '$fooCode', bar: { $push: '$bar' } }"));
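Since the unwind/match/group stages should only apply when an array constraint was actually specified, one way is to append them conditionally. A sketch, reusing the names from the question:
var stages = _fooCollection.Aggregate().Match(Builders<Foo>.Filter.Where(filter));
if (String.IsNullOrEmpty(barCode))
{
    var docs = stages.ToList(); // no element constraint: return matching documents as-is
}
else
{
    var docs = stages
        .Unwind("bar")
        .Match(new BsonDocument("bar.barCode", barCode)) // also avoids string concatenation
        .Group(BsonDocument.Parse("{ _id: '$fooCode', bar: { $push: '$bar' } }"))
        .ToList();
}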
You need to use the aggregation pipeline in MongoDB.
You can split the array elements with $unwind, filter with $match, select the keys you want with $project, and recombine with $group on a common field such as _id.
MongoDB Aggregation docs: https://docs.mongodb.com/manual/aggregation/
I am pretty new to MongoDB and am experimenting with it for one of our applications. We are trying to implement CQRS; the query side is implemented in Node.js and the command side in C#.
One of my collections might have millions of documents in it. Each document has a scenarioId field, and each scenario can have around two million records.
Our use case is to compare the data of two scenarios and perform some mathematical operations on each field of the scenarios.
For example, each scenario has a property avgMiles, and I would like to compute the difference in this property between scenarios; users should be able to filter on this difference value. As my design keeps both scenarios' data in a single collection, I am trying to group by scenario id and project from there.
My sample structure of a document would look like below.
{
"_id" : ObjectId("5ac05dc58ff6cd3054d5654c"),
"origin" : {
"code" : "0000",
},
"destination" : {
"code" : "0001",
},
"currentOutput" : {
"avgMiles" : 0.15093020854848138,
},
"scenarioId" : NumberInt(0),
"serviceType" : "ECON"
}
When I group, I group based on the origin.code, destination.code and serviceType properties.
My aggregate pipeline query looks like this:
db.servicestats.aggregate([{$match:{$or:[{scenarioId:0}, {scenarioId:1}]}},
{$sort:{'origin.code':1,'destination.code':1,serviceType:1}},
{$group:{
_id:{originCode:'$origin.code',destinationCode:'$destination.code',serviceType:'$serviceType'},
baseScenarioId:{$sum:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 1] },
then: '$scenarioId'
}],
default: 0
}
}},
compareScenarioId:{$sum:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 0] },
then: '$scenarioId'
}],
default: 0
}
}},
baseavgMiles:{$max:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 1] },
then: '$currentOutput.avgMiles'
}],
default: null
}
}},
compareavgMiles:{$sum:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 0] },
then: '$currentOutput.avgMiles'
}],
default: null
}
}}
}
},
{$project:{scenarioId:
{ base:'$baseScenarioId',
compare:'$compareScenarioId'
},
avgMiles:{base:'$baseavgMiles', compare:'$compareavgMiles', diff:{$subtract:['$baseavgMiles','$compareavgMiles']}}
}
},
{$match:{'avgMiles.diff':{$eq:0.5}}},
{$limit:100}
],{allowDiskUse: true} )
My $group pipeline stage has about 4 million documents going into it. Can you please suggest how I can improve the performance of this query?
I have an index on the fields used in my group condition, and I have added a $sort pipeline stage to help the group perform better.
Any suggestions are most welcome.
As the group-by approach is not working in my case, I have implemented a left outer join using $lookup; the query looks like this:
db.servicestats.aggregate([
{$match:{$and :[ {'scenarioId':0}
//,{'origin.code':'0000'},{'destination.code':'0001'}
]}},
//{$limit:1000000},
{$lookup: { from:'servicestats',
let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
pipeline:[
{$match: {
$expr: { $and:
[
{ $eq: [ "$scenarioId", 1 ] },
{ $eq: [ "$origin.code", "$$ocode" ] },
{ $eq: [ "$destination.code", "$$dcode" ] },
{ $eq: [ "$serviceType", "$$stype" ] },
]
}
}
},
{$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
{ $replaceRoot: { newRoot: "$comp" } }
],
as : "compoutputs"
}},
{
$replaceRoot: {
newRoot: {
$mergeObjects:[
{
$arrayElemAt: [
"$$ROOT.compoutputs",
0
]
},
{
origin: "$$ROOT.origin",
destination: "$$ROOT.destination",
serviceType: "$$ROOT.serviceType",
baseavgmiles: "$$ROOT.currentOutput.avgMiles",
output: '$$ROOT'
}
]
}
}
},
{$limit:100}
])
The above query performs well and returns in 70 ms.
But in my scenario I need a full outer join, which I understand MongoDB does not support as of now, so I implemented it using a $facet pipeline as below:
db.servicestats.aggregate([
{$limit:1000},
{$facet: {output1:[
{$match:{$and :[ {'scenarioId':0}
]}},
{$lookup: { from:'servicestats',
let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
pipeline:[
{$match: {
$expr: { $and:
[
{ $eq: [ "$scenarioId", 1 ] },
{ $eq: [ "$origin.code", "$$ocode" ] },
{ $eq: [ "$destination.code", "$$dcode" ] },
{ $eq: [ "$serviceType", "$$stype" ] },
]
}
}
},
{$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
{ $replaceRoot: { newRoot: "$comp" } }
],
as : "compoutputs"
}},
//{
// $replaceRoot: {
// newRoot: {
// $mergeObjects:[
// {
// $arrayElemAt: [
// "$$ROOT.compoutputs",
// 0
// ]
// },
// {
// origin: "$$ROOT.origin",
// destination: "$$ROOT.destination",
// serviceType: "$$ROOT.serviceType",
// baseavgmiles: "$$ROOT.currentOutput.avgMiles",
// output: '$$ROOT'
// }
// ]
// }
// }
// }
],
output2:[
{$match:{$and :[ {'scenarioId':1}
]}},
{$lookup: { from:'servicestats',
let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
pipeline:[
{$match: {
$expr: { $and:
[
{ $eq: [ "$scenarioId", 0 ] },
{ $eq: [ "$origin.code", "$$ocode" ] },
{ $eq: [ "$destination.code", "$$dcode" ] },
{ $eq: [ "$serviceType", "$$stype" ] },
]
}
}
},
{$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
{ $replaceRoot: { newRoot: "$comp" } }
],
as : "compoutputs"
}},
//{
// $replaceRoot: {
// newRoot: {
// $mergeObjects:[
// {
// $arrayElemAt: [
// "$$ROOT.compoutputs",
// 0
// ]
// },
// {
// origin: "$$ROOT.origin",
// destination: "$$ROOT.destination",
// serviceType: "$$ROOT.serviceType",
// baseavgmiles: "$$ROOT.currentOutput.avgMiles",
// output: '$$ROOT'
// }
// ]
// }
// }
// },
{$match :{'compoutputs':{$eq:[]}}}
]
}
}
///{$limit:100}
])
But the $facet performance is very bad. Any further ideas to improve this are most welcome.
In general, there are three things that can cause slow queries:
The query is not indexed, cannot use indexes efficiently, or the schema design is not optimal (e.g. highly nested arrays or subdocuments) which means that MongoDB must do some extra work to arrive at the relevant data.
The query is waiting for something slow (e.g. fetching data from disk, writing data to disk).
Underprovisioned hardware.
In terms of your query, there may be some general suggestions regarding query performance:
Using allowDiskUse in an aggregation pipeline means that the query may use disk for some of its stages. Disk is frequently the slowest part of a machine, so avoiding this where possible will speed up the query.
Note that each aggregation pipeline stage is limited to 100MB of memory use. This is irrespective of the amount of memory you have.
The $group stage cannot use indexes, because an index is tied to a document's location on disk. Once the aggregation pipeline enters a stage where the document's physical location is irrelevant (e.g. the $group stage), an index cannot be used anymore.
By default, the WiredTiger cache is ~50% of RAM, so a 64GB machine would have a ~32GB WiredTiger cache. If you find that the query is very slow, it is possible that MongoDB needed to go to disk to fetch the relevant documents. Monitoring iostats and checking disk utilization % during the query would provide hints toward whether enough RAM is provisioned.
Some possible solutions are:
Provision more RAM so that MongoDB doesn't have to go to disk very often.
Rework the schema design to avoid heavily nested fields, or multiple arrays in the document.
Tailor the document schema to make it easier for you to query the data in it, instead of tailoring the schema to how you think the data should be stored (e.g. avoid heavy normalization inherent in relational database design model).
If you find that you're hitting the performance limit of a single machine, consider sharding to horizontally scale the query. However, please note that sharding is a solution that would require careful design and consideration.
You say above that you'd like to group by scenarioId, which, however, you don't do. That is probably what you should be doing to avoid all the $switch statements. Something like this might get you going:
db.servicestats.aggregate([{
$match: {
scenarioId: { $in: [ 0, 1 ] }
}
}, {
$sort: { // not sure if that stage even helps - try to run with and without
'origin.code': 1,
'destination.code': 1,
serviceType: 1
}
}, {
$group: { // first group by scenarioId AND the other fields
_id: {
scenarioId: '$scenarioId',
originCode: '$origin.code',
destinationCode: '$destination.code',
serviceType: '$serviceType'
},
avgMiles: { $max: '$currentOutput.avgMiles' } // no switches needed
},
}, {
$group: { // group by the other fields only so without scenarioId
_id: {
originCode: '$_id.originCode',
destinationCode: '$_id.destinationCode',
serviceType: '$_id.serviceType'
},
baseScenarioAvgMiles: {
$max: {
$cond: {
if: { $eq: [ '$_id.scenarioId', 1 ] },
then: '$avgMiles',
else: 0
}
}
},
compareScenarioAvgMiles: {
$max: {
$cond: {
if: { $eq: [ '$_id.scenarioId', 0 ] },
then: '$avgMiles',
else: 0
}
}
}
},
}, {
$addFields: { // compute the difference
diff: {
$subtract :[ '$baseScenarioAvgMiles', '$compareScenarioAvgMiles']
}
}
}, {
$match: {
diff: { $eq: 0.5 } // diff is a top-level field created by the $addFields stage
}
}, {
$limit:100
}], { allowDiskUse: true })
Beyond that, I would suggest using db.collection.explain().aggregate(...) to find the right indexing and tune your query.
Is there a query for calculating how many distinct values a field contains in the DB?
E.g. I have a field for country and there are 8 distinct country values (Spain, England, France, etc.).
If someone adds more documents with a new country, I would like the query to return 9.
Is there an easier way than group and count?
MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.
There is a shell db.collection.distinct() helper as well:
> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]
> db.countries.distinct('country').length
4
As noted in the MongoDB documentation:
Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.
Here is an example using the aggregation API. To complicate the case, we're grouping by case-insensitive words from an array property of the document.
db.articles.aggregate([
{
$match: {
keywords: { $not: {$size: 0} }
}
},
{ $unwind: "$keywords" },
{
$group: {
_id: {$toLower: '$keywords'},
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{ $sort : { count : -1} },
{ $limit : 100 }
]);
which gives results such as:
{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
With MongoDB 3.4.4 and newer, you can leverage the $arrayToObject operator and a $replaceRoot pipeline stage to get the counts.
For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:
db.users.aggregate([
{ "$group": {
"_id": { "$toLower": "$role" },
"count": { "$sum": 1 }
} },
{ "$group": {
"_id": null,
"counts": {
"$push": { "k": "$_id", "v": "$count" }
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Example Output
{
"user" : 67,
"superuser" : 5,
"admin" : 4,
"moderator" : 12
}
I wanted a more concise answer, and I came up with the following using the documentation on aggregation and $group:
db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])
You can leverage Mongo Shell Extensions. It's a single .js file that you can append to your $HOME/.mongorc.js, or load programmatically if you're coding in Node.js/io.js.
Sample
For each distinct value of field, it counts the occurrences in documents, optionally filtered by query:
> db.users.distinctAndCount('name', {name: /^a/i})
{
"Abagail": 1,
"Abbey": 3,
"Abbie": 1,
...
}
The field parameter can be an array of fields:
> db.users.distinctAndCount(['name','job'], {name: /^a/i})
{
"Austin,Educator" : 1,
"Aurelia,Educator" : 1,
"Augustine,Carpenter" : 1,
...
}
To find distinct values of field_1 in a collection while also applying a WHERE-style condition, you can do the following:
db.your_collection_name.distinct('field_1', { <filter document> })
So, finding the number of distinct names in a collection where age > 25 looks like:
db.your_collection_name.distinct('names', {'age': {"$gt": 25}})
Hope it helps!
I use this query:
var collection = "countries";
var field = "country";
db[collection].distinct(field).forEach(function (value) {
    print(field + ", " + value + ": " + db[collection].count({ [field]: value }));
});
Output:
countries, England: 3536
countries, France: 238
countries, Australia: 1044
countries, Spain: 16
This query first finds all the distinct values, and then counts the number of occurrences for each of them.
If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:
db.users.aggregate([
{ $group: { _id: '$country' } },
{ $count: 'countOfUniqueCountries' }
]);
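If you need the same count from the .NET driver, a rough equivalent might look like this (a sketch, assuming an IMongoCollection<BsonDocument> named collection):
var result = collection.Aggregate()
    .Group(new BsonDocument("_id", "$country")) // one output document per distinct country
    .Count()                                    // appends a $count stage
    .FirstOrDefault();
var countOfUniqueCountries = result == null ? 0 : result.Count;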
MongoDB isn't expiring my old documents. I checked to make sure my indexed field is of type date.
var keys = IndexKeys.Ascending("expiry");
var options = IndexOptions.SetTimeToLive(TimeSpan.FromMinutes(1));
collection.EnsureIndex(keys, options);
this.ExpireDate = new BsonDateTime(DateTime.UtcNow.AddMinutes(5));
var insertResult = collection.Insert(this);
Any tips would be greatly appreciated.
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "Showsv1.ShowInfo",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"expiry" : 1
},
"ns" : "Showsv1.ShowInfo",
"name" : "expiry_1",
"expireAfterSeconds" : 60
}
]
"expiry" : ISODate("2013-02-15T02:40:45.876Z")
The code was missing [BsonElement("expiry")] on top of the ExpireDate property.
Thanks @WiredPrairie for the tip.
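For reference, the fixed mapping presumably looks something like this (a sketch; the class shape is assumed from the snippets above):
public class ShowInfo
{
    public ObjectId Id { get; set; }

    // without this attribute the property serializes as "ExpireDate",
    // so the TTL index on "expiry" never sees a date field to expire on
    [BsonElement("expiry")]
    public BsonDateTime ExpireDate { get; set; }
}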