I have a MongoDB collection like this:
{
_id: "abc",
history:
[
{
status: 1,
reason: "confirmed"
},
{
status: 2,
reason: "accepted"
}
],
_id: "xyz",
history:
[
{
status: 2,
reason: "accepted"
},
{
status: 10,
reason: "cancelled"
}
]
}
I want to write a query in C# to return the documents whose last history item is 2 (accepted). So in my result I should not see "xyz" because its state has changed from 2, but I should see "abc" since its last status is 2. The problem is that getting the last item is not easy with MongoDB's C# driver - or I don't know how to.
I tried the linq's lastOrDefault but got System.InvalidOperationException: {document}{History}.LastOrDefault().Status is not supported error.
I know there is a workaround to get the documents first (load to memory) and then filter, but it is client side and slow (consumes lot of network). I want to do the filter on server.
Option 1) Find() -> expected to be faster
db.collection.find({
$expr: {
$eq: [
{
$arrayElemAt: [
"$history.status",
-1
]
},
2
]
}
})
Playground1
Option 2) Aggregation
db.collection.aggregate([
{
"$addFields": {
last: {
$arrayElemAt: [
"$history",
-1
]
}
}
},
{
$match: {
"last.status": 2
}
},
{
$project: {
"history": 1
}
}
])
Playground2
I found a hackaround: to override the history array with the last history document, then apply the filter as if there was no array. This is possible through Aggregate operation $addFields.
PipelineDefinition<Process, BsonDocument> pipeline = new BsonDocument[]
{
new BsonDocument("$addFields",
new BsonDocument("history",
new BsonDocument ( "$slice",
new BsonArray { "$history", -1 }
)
)
),
new BsonDocument("$match",
new BsonDocument
{
{ "history.status", 2 }
}
)
};
var result = collection.Aggregate(pipeline).ToList();
result will be the documents with last history of 2.
I have the following document structure:
{
"Agencies": [
{
"name": "tcs",
"id": "1",
"AgencyUser": [
{
"UserName": "ABC",
"Code": "ABC40",
"Link": "http.ios.com",
"TotalDownloads": 0
},
{
"UserName": "xyz",
"Code": "xyz20",
"Link": "http.ios.com",
"TotalDownloads": 0
}
]
}
]
}
Like I have multiple agencies and each agency contains a list of agents.
What I am trying is to pass the Code and update the TotalDownloads field of the agent that matches the code.
For example, if someone uses the code ABC40 so I need to update the field TotalDownloads of the agent called "ABC".
What I have tried is as below:
public virtual async Task UpdateAgentUsersDownloadByCode(string Code)
{
var col = _db.GetCollection<Agencies>(Agencies.DocumentName);
FilterDefinition<Agencies> filter = Builders<Agencies>.Filter.Eq("AgencyUsers.Code", Code);
UpdateDefinition<Agencies> update = Builders<Agencies>.Update.Inc(x => x.AgencyUsers.FirstOrDefault().TotalDownloads, 1);
await col.UpdateOneAsync(filter, update);
}
It is giving me the following error:
Unable to determine the serialization information for x => x.AgencyUsers.FirstOrDefault().TotalDownloads.
Where I'm wrong?
Note: From the attached sample document, the array property name: AgencyUser is not matched with the property name that you specified in the update operation, AgencyUsers.
Use arrayFilters with $[<identifier>] positional filtered operator to update the element(s) in the array.
MongoDB syntax
db.Agencies.update({
"AgencyUsers.Code": "ABC40"
},
{
$inc: {
"AgencyUsers.$[agencyUser].TotalDownloads": 1
}
},
{
arrayFilters: [
{
"agencyUser.Code": "ABC40"
}
]
})
Demo # Mongo Playground
MongoDB .NET Driver syntax
UpdateDefinition<Agencies> update = Builders<Agencies>.Update
.Inc("AgencyUsers.$[agencyUser].TotalDownloads", 1);
UpdateOptions updateOptions = new UpdateOptions
{
ArrayFilters = new[]
{
new BsonDocumentArrayFilterDefinition<Agencies>(
new BsonDocument("agencyUser.Code", Code)
)
}
};
UpdateResult result = await col.UpdateOneAsync(filter, update, updateOptions);
Demo
I am pretty new to mongo DB and experimenting with it for one of our applications. We are trying to implement CQRS and query part we are trying to use node.js and command part we are implementing through c#.
One of my collections might have millions of documents in it. We would have a scenarioId field and each scenario can have around two million records.
Our use case is to compare these two scenarios data and do some mathematical operation on the each field of scenarios.
For example, each scenario can have a property avgMiles and I would like to compute the difference of this property and users should be able to filter on this difference value. As my design is to keep both scenarios data in single collection i am trying to do group by scenario id and further project it.
My sample structure of a document would look like below.
{
"_id" : ObjectId("5ac05dc58ff6cd3054d5654c"),
"origin" : {
"code" : "0000",
},
"destination" : {
"code" : "0001",
},
"currentOutput" : {
"avgMiles" : 0.15093020854848138,
},
"scenarioId" : NumberInt(0),
"serviceType" : "ECON"
}
When I group I would group it based on origin.code and destination.code and serviceType properties.
My aggregate pipeline query looks like this:
db.servicestats.aggregate([{$match:{$or:[{scenarioId:0}, {scenarioId:1}]}},
{$sort:{'origin.code':1,'destination.code':1,serviceType:1}},
{$group:{
_id:{originCode:'$origin.code',destinationCode:'$destination.code',serviceType:'$serviceType'},
baseScenarioId:{$sum:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 1] },
then: '$scenarioId'
}],
default: 0
}
}},
compareScenarioId:{$sum:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 0] },
then: '$scenarioId'
}],
default: 0
}
}},
baseavgMiles:{$max:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 1] },
then: '$currentOutput.avgMiles'
}],
default: null
}
}},
compareavgMiles:{$sum:{$switch: {
branches: [
{
case: { $eq: [ '$scenarioId', 0] },
then: '$currentOutput.avgMiles'
}],
default: null
}
}}
}
},
{$project:{scenarioId:
{ base:'$baseScenarioId',
compare:'$compareScenarioId'
},
avgMiles:{base:'$baseavgMiles', comapre:'$compareavgMiles',diff:{$subtract :['$baseavgMiles','$compareavgMiles']}}
}
},
{$match:{'avgMiles.diff':{$eq:0.5}}},
{$limit:100}
],{allowDiskUse: true} )
My group pipeline stage would have 4 million documents going in it. Can you please suggest how I can improve the performance of this query?
I have an index on the fields used in my group by condition and I have added a sort pipeline stage to help group by to perform better.
Any suggestions are most welcome.
As group by is not workin in my case i have implemented left outer join using $lookup and the query would look like below.
db.servicestats.aggregate([
{$match:{$and :[ {'scenarioId':0}
//,{'origin.code':'0000'},{'destination.code':'0001'}
]}},
//{$limit:1000000},
{$lookup: { from:'servicestats',
let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
pipeline:[
{$match: {
$expr: { $and:
[
{ $eq: [ "$scenarioId", 1 ] },
{ $eq: [ "$origin.code", "$$ocode" ] },
{ $eq: [ "$destination.code", "$$dcode" ] },
{ $eq: [ "$serviceType", "$$stype" ] },
]
}
}
},
{$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
{ $replaceRoot: { newRoot: "$comp" } }
],
as : "compoutputs"
}},
{
$replaceRoot: {
newRoot: {
$mergeObjects:[
{
$arrayElemAt: [
"$$ROOT.compoutputs",
0
]
},
{
origin: "$$ROOT.origin",
destination: "$$ROOT.destination",
serviceType: "$$ROOT.serviceType",
baseavgmiles: "$$ROOT.currentOutput.avgMiles",
output: '$$ROOT'
}
]
}
}
},
{$limit:100}
])
the above query performance is good and returns in 70 ms.
But in my scenario i need a full outer join to be implemented which i understood mongo does not support as of now and implemented using $facet pipeline as below
db.servicestats.aggregate([
{$limit:1000},
{$facet: {output1:[
{$match:{$and :[ {'scenarioId':0}
]}},
{$lookup: { from:'servicestats',
let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
pipeline:[
{$match: {
$expr: { $and:
[
{ $eq: [ "$scenarioId", 1 ] },
{ $eq: [ "$origin.code", "$$ocode" ] },
{ $eq: [ "$destination.code", "$$dcode" ] },
{ $eq: [ "$serviceType", "$$stype" ] },
]
}
}
},
{$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
{ $replaceRoot: { newRoot: "$comp" } }
],
as : "compoutputs"
}},
//{
// $replaceRoot: {
// newRoot: {
// $mergeObjects:[
// {
// $arrayElemAt: [
// "$$ROOT.compoutputs",
// 0
// ]
// },
// {
// origin: "$$ROOT.origin",
// destination: "$$ROOT.destination",
// serviceType: "$$ROOT.serviceType",
// baseavgmiles: "$$ROOT.currentOutput.avgMiles",
// output: '$$ROOT'
// }
// ]
// }
// }
// }
],
output2:[
{$match:{$and :[ {'scenarioId':1}
]}},
{$lookup: { from:'servicestats',
let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
pipeline:[
{$match: {
$expr: { $and:
[
{ $eq: [ "$scenarioId", 0 ] },
{ $eq: [ "$origin.code", "$$ocode" ] },
{ $eq: [ "$destination.code", "$$dcode" ] },
{ $eq: [ "$serviceType", "$$stype" ] },
]
}
}
},
{$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
{ $replaceRoot: { newRoot: "$comp" } }
],
as : "compoutputs"
}},
//{
// $replaceRoot: {
// newRoot: {
// $mergeObjects:[
// {
// $arrayElemAt: [
// "$$ROOT.compoutputs",
// 0
// ]
// },
// {
// origin: "$$ROOT.origin",
// destination: "$$ROOT.destination",
// serviceType: "$$ROOT.serviceType",
// baseavgmiles: "$$ROOT.currentOutput.avgMiles",
// output: '$$ROOT'
// }
// ]
// }
// }
// },
{$match :{'compoutputs':{$eq:[]}}}
]
}
}
///{$limit:100}
])
But facet performance is very bad. Any further ideas to improve this are most welcome.
In general, there are three things that can cause slow queries:
The query is not indexed, cannot use indexes efficiently, or the schema design is not optimal (e.g. highly nested arrays or subdocuments) which means that MongoDB must do some extra work to arrive at the relevant data.
The query is waiting for something slow (e.g. fetching data from disk, writing data to disk).
Underprovisioned hardware.
In terms of your query, there may be some general suggestions regarding query performance:
Using allowDiskUse in an aggregation pipeline means that it is possible that the query will be using disk for some its stages. Disk is frequently the slowest part of a machine, so if it's possible for you to avoid this, it will speed up the query.
Note that an aggregation query is limited to 100MB memory use. This is irrespective of the amount of memory you have.
The $group stage cannot use indexes, because an index is tied to a document's location on disk. Once the aggregation pipeline enters a stage where the document's physical location is irrelevant (e.g. the $group stage), an index cannot be used anymore.
By default, the WiredTiger cache is ~50% of RAM, so a 64GB machine would have a ~32GB WiredTiger cache. If you find that the query is very slow, it is possible that MongoDB needed to go to disk to fetch the relevant documents. Monitoring iostats and checking disk utilization % during the query would provide hints toward whether enough RAM is provisioned.
Some possible solutions are:
Provision more RAM so that MongoDB doesn't have to go to disk very often.
Rework the schema design to avoid heavily nested fields, or multiple arrays in the document.
Tailor the document schema to make it easier for you to query the data in it, instead of tailoring the schema to how you think the data should be stored (e.g. avoid heavy normalization inherent in relational database design model).
If you find that you're hitting the performance limit of a single machine, consider sharding to horizontally scale the query. However, please note that sharding is a solution that would require careful design and consideration.
You are saying above that you'd like to group by scenarioId which, however, you don't. But that is probably what you should be doing to avoid all the switch statements. Something like this might get you going:
db.servicestats.aggregate([{
$match: {
scenarioId: { $in: [ 0, 1 ] }
}
}, {
$sort: { // not sure if that stage even helps - try to run with and without
'origin.code': 1,
'destination.code': 1,
serviceType: 1
}
}, {
$group: { // first group by scenarioId AND the other fields
_id: {
scenarioId: '$scenarioId',
originCode: '$origin.code',
destinationCode: '$destination.code',
serviceType: '$serviceType'
},
avgMiles: { $max: '$currentOutput.avgMiles' } // no switches needed
},
}, {
$group: { // group by the other fields only so without scenarioId
_id: {
originCode: '$_id.originCode',
destinationCode: '$_id.destinationCode',
serviceType: '$_id.serviceType'
},
baseScenarioAvgMiles: {
$max: {
$cond: {
if: { $eq: [ '$_id.scenarioId', 1 ] },
then: '$avgMiles',
else: 0
}
}
},
compareScenarioAvgMiles: {
$max: {
$cond: {
if: { $eq: [ '$_id.scenarioId', 0 ] },
then: '$avgMiles',
else: 0
}
}
}
},
}, {
$addFields: { // compute the difference
diff: {
$subtract :[ '$baseScenarioAvgMiles', '$compareScenarioAvgMiles']
}
}
}, {
$match: {
'avgMiles.diff': { $eq: 0.5 }
}
}, {
$limit:100
}], { allowDiskUse: true })
Beyond that I would suggest you use the power of db.collection.explain().aggregate(...) to find the right indexing and tune your query.
Suppose I have this simple object definition:
public class Item
{
public int section { get; set; }
public string item { get; set; }
}
I have some data in a single-depth array. This is JSON, which would be converted to C# objects via Json.NET:
[
{
"section": 0,
"item": "Hello!"
},
{
"section": 1,
"item": "First Steps"
},
{
"section": 1,
"item": "How to Ask for Help"
},
{
"section": 2,
"item": "Your First Program"
},
{
"section": 2,
"item": "The Code"
},
{
"section": 2,
"item": "How It Works"
},
{
"section": 3,
"item": "Where To Go From Here"
}
]
Using Entity Framework or some other method, I have arrived at a simple list of these objects as stated above, contained within a var variable.
Now what I want to do is get the same list, but where each section is grouped as an array within the outer array. For example, the JSON of what I want looks like this:
[
[
{
"section": 0,
"item": "Hello!"
}
],
[
{
"section": 1,
"item": "First Steps"
},
{
"section": 1,
"item": "How to Ask for Help"
}
],
[
{
"section": 2,
"item": "Your First Program"
},
{
"section": 2,
"item": "The Code"
},
{
"section": 2,
"item": "How It Works"
}
],
[
{
"section": 3,
"item": "Where To Go From Here"
}
]
]
My initial thought was to do something with a LINQ query using the groupby statement but I don't think this is what I'm looking for - groupby seems to be analogous to the SQL version so it can only be used for aggregate operations.
The only other option I have found so far is to use a LINQ query to get a list of all of the sections:
var allSections = (from x in myData select x.section).Distinct();
...and then iterate through those IDs and manually build the array:
List<List<Item>> mainList = new List<List<Item>>();
foreach (int thisSection in allSections.ToArray())
{
List<Item> thisSectionsItems = (from x in myData where x.section == thisSection select x).ToList();
mainList.Add(thisSectionsItems);
}
return mainList;
This should result in a proper enumerable that I can feed into JSON.NET and get the expected result, but this seems inefficient.
Is there a more LINQ-ish, or at least more efficient, way to split the items into groups?
You can certainly achieve this with .GroupBy()
var grouped = items
.GroupBy(x => x.section) // group by section
.Select(x => x.ToArray()) // build the inner arrays
.ToArray(); // return collection of arrays as an array
Assuming my 2 values are "Red Square" and "Green circle",
when i run the aggregation using Elastic search i get 4 values instead of 2, space separated?
They are Red, Square, Green, circle.
Is there a way to get the 2 original values.
The code is below:
var result = this.client.Search<MyClass>(s => s
.Size(int.MaxValue)
.Aggregations(a => a
.Terms("field1", t => t.Field(k => k.MyField))
)
);
var agBucket = (Bucket)result.Aggregations["field1"];
var myAgg = result.Aggs.Terms("field1");
IList<KeyItem> list = myAgg.Items;
foreach (KeyItem i in list)
{
string data = i.Key;
}
In your mapping, you need to set the field1 string as not_analyzed, like this:
{
"your_type": {
"properties": {
"field1": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
You can also make field1 a multi-field and make it both analyzed and not_analyzed to get the best of both worlds (i.e. text matching on the analyzed field + aggregation on the exact value of the not_analyzed raw sub-field).
{
"your_type": {
"properties": {
"field1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
If you choose this second option, you'll need to run your aggregation on field1.raw instead of field1.