Getting general information about MongoDB collections with FSharp - c#

Can I retrieve basic information about all collections in a MongoDB with F#?
I have a MongoDB with > 450 collections. I can access the db with
open MongoDB.Bson
open MongoDB.Driver
open MongoDB.Driver.Core
open MongoDB.FSharp
open System.Collections.Generic
let connectionString = "mystring"
let client = new MongoClient(connectionString)
let db = client.GetDatabase(name = "Production")
I had considered trying to just get all collections then loop through each collection name and get basic information about each collection with
let collections = db.ListCollections()
and
db.GetCollection([name of a collection])
but the db.GetCollection([name]) requires me to define a type to pull the information about each collection. This is challenging for me as I don't want to have to define a type for each collection, of which there are > 450, and frankly, I don't really know much about this DB. (Actually, no one in my org does; that's why I'm trying to put together a very basic data dictionary.)
Is defining the type for each collection really necessary? Can I use the MongoCollection methods available here without having to define a type for each collection?
EDIT: Ultimately, I'd like to be able to output collection name, the n documents in each collection, a list of the field names in each collection, and a list of each field type.

I chose to write my examples in C# as i'm more familiar with the C# driver and it is a listed tag on the question. You can run an aggregation against each collection to find all top level fields and their (mongodb) types for each document.
The aggregation is done in 3 steps. Lets assume the input is 10 documents which all have this form:
{
"_id": ObjectId("myId"),
"num": 1,
"str": "Hello, world!"
}
$project Convert each document into an array of documents with values fieldName and fieldType. Outputs 10 documents with a single array field. The array field will have 3 elements.
$unwind the arrays of field infos. Outputs 30 documents each with a single field corresponding to an element from the output of step 1.
$group the fields by fieldName and fieldType to get distinct values. Outputs 3 documents. Since all fields with the same name always have the same type in this example, there is only one final output document for each field. If two different documents defined the same field, one as string and one as int there would be separate entries in this result set for both.
// Define our aggregation steps.
// Step 1, $project:
var project = new BsonDocument
{ {
"$project", new BsonDocument
{
{
"_id", 0
},
{
"fields", new BsonDocument
{ {
"$map", new BsonDocument
{
{ "input", new BsonDocument { { "$objectToArray", "$$ROOT" } } },
{ "in", new BsonDocument {
{ "fieldName", "$$this.k" },
{ "fieldType", new BsonDocument { { "$type", "$$this.v" } } }
} }
}
} }
}
}
} };
// Step 2, $unwind
var unwind = new BsonDocument
{ {
"$unwind", "$fields"
} };
// Step 3, $group
var group = new BsonDocument
{
{
"$group", new BsonDocument
{
{
"_id", new BsonDocument
{
{ "fieldName", "$fields.fieldName" },
{ "fieldType", "$fields.fieldType" }
}
}
}
}
};
// Connect to our database
var client = new MongoClient("myConnectionString");
var db = client.GetDatabase("myDatabase");
var collections = db.ListCollections().ToEnumerable();
/*
We will store the results in a dictionary of collections.
Since the same field can have multiple types associated with it the inner value corresponding to each field is `List<string>`.
The outer dictionary keys are collection names. The inner dictionary keys are field names.
The inner dictionary values are the types for the provided inner dictionary's key (field name).
List<string> fieldTypes = allCollectionFieldTypes[collectionName][fieldName]
*/
Dictionary<string, Dictionary<string, List<string>>> allCollectionFieldTypes = new Dictionary<string, Dictionary<string, List<string>>>();
foreach (var collInfo in collections)
{
var collName = collInfo["name"].AsString;
var coll = db.GetCollection<BsonDocument>(collName);
Console.WriteLine("Finding field information for " + collName);
var pipeline = PipelineDefinition<BsonDocument, BsonDocument>.Create(project, unwind, group);
var cursor = coll.Aggregate(pipeline);
var lst = cursor.ToList();
allCollectionFieldTypes.Add(collName, new Dictionary<string, List<string>>());
foreach (var item in lst)
{
var innerDict = allCollectionFieldTypes[collName];
var fieldName = item["_id"]["fieldName"].AsString;
var fieldType = item["_id"]["fieldType"].AsString;
if (!innerDict.ContainsKey(fieldName))
{
innerDict.Add(fieldName, new List<string>());
}
innerDict[fieldName].Add(fieldType);
}
}
Now you can iterate over your result set:
foreach(var collKvp in allCollectionFieldTypes)
{
foreach(var fieldKvp in collKvp.Value)
{
foreach(var fieldType in fieldKvp.Value)
{
Console.WriteLine($"Collection {collKvp.Key} has field name {fieldKvp.Key} with type {fieldType}");
}
}
}

Related

How to easily merge two anonymous objects with different data structure?

I would like to merge these two anonymous objects:
var man1 = new {
name = new {
first = "viet"
},
age = 20
};
var man2 = new {
name = new {
last = "vo"
},
address = "123 street"
};
Into a single one:
var man = new {
name = new {
first = "viet",
last = "vo"
},
age = 20,
address = "123 street"
};
I looked for a solution but found nothing clever.
Convert the anonymous object to ExpandoObject which is essentially a dictionary of string key and object value:
var man1Expando = man1.ToDynamic();
var man2Expando = man2.ToDynamic();
public static ExpandoObject ToDynamic(this object obj)
{
IDictionary<string, object> expando = new ExpandoObject();
foreach (var propertyInfo in obj.GetType().GetProperties())
{
var currentValue = propertyInfo.GetValue(obj);
if (propertyInfo.PropertyType.IsAnonymous())
{
expando.Add(propertyInfo.Name, currentValue.ToDynamic());
}
else
{
expando.Add(propertyInfo.Name, currentValue);
}
}
return expando as ExpandoObject;
}
I'm using a helper extension to establish whether a type is an anonymous one:
public static bool IsAnonymous(this Type type)
{
return type.DeclaringType is null
&& type.IsGenericType
&& type.IsSealed
&& type.IsClass
&& type.Name.Contains("Anonymous");
}
Then, merge two resulting expando objects into one, but recursively, checking for nested expando objects:
var result = MergeDictionaries(man1Expando, man2Expando, overwriteTarget: true);
public static IDictionary<string, object> MergeDictionaries(
IDictionary<string, object> targetDictionary,
IDictionary<string, object> sourceDictionary,
bool overwriteTarget)
{
foreach (var pair in sourceDictionary)
{
if (!targetDictionary.ContainsKey(pair.Key))
{
targetDictionary.Add(pair.Key, sourceDictionary[pair.Key]);
}
else
{
if (targetDictionary[pair.Key] is IDictionary<string, object> innerTargetDictionary)
{
if (pair.Value is IDictionary<string, object> innerSourceDictionary)
{
targetDictionary[pair.Key] = MergeDictionaries(
innerTargetDictionary,
innerSourceDictionary,
overwriteTarget);
}
else
{
// What to do when target propety is nested, but source is not?
// Who takes precedence? Target nested property or source value?
if (overwriteTarget)
{
// Replace target dictionary with source value.
targetDictionary[pair.Key] = pair.Value;
}
}
}
else
{
if (pair.Value is IDictionary<string, object> innerSourceDictionary)
{
// What to do when target propety is not nested, but source is?
// Who takes precedence? Target value or source nested value?
if (overwriteTarget)
{
// Replace target value with source dictionary.
targetDictionary[pair.Key] = innerSourceDictionary;
}
}
else
{
// Both target and source are not nested.
// Who takes precedence? Target value or source value?
if (overwriteTarget)
{
// Replace target value with source value.
targetDictionary[pair.Key] = pair.Value;
}
}
}
}
}
return targetDictionary;
}
The overwriteTarget parameter decides which object takes priority when merging.
Usage code:
var man1 = new
{
name = new
{
first = "viet",
},
age = 20,
};
var man2 = new
{
name = new
{
last = "vo",
},
address = "123 street",
};
var man1Expando = man1.ToDynamic();
var man2Expando = man2.ToDynamic();
dynamic result = MergeDictionaries(man1Expando, man2Expando, overwriteTarget: true);
Console.WriteLine(JsonConvert.SerializeObject(result, Formatting.Indented));
and the result:
{
"name": {
"first": "viet",
"last": "vo"
},
"age": 20,
"address": "123 street"
}
Notice how I assigned the result to dynamic. Leaving compiler assign the type will leave you with expando object presented as IDictionary<string, object>. With a dictionary representation, you cannot access properties in the same manner as if it was an anonymous object:
var result = MergeDictionaries(man1Expando, man2Expando, overwriteTarget: true);
result.name; // ERROR
That's why the dynamic. With dynamic you are losing compile time checking, but have two anonymous objects merged into one. You have to judge for yourself if it suits you.
There's nothing built-in in the C# language to support your use case. Thus, the question in your title needs to be answered with "Sorry, there is no easy way".
I can offer the following alternatives:
Do it manually:
var man = new {
name = new {
first = man1.name.first,
last = man2.name.first
},
age = man1.age,
address = man2.address
};
Use a class instead of an anonymous type for the resulting type (let's call it CompleteMan). Then, you can
create a new instance var man = new CompleteMan(); ,
use reflection to collect the properties and values from your "partial men" (man1 and man2),
assign those values to the properties of your man.
It's "easy" in the sense that the implementation will be fairly straight-forward, but it will still be a lot of code, and you need to take your nested types (name) into account.
If you desperately want to avoid non-anonymous types, you could probably use an empty anonymous target object, but creating this object (var man = new { name = new { first = (string)null, last = (string)null, ...) is not really less work than creating a class in the first place.
Use a dedicated dynamic data structure instead of anonymous C# classes:
The Newtonsoft JSON library supports merging of JSON objects.
Dictionaries can also be merged easily.
ExpandoObjects can be merged easily as well.

Update documents in array via Mongo's $addToSet in C#

I am trying to upsert documents within an array of documents via the C# Driver for MongoDB. I manage to modify existing array elements via $set & arrayFilters, but struggle to add non-existing elements via $addToSet.
I would be glad about any suggestion, even if there is a completely different way.
My simplified class in C#
internal class TimeSeries
{
[BsonId]
internal Name;
[BsonDictionaryOptions(DictionaryRepresentation.ArrayOfDocuments)]
internal Dictionary<DateTime, double> Container;
}
// add a test document
void foo()
{
var coll = _myDatabase.GetCollection<TimeSeries>("myColl");
var res = coll.InsertOne(new TimeSeries()
{ Name = "abc", Container = new Dictionary<DateTime, double>()
{{new DateTime(2000,1,1),20}}
});
}
$set and $addToSet in the Mongo Shell work fine:
// modify the existing value to 30
db.myColl.update( {"_id":"abc"}, {$set: {"Container.$[loc].v":30}}, {arrayFilters:[{"loc.k":new Date("2000-01-01")}]})
// add if no existent
db.myColl.update( {"_id":"abc"}, {$addToSet: {"Container": {"k":new Date("2000-02-01"),"v":200}}})
In C# I can reproduce $set, but get a "Specific Cast is not valid" error for $addToSet.
var filter = Builders<TimeSeries>.Filter.Eq("_id", "abc");
var arrayFilters = new List<ArrayFilterDefinition<BsonDocument>>()
{new BsonDocument("loc.k", new DateTime(2000,1,1))};
// $set
var upsert = Builders<TimeSeries>.Update.Set("Container.$[loc].v", 30);
var resUpt = coll.UpdateOne(filter, upsert, new UpdateOptions { ArrayFilters = arrayFilters })
// $addToSet
var upsert_add = Builders<TimeSeries>.Update.AddToSet("Container", new BsonDocument { { "k", new DateTime(2000, 2, 1) }, { "v", 50} });
var res_add = coll.UpdateOne(filter, upsert_add); // Specific Cast is not valid

How do you filter updates to specific fields from ChangeStream in MongoDB

I am setting up a ChangeStream to notify me when a document has changed in a collection so that I can upsert the "LastModified" element for that document to the time of the event. Since this update will cause a new event to occur on the ChangeStream, I need to filter out these updates to prevent an infinite loop (updating the LastModified element because the LastModified element was just updated...).
I have the following code that is working when I specify the exact field:
ChangeStreamOptions options = new ChangeStreamOptions();
options.ResumeAfter = resumeToken;
string filter = "{ $and: [ { operationType: { $in: ['replace','insert','update'] } }, { 'updateDescription.updatedFields.LastModified': { $exists: false } } ] }";
var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<BsonDocument>>().Match(filter);
var cursor = collection.Watch(pipeline, options, cancelToken);
However, instead of hard-coding the "updateDescription.updatedFields.LastModified", I would like to provide a list of element names that I don't want to exist in the updatedFields document.
I attempted:
string filter = "{ $and: [ { operationType: { $in: ['replace','insert','update'] } }, { 'updateDescription.updatedFields': { $nin: [ 'LastModified' ] } } ] }";
but this didn't work as expected (I still got the update events for the LastModified change.
I originally was using the Filter Builder:
FilterDefinitionBuilder<ChangeStreamDocument<BsonDocument>> filterBuilder = Builders<ChangeStreamDocument<BsonDocument>>.Filter;
FilterDefinition<ChangeStreamDocument<BsonDocument>> filter = filterBuilder.In("operationType", new string[] { "replace", "insert", "update" }); //Only include the change if it was one of these types. Available types are: insert, update, replace, delete, invalidate
filter &= filterBuilder.Nin("updateDescription.updatedFields", ChangedFieldsToIgnore); //If this is an update, only include it if the field(s) updated contains 1+ fields not in the ChangedFieldsToIgnore list
where ChangedFieldsToIgnore is a List containing the field names that I do not want to get events for.
Can anyone help with the syntax that I need to use? or do I need to create a loop around my ChangedFieldsToIgnore list and create a new entry in the filter for each item to "$exists: false"? (this doesn't seem very efficient).
EDIT:
I attempted the following code based on the answer by #wan-bachtiar, but I'm getting an exception on my enumerator.MoveNext() call:
var match1 = new BsonDocument { { "$match", new BsonDocument { { "operationType", new BsonDocument { { "$in", new BsonArray(new string[] { "replace", "insert", "update" }) } } } } } };
var match2 = new BsonDocument { { "$addFields", new BsonDocument { { "tmpfields", new BsonDocument { { "$objectToArray", "$updateDescription.updatedFields" } } } } } };
var match3 = new BsonDocument { { "$match", new BsonDocument { { "tmpfields.k", new BsonDocument { { "$nin", new BsonArray(updatedFieldsToIgnore) } } } } } };
var pipeline = new[] { match1, match2, match3 };
var cursor = collection.Watch<ChangeStreamDocument<BsonDocument>>(pipeline, options, Profile.CancellationToken);
enumerator = cursor.ToEnumerable().GetEnumerator();
enumerator.MoveNext();
ChangeStreamDocument<BsonDocument> doc = enumerator.Current;
The exception is: "{"Invalid field name: \"tmpfields\"."}"
I suspect the problem might be that I'm getting "replace" and "insert" events which do not contain the updateDescription field, so the $addFields/$objectToArray are failing. I'm too new to figure out the syntax, but I think I need to use a filter that does:
{ $match: { "operationType": { $in: ["replace", "insert"] } } }
OR
{ $eq: { "operationTYpe": "update" }} AND { $addFields....}
Also, it appears that the C# driver does not include a Builder that helps with the $addFields and $objectToArray operations. I was only able to use the new BsonDocument {...} method to build the pipeline variable.
ChangedFieldsToIgnore is a List containing the field names that I do not want to get events for.
If you would like to filter based on multiple keys (whether updatedFields contains certain fields), it's easier if you convert the keys to values first.
You can convert the document contained within updatedFields into values by utilising aggregation operator $objectToArray. For example:
pipeline = [{"$addFields": {
"tmpfields":{
"$objectToArray":"$updateDescription.updatedFields"}
}},
{"$match":{"tmpfields.k":{
"$nin":["LastModified", "AnotherUnwantedField"]}}}
];
The above aggregation pipeline adds a temporary field called tmpfields. This new field will pivot the content of updateDescription.updatedFields turning {name:value} into [{k:name, v:value}]. Once we have those keys as values, we can utilise $nin as an array of filter.
UPDATED
The reason you're getting an exception of tmpfields being invalid, is because the result is casted into ChangeStreamDocument model which does not have a recognizable field called tmpfields.
In the case, when it's different operations that does not have field updateDescription.updatedFields, the value of tmpfields would just be null.
Below is an example of MongoDB ChangeStream .Net/C# using MongoDB .Net driver v2.5, along with an aggregation pipeline that modifies the output change stream.
This example is not type safe, and would return BsonDocument :
var database = client.GetDatabase("database");
var collection = database.GetCollection<BsonDocument>("collection");
var options = new ChangeStreamOptions { FullDocument = ChangeStreamFullDocumentOption.UpdateLookup };
// Aggregation Pipeline
var addFields = new BsonDocument {
{ "$addFields", new BsonDocument {
{ "tmpfields", new BsonDocument {
{ "$objectToArray",
"$updateDescription.updatedFields" }
} }
} } };
var match = new BsonDocument {
{ "$match", new BsonDocument {
{ "tmpfields.k", new BsonDocument {
{ "$nin", new BsonArray{"LastModified", "Unwanted"} }
} } } } };
var pipeline = new[] { addFields, match };
// ChangeStreams
var cursor = collection.Watch<BsonDocument>(pipeline, options);
foreach (var change in cursor.ToEnumerable())
{
Console.WriteLine(change.ToJson());
}
I wrote the piece of code below as I was having the same issues you were having. No need to mess around with BsonObjects ...
//The operationType can be one of the following: insert, update, replace, delete, invalidate
//ignore the field lastrun as we would end in an endles loop
var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<ATask>>()
.Match("{ operationType: { $in: [ 'replace', 'update' ] } }")
.Match(#"{ ""updateDescription.updatedFields.LastRun"" : { $exists: false } }")
.Match(#"{ ""updateDescription.updatedFields.IsRunning"" : { $exists: false } }");
var options = new ChangeStreamOptions { FullDocument = ChangeStreamFullDocumentOption.UpdateLookup };
var changeStream = Collection.Watch(pipeline, options);
while (changeStream.MoveNext())
{
var next = changeStream.Current;
foreach (var obj in next)
yield return obj.FullDocument;
}

How to access the first object within a nested BsonArray within a BsonDocument

As the title states I'm attempting to access the first object within a nested BsonArray. I'm new to mongodb so still trying wrap my head around it.
Anyways, lets say I have the following data:
{
"_id" : ObjectId("111111dasd1asdasd1asd1"),
"data" : [
{
"name":"John Smith",
"push ups":20
},
{
"name":"John Smith",
"push ups":22
},
{
"name":"John Smith",
"push ups":25
}
]
}
I'm attempting to create a new BsonDocument by taking _id and name where I give the query a parameter for _id but I just grab whatever is in name without giving it a parameter.
ie.
{
"connect":ObjectId("111111dasd1asdasd1asd1", //assigning the value of id from the original document to this field
"name":"John Smith"
}
Since my mongo collection is structured so that each Bsondocument only deals with one unique name, I do not want to loop through the data BsonArray of my original document example. I just want to access whatever value name has and move on.
Here is my current code attempting to create a new object from values of id and name from the BsonDocument
(NOTE: This is within a method where an id param is provided).
var query = Query.EQ("_id", id);
var tempRecord = existing.FindOne(query);
var record = new
{
name = tempRecord["data"]["name"],
connect = id
};
result = record.ToBsonDocument();
return result;
At the moment tempRecord is correctly storing the data returned from the query I'm passing it. However how would I properly access the name field of the data array within tempRecord?
If you want to retrieve each name of the same _id, here's the code, if you want to change it a little bit, just leave a comment :
first of all,you should change your _id value,it's invalid, put for example : ObjectId("56d18c7e5e4b3c416e7b408b")
database => test | collection => myCollection
MongoClient client = new MongoClient();
var db = client.GetDatabase("test");
var collection = db.GetCollection<BsonDocument>("myCollection");
var filter1 = Builders<BsonDocument>.Filter.Eq("_id", new ObjectId("56d18c7e5e4b3c416e7b408b"));
using (var cursor = await collection.FindAsync(filter1))
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (var document in batch)
{
int subDocMax = document[1].AsBsonArray.Count;
for(int i=0;i<subDocMax;i++)
{
MessageBox.Show("Name :"+document[1][i][0].ToString());
}
}
}
}

C# Linq, object definition does not contains a property

I need your help
I just wrote the following code
var anynomousObject = new { Amount = 10, weight = 20 };
List<object> ListOfAnynomous = new List<object> { anynomousObject };
var productQuery =
from prod in ListOfAnynomous
select new { prod.Amount, prod.weight }; // here it object on 'prod.Amount, prod.weight' that the object defenetion does not contains the "Amount" and "weight" properties
foreach (var v in productQuery)
{
Console.WriteLine(v.Amount, v.weight);
}
so please could you help me to solve this problem.
You need to make a class of your object definition, or using the dynamic keywork instead of boxing in object :
var anynomousObject = new { Amount = 10, weight = 20 };
List<dynamic> ListOfAnynomous = new List<dynamic> { anynomousObject };
var productQuery =
from prod in ListOfAnynomous
select new { prod.Amount, prod.weight };
foreach (var v in productQuery)
{
Console.WriteLine(v.Amount, v.weight);
}
this is because, when you box as object, the compiler doesn't know the definition of your anonymous var. Dynamic make it evaluate at runtime instead of compile-time.
The other option is to create a class or struct.
Your List<object> has a list of objects. The Linq query looks this list, and all it sees are regular objects.
Either use a class or a structure to store your objects, or use List<dynamic>

Categories