MongoDB Text Search with projection - c#

Using MongoDB with C# and driver 2.0, I am trying to do the following:
Text search
Sort the hits by text search score
Project BigClass to SmallClass
Here is a (simplified version of) the classes:
class BigClass
{
[BsonIgnoreIfDefault]
public ObjectId _id { get; set; }
public string Guid { get; set; }
public string Title { get; set; }
public DateTime CreationTime { get; set; }
// lots of other stuff
[BsonIgnoreIfNull]
public double? TextMatchScore { get; set; } // Temporary place for the text match score, for sorting
}
class SmallClass
{
[BsonIgnoreIfDefault]
public ObjectId _id { get; set; }
public string Title { get; set; }
[BsonIgnoreIfNull]
public double? TextMatchScore { get; set; } // Temporary place for the text match score, for sorting
}
If I do a text search, it is pretty straightforward:
var F = Builders<BigClass>.Filter.Text("text I am looking for");
var Result = MongoDriver.Find(F).ToListAsync().Result;
If I want to sort by the score of the text search, it's a bit more messy (and very POORLY documented):
var F = Builders<BigClass>.Filter.Text("text I am looking for");
var P = Builders<BigClass>.Projection.MetaTextScore("TextMatchScore");
var S = Builders<BigClass>.Sort.MetaTextScore("TextMatchScore");
var Result = MongoDriver.Find(F).Project<BigClass>(P).Sort(S).ToListAsync().Result;
Essentially it requires me to add a field in the class (TextMatchScore) to hold the result.
If I want to get the data, without sorting and project it to SmallClass, it is straightforward:
var F = Builders<BigClass>.Filter.Text("text I am looking for");
var P = Builders<BigClass>.Projection.Include(_ => _._id).Include(_ => _.Title);
var Result = MongoDriver.Find(F).Project<SmallClass>(P).ToListAsync().Result;
Now if "I want it all", that's where the problem arises:
var F = Builders<BigClass>.Filter.Text("text I am looking for");
var P = Builders<BigClass>.Projection.MetaTextScore("TextMatchScore").Include(_ => _._id).Include(_ => _.Title).Include(_ => _.TextMatchScore);
var S = Builders<BigClass>.Sort.MetaTextScore("TextMatchScore");
var Result = MongoDriver.Find(F).Project<SmallClass>(P).Sort(S).ToListAsync().Result;
I get an exception:
Message = "QueryFailure flag was true (response was { \"$err\" : \"Can't canonicalize query: BadValue must have $meta projection for all $meta sort keys\", \"code\" : 17287 })."
As expected, the error is not documented anywhere as the Mongo guys expect users to self-document everything.
If I make the projection to 'BigClass', there is no problem, the code runs and just fills in the right fields.
If you google that text with C#, the posts you find are mine when I was trying to figure out the text search, which is also poorly documented.
So when we combine projection, text search and sorting, there doesn't seem to be any example anywhere and I just can't get it to work.
Does anyone know the reason for that problem?

This works for me:
var client = new MongoClient();
var db = client.GetDatabase("test");
var col = db.GetCollection<BigClass>("big");
await db.DropCollectionAsync(col.CollectionNamespace.CollectionName);
await col.Indexes.CreateOneAsync(Builders<BigClass>.IndexKeys.Text(x => x.Title));
await col.InsertManyAsync(new[]
{
new BigClass { Title = "One Jumped Over The Moon" },
new BigClass { Title = "Two went Jumping Over The Sun" }
});
var filter = Builders<BigClass>.Filter.Text("Jump Over");
// don't need to Include(x => x.TextMatchScore) because it's already been included with MetaTextScore.
var projection = Builders<BigClass>.Projection.MetaTextScore("TextMatchScore").Include(x => x._id).Include(x => x.Title);
var sort = Builders<BigClass>.Sort.MetaTextScore("TextMatchScore");
var result = await col.Find(filter).Project<SmallClass>(projection).Sort(sort).ToListAsync();
I removed the include of the TextMatchScore. It still comes back, because it was included by the MetaTextScore("TextMatchScore").
Documentation is a work in progress. We tackle the major use cases first as those hit the most people. This use case isn't that common and hasn't been documented. We certainly accept pull requests, both for code and documentation. Also, feel free to file a documentation ticket at jira.mongodb.org under the CSHARP project.

The following solution works in MongoDB.Driver 2.x. The important point is not to use Include in the projection, as it will erase the default one (or, if you do use it, remember to add the proper projection).
Query:
{
"find":"SoceCollection",
"filter":{
"$text":{
"$search":"some text to search"
}
},
"sort":{
"TextScore":{
"$meta":"textScore"
}
},
"projection":{
"TextScore":{
"$meta":"textScore"
},
"_id":0,
"CreatedDate":0
},
"limit":20,
"collation":{
"locale":"en",
"strength":1
} ...
Code:
var sort = Builders<BigModel>.Sort.MetaTextScore(nameof(LightModel.TextScore));
var projection = Builders<BigModel>.Projection
.MetaTextScore(nameof(LightModel.TextScore))
.Exclude(x => x.Id)
.Exclude(x => x.CreatedDate);
return await Collection()
.Find(filter, new FindOptions { Collation = new Collation("en", strength: CollationStrength.Primary) })
.Project<LightModel>(projection)
.Sort(sort)
.Limit(20)
.ToListAsync();

Related

Get a unique list (by key) from a list-in-list

I have a list of requests. Each request has many approvers. I want to go through all the requests and their approvers and get a list of unique approvers and their requests.
Here are sample models:
var requestsToProcess = await GetBatchOfApprovedRequestsAsyn(); // new List<RequestModel>();
public class RequestModel
{
public RequestModel()
{
ApproversList = new List<RequestApproverModel>();
}
public long Id { get; set; } // Key
public string Brief { get; set; }
public string Description { get; set; }
public List<RequestApproverModel> ApproversList { get; set; }
}
public class RequestApproverModel
{
public string Email { get; set; } // Key
public string FullName { get; set; }
}
I know how to get unique tuples from a list, but I don't understand how to do it when the target list is a property of each element of another list.
The basic premise is to flatten and project, then group by, then optionally project again.
Given
var requests= new List<RequestModel>()
{
new()
{
Id = 1,
ApproversList = new List<RequestApproverModel>()
{
new(){Email = "bob"},
new(){Email = "dole"}
}
},
new()
{
Id = 2,
ApproversList = new List<RequestApproverModel>()
{
new(){Email = "bob"},
new(){Email = "blerg"}
}
}
};
Example
var results =requests.SelectMany(request =>
request.ApproversList,
(request, approver) => new {request, approver})
.GroupBy(x => x.approver.Email )
.Select(x => new { Approver = x.Key, Requests = x.Select(y => y.request).ToList() });
foreach (var item in results)
{
Console.WriteLine(item.Approver);
foreach (var request in item.Requests)
Console.WriteLine(" " + request.Id);
}
Output
bob
1
2
dole
1
blerg
2
The two complementary methods you need from LINQ are SelectMany, which unpacks a list-of-lists to a list, and GroupBy, which packs a list to a list-of-lists (you need to go from a-of-b to b-of-a).
var result = someRequestModels
.SelectMany(rm => rm.ApproversList, (rm, ram) => new { RM = rm, RamEmail = ram.Email })
.GroupBy(at => at.RamEmail, at => at.RM);
The SelectMany is like a nested pair of foreach
foreach(var rm in someRequestModels)
foreach(var ram in rm.ApproversList)
flatlist.Add(new { rm, ram});
This has turned the list of lists into a single list, repeating the RequestModel once per RequestApproverModel. You can then run a GroupBy on the approver Email, which takes every unique email in the flattened list and puts together a list of RequestModels for each. In non-LINQ pseudocode it'd look something like:
foreach(var rmRamPair in flatlist)
grouping[rmRamPair.ram.Email].Add(rmRamPair.rm);
This produces a sequence of IGrouping objects, which is something like a list of lists: each entry has a Key (a string holding the approver's email) and is itself an enumerable of all the RequestModels that approver has, e.g.:
foreach(var x in result){
Console.WriteLine($"approver with email of {x.Key} has cases:");
foreach(var rm in x)
Console.WriteLine($"id is '{rm.Id}' and Brief is '{rm.Brief}'");
}
If it makes you more comfortable, you can call ToDictionary(x => x.Key, x => x.ToList()) on the result and you'll get a Dictionary<string, List<RequestModel>> out, the email being the key and the list of RequestModels being the value.
If you want the whole RequestApproverModel, not just the email, it might be a bit more tricky. It's easy if you've reused instances of RequestApproverModel, so that there is literally only one object in memory that is "bob@mail.com" and that object is present on a couple of different requests:
var ram = new RequestApproverModel{ Email = "bob@mail.com" };
var r1 = new RequestModel();
r1.ApproversList.Add(ram);
var r2 = new RequestModel();
r2.ApproversList.Add(ram);
Here the instance is the same one; you can just group by it instead of the email.
If you've ended up with objects that look the same but are at different memory addresses:
var r1 = new RequestModel();
r1.ApproversList.Add(new RequestApproverModel{ Email = "bob@mail.com" });
var r2 = new RequestModel();
r2.ApproversList.Add(new RequestApproverModel{ Email = "bob@mail.com" });
Then the standard implementation of Equals and GetHashCode (inherited from object) is useless, because it's based on reference identity, in effect the memory address where each instance lives.
Your RequestApproverModel class will instead need to implement Equals and GetHashCode so that they report equality based on Email; otherwise grouping by the whole RequestApproverModel won't work out.
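As a minimal sketch of that last point, the approver class below overrides Equals and GetHashCode so that two separate instances with the same Email fall into the same group. The models are trimmed-down copies of the ones in the question, and the ApproverGrouping helper and its RequestIdsByApprover method are hypothetical names introduced only for this illustration. It assumes emails are compared exactly (case-sensitive) and that no approver in a list is null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RequestApproverModel : IEquatable<RequestApproverModel>
{
    public string Email { get; set; }
    public string FullName { get; set; }

    // Assumption: two approvers are the same person when their Email strings are identical.
    public bool Equals(RequestApproverModel other) => other != null && Email == other.Email;

    public override bool Equals(object obj) => Equals(obj as RequestApproverModel);

    // Must agree with Equals: equal emails always produce equal hash codes.
    public override int GetHashCode() => Email?.GetHashCode() ?? 0;
}

public class RequestModel
{
    public long Id { get; set; }
    public List<RequestApproverModel> ApproversList { get; set; } = new List<RequestApproverModel>();
}

// Hypothetical helper, not part of the original question or answers.
public static class ApproverGrouping
{
    // Flattens request/approver pairs, then groups the request ids by the approver object itself.
    public static Dictionary<RequestApproverModel, List<long>> RequestIdsByApprover(
        IEnumerable<RequestModel> requests) =>
        requests
            .SelectMany(r => r.ApproversList, (r, a) => new { Request = r, Approver = a })
            .GroupBy(x => x.Approver, x => x.Request.Id)
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

With these overrides, GroupBy and Dictionary both use the Email-based equality, so approvers created with separate `new` calls but the same email end up under one key. If changing the model class is not an option, passing an IEqualityComparer<RequestApproverModel> to GroupBy is an alternative with the same effect.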

Filtering nested lists with nullable property

Say I have the following class structures
public class EmailActivity {
public IEnumerable<MemberActivity> Activity { get; set; }
public string EmailAddress { get; set; }
}
public class MemberActivity {
public EmailAction? Action { get; set; }
public string Type { get; set; }
}
public enum EmailAction {
None = 0,
Open = 1,
Click = 2,
Bounce = 3
}
I wish to filter a list of EmailActivity objects based on the presence of a MemberActivity with a non-null EmailAction matching a provided list of EmailAction matches. I want to return just the EmailAddress property as a List<string>.
This is as far as I've got
List<EmailAction> activityTypes; // [ EmailAction.Open, EmailAction.Bounce ]
List<string> activityEmailAddresses =
emailActivity.Where(
member => member.Activity.Where(
activity => activityTypes.Contains(activity.Action)
)
)
.Select(member => member.EmailAddress)
.ToList();
However, I get the error message "CS1503 Argument 1: cannot convert from 'EmailAction?' to 'EmailAction'".
If I then modify activityTypes to allow null values (List<EmailAction?>), I get the following: "CS1662 Cannot convert lambda expression to intended delegate type because some of the return types in the block are not implicitly convertible to the delegate return type".
The issue is that the nested .Where returns a sequence, but the parent .Where requires a bool result. How would I tackle this problem?
I realise I could do this with nested loops; however, I'm trying to brush up my C# skills!
Using List.Contains is not ideal in terms of performance; a HashSet is a better option. Also, if you want to select the email address as soon as it contains one of the searched actions, you can use Any:
var activityTypes = new HashSet<EmailAction>() { EmailAction.Open, EmailAction.Bounce };
List<string> activityEmailAddresses =
emailActivity.Where(
member => member.Activity.Any(
activity => activity.Action.HasValue &&
activityTypes.Contains(activity.Action.Value)
)
)
.Select(member => member.EmailAddress)
.ToList();
Whether you use All or Any depends on whether you want every activity to match or at least one.
HashSet<EmailAction> activityTypes = new HashSet<EmailAction> { EmailAction.None };
var emailActivity = new List<EmailActivity>
{
new EmailActivity { Activity = new List<MemberActivity>{ new MemberActivity { Action = EmailAction.None } }, EmailAddress = "a" },
new EmailActivity { Activity = new List<MemberActivity>{ new MemberActivity { Action = EmailAction.Click } }, EmailAddress = "b" }
};
// Example with Any but All can be used as well
var activityEmailAddresses = emailActivity
.Where(x => x.Activity.Any(_ => _.Action.HasValue && activityTypes.Contains(_.Action.Value)))
.Select(x => x.EmailAddress)
.ToArray();
// Result is [ "a" ]
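To make the Any/All distinction concrete, here is a small sketch that runs both variants over the same data. The model classes are trimmed copies of the ones in the question; the ActivityFilter class and its method names are hypothetical and exist only for this illustration. One detail worth keeping in mind is that All returns true for an empty sequence, so the All variant below also requires at least one activity (that extra guard is an assumption about the desired behaviour).

```csharp
using System.Collections.Generic;
using System.Linq;

public enum EmailAction { None = 0, Open = 1, Click = 2, Bounce = 3 }

public class MemberActivity
{
    public EmailAction? Action { get; set; }
}

public class EmailActivity
{
    public IEnumerable<MemberActivity> Activity { get; set; }
    public string EmailAddress { get; set; }
}

// Hypothetical helper, not part of the original answers.
public static class ActivityFilter
{
    // An activity matches when its Action is non-null and is one of the wanted actions.
    private static bool Matches(MemberActivity activity, HashSet<EmailAction> wanted) =>
        activity.Action.HasValue && wanted.Contains(activity.Action.Value);

    // Email addresses of members with at least one matching activity.
    public static List<string> WithAnyMatch(IEnumerable<EmailActivity> source, HashSet<EmailAction> wanted) =>
        source.Where(m => m.Activity.Any(a => Matches(a, wanted)))
              .Select(m => m.EmailAddress)
              .ToList();

    // Email addresses of members whose activities all match.
    // All() is true for an empty sequence, so members without activities are excluded explicitly.
    public static List<string> WithAllMatch(IEnumerable<EmailActivity> source, HashSet<EmailAction> wanted) =>
        source.Where(m => m.Activity.Any() && m.Activity.All(a => Matches(a, wanted)))
              .Select(m => m.EmailAddress)
              .ToList();
}
```

For a member with activities Open and null, WithAnyMatch includes the address when Open is wanted, while WithAllMatch leaves it out because the null action does not match.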

Replace in MongoDB With C#

I am trying to actually replace a collection of Objects of type Game in my Collection "Games".
I want to replace these Objects with entirely new Objects. I have researched a bit on MongoDB and I see that 'UpdateMany' will replace Fields with new values but that's not exactly what I want. I wish to replace the entire Object.
For reference, this is my Game class:
public class Game
{
public Guid Id { get; set; }
public string Title { get; set; }
public string Developer { get; set; }
public int ProjectId { get; set; }
public Game()
{
this.Id = Guid.NewGuid();
}
}
This is the method I am using to attempt a bulk replace. I pass in a ProjectId, and every Game object whose ProjectId equals the argument should be replaced with a new Game object.
public static void ReplaceGame(int ProjectId, IMongoDatabase Database)
{
IMongoCollection<Game> gameCollection = Database.GetCollection<Game>("Game");
List<Game> gameCollectionBeforeReplacement = gameCollection.Find(g => true).ToList();
if (gameCollectionBeforeReplacement.Count == 0)
{
Console.WriteLine("No Games in Collection...");
return;
}
var filter = Builders<Game>.Filter.Eq(g => g.ProjectId, ProjectId);
foreach (Game game in gameCollection.AsQueryable())
gameCollection.ReplaceOneAsync(filter, new Game() { Title = "REPLACEMENT TITLE" });
}
Not only does this take an excessive amount of time (I suspect because of the .AsQueryable() call), but it also doesn't work. I am wondering how I can actually replace all instances picked up by my filter with new Game objects.
Consider the following code:
public virtual ReplaceOneResult ReplaceOne(TDocument replacement, int projId)
{
var filter = Builders<TDocument>.Filter.Eq(x => x.ProjectId, projId);
var result = Collection.ReplaceOne(filter, replacement, new UpdateOptions() { IsUpsert = false }, _cancellationToken);
return result;
}
You will find that ReplaceOneResult has a property that tells you the matched count. This makes it possible for you to keep executing the ReplaceOne call until the matched count equals 0. When this happens, you know all documents in your collection that had the corresponding project id have been replaced.
Example:
var result = ReplaceOne(new Game() { Title = "REPLACEMENT TITLE" }, 12);
while (result.MatchedCount > 0)
result = ReplaceOne(new Game() { Title = "REPLACEMENT TITLE" }, 12);
This makes it so that you don't need the call to the database before you start replacing.
However, if you wish to set the same values for every existing game, I would suggest doing an UpdateMany operation instead. There you can use $set to specify all required values. The code above is simply not performant, as it goes to the database for every single replace call.

C# Traverse Hierarchy without Recursion

I need a hand transforming my recursive function into a loop, as I've been stuck trying to do this for hours. The reason is that I keep running into a StackOverflowException.
Please check the following code:
private List<int> GetManagers(Employee employee, List<Employee> employeeList)
{
List<int> collection = new List<int>();
if (employee.DirectManagers.Any())
{
var managers = employeeList.Where(x => employee.DirectManagers.Any(y => y.Equals(x.Id)));
foreach (var manager in managers)
{
if (!collection.Any(x => x.Equals(manager.Id)))
collection.Add(manager.Id);
if (manager.DirectManagers.Any())
collection.AddRange(GetManagers(manager, employeeList));
}
}
return collection;
}
Edit: more code here
foreach (var employee in employeeList)
{
List<int> allManagers = new List<int>();
allManagers = GetManagers(employee, employeeList);
// Do something with allManagers found here that does not affect the collection
}
public class Employee
{
public int Id { get; set; }
public int? DepartmentId { get; set; }
public List<int> DirectManagers { get; set; }
public List<int> DirectSubordinates { get; set; }
public int Counter { get; set; }
}
var employeeList = context.AdministratorProfiles
.Where(x => !x.dateResigned.HasValue && x.departmentID.HasValue)
.Select(x => new Employee {
Id = x.id,
DepartmentId = x.departmentID,
Counter = 0,
DirectManagers = x.Managers.Select(y => y.managerID).ToList(),
DirectSubordinates = x.Subordinates.Select(y => y.adminID).ToList()
}).ToList(); // TODO: Add active account here
Basically, I'm trying to get all the managers of an Employee. Due to the huge number of staff, I often run into a StackOverflowException. I would appreciate it if anyone could lend a hand. Thank you.
Edit: I have now listed all the code, so perhaps you can get a better understanding. Basically, I loop through every single employee to perform work, and first I must have a work list. This work list would exclude all the managers and managers' managers to form the final list.
Your problem is not recursion but cyclic references. You can deal with this by keeping track of the entities you have already visited: mark each entity as visited when your method reaches it, and if you reach the same entity again, just return.
You could do something like:
...
Dictionary<int, bool> processed = new Dictionary<int, bool>();
Queue<Employee> managersQueue = new Queue<Employee>();
managersQueue.Enqueue(employee);
while (managersQueue.Any())
{
var currentEmployee = managersQueue.Dequeue();
var managers = employeeList.Where(x => currentEmployee.DirectManagers.Any(y => y.Equals(x.Id)));
foreach (var manager in managers)
{
if (processed.ContainsKey(manager.Id)) continue;
processed.Add(manager.Id, true);
managersQueue.Enqueue(manager);
}
}
return processed.Select(x => x.Key).ToList();
This is just a basic outline of how you could do this iteratively, obviously I don't know your code base or exactly how certain calls would be made.
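The outline above can be fleshed out into a self-contained sketch as follows. It is a breadth-first walk with a visited set, not the asker's actual code: the Employee class is trimmed to the two properties the traversal needs, ManagerLookup and GetAllManagers are hypothetical names, and the dictionary lookup assumes employee Ids are unique. Manager ids that do not appear in employeeList are skipped, which matches the Where filter in the original.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public List<int> DirectManagers { get; set; } = new List<int>();
}

// Hypothetical helper, not part of the original question or answer.
public static class ManagerLookup
{
    // Returns the ids of all direct and indirect managers of the given employee,
    // without recursion and without looping forever on cyclic references.
    public static List<int> GetAllManagers(Employee employee, IEnumerable<Employee> employeeList)
    {
        // Assumption: Ids are unique, so each id maps to exactly one employee.
        var byId = employeeList.ToDictionary(e => e.Id);
        var visited = new HashSet<int>();
        var result = new List<int>();
        var queue = new Queue<Employee>();
        queue.Enqueue(employee);

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            foreach (var managerId in current.DirectManagers)
            {
                // Skip managers that are not in the list (for example, resigned staff).
                if (!byId.TryGetValue(managerId, out var manager)) continue;
                // HashSet.Add returns false when the id was already seen, which breaks cycles.
                if (!visited.Add(managerId)) continue;
                result.Add(managerId);
                queue.Enqueue(manager);
            }
        }
        return result;
    }
}
```

Building the dictionary once replaces the repeated Where scan over employeeList; when calling this for every employee, it may be worth building the dictionary outside the method and passing it in. Note that if the starting employee is part of a cycle, their own id can appear in the result, as in the outline above.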

Linq using StartsWith always empty

I have a simple List with dummy data as follows:
List<Organisation> list = new List<Organisation>();
list.Add(new Organisation() { LogoUrl = "/images/logos/Blade.png", OrganisationId = 1, OrganisationName = "Blade" });
list.Add(new Organisation() { LogoUrl = "/images/logos/Torn.png", OrganisationId = 2, OrganisationName = "Torn" });
When I run the linq query:
var results = from org in OrganisationsController.GetDummyList()
where org.OrganisationName.StartsWith(searchString)
select org;
It always returns an Empty result. In this case the searchString is specified by the user and the example would be "Tor".
Using different variations like 'where org.OrganisationName == searchString' where the search string is Torn works. But StartsWith never works.
Any ideas where I'm going wrong?
EDIT:
From Jon's code I changed my code to look as follows:
public JsonResult Search(string searchString)
{
//create json result object
JsonResult data = new JsonResult();
var list = OrganisationsController.GetDummyList();
//query the list
var results = from org in list
where org.OrganisationName.ToLower().Contains(searchString.ToLower())
select org;
if (results.Any())
{
System.Diagnostics.Debug.Write("found");
}
//setup the data
data.Data = results;
//return the data
return Json(data, JsonRequestBehavior.AllowGet);
}
Note: I changed the StartsWith to Contains, but both are giving me similar problems.
One of my organisations is called 'Absa'. Here's the really strange thing: when I fire up the app for the first time, putting in 'bsa' returns nothing. I then enter 'Absa' and it returns a good result. Then I entered 'bsa' again just to double check, and it returned Absa, which it didn't in the first test. Why would the search not work at first and then work later?
Thanks,
Jacques
Unable to reproduce. It works fine for me:
using System;
using System.Collections.Generic;
using System.Linq;
class Organisation
{
public string LogoUrl { get; set; }
// Removed redundant Organisation prefixes
public int Id { get; set; }
public string Name { get; set; }
}
class Test
{
static void Main()
{
// Used collection initializer for sanity
var list = new List<Organisation>
{
new Organisation { LogoUrl = "Blade.png", Id = 1, Name = "Blade" },
new Organisation { LogoUrl = "Torn.png", Id = 2, Name = "Torn" },
};
string searchString = "Tor";
var query = from org in list
where org.Name.StartsWith(searchString)
select org;
// Nicer version:
// var query = list.Where(org => org.Name.StartsWith(searchString));
Console.WriteLine(query.Count()); // 1
}
}
Work out the difference between your code and my code to find out what's wrong.
In particular, you've shown code using List<T>, which means LINQ to Objects. If your real code uses LINQ to SQL or Entity Framework, that could easily affect things.
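When comparing the two versions, two differences that commonly matter for user-supplied search strings are casing and stray whitespace. The sketch below is not a diagnosis of the original problem; it only shows a LINQ to Objects prefix search that is insensitive to both. NameSearch and StartingWith are hypothetical names, and trimming the input is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or answer.
public static class NameSearch
{
    // Returns the names that start with the search string, ignoring case and
    // surrounding whitespace in the search string. An empty search returns nothing.
    public static List<string> StartingWith(IEnumerable<string> names, string searchString)
    {
        if (string.IsNullOrWhiteSpace(searchString)) return new List<string>();

        var term = searchString.Trim();
        return names
            .Where(n => n != null && n.StartsWith(term, StringComparison.OrdinalIgnoreCase))
            .ToList(); // materialise the results now instead of returning a deferred query
    }
}
```

Passing a StringComparison avoids allocating lowercased copies with ToLower(), and calling ToList() means the caller receives concrete results instead of a query that is evaluated later.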
