I have a collection of documents that can contain criteria grouped into categories. The structure could look like this:
{
    "Name": "MyDoc",
    "Criteria": [
        {
            "Category": "Areas",
            "Values": ["Front", "Left"]
        },
        {
            "Category": "Severity",
            "Values": ["High"]
        }
    ]
}
The class I'm using to create the embedded documents for the criteria looks like this:
public class CriteriaEntity
{
public string Category { get; set; }
public IEnumerable<string> Values { get; set; }
}
The user can choose criteria from each category to search (which comes into the function as IEnumerable<CriteriaEntity>) and the document must contain all the selected criteria in order to be returned. This was my first attempt:
var filterBuilder = Builders<T>.Filter;
var filters = new List<FilterDefinition<T>>();
filters.Add(filterBuilder.Exists(entity =>
userCriterias.All(userCriteria =>
entity.Criteria.Any(entityCriteria =>
entityCriteria.Category == userCriteria.Category
&& userCriteria.Values.All(userValue =>
entityCriteria.Values.Any(entityValue =>
entityValue == userValue))))));
However, I get the error: "Unable to determine the serialization information for entity...". How can I get this to work?
MongoDB.Driver 2.0 doesn't support LINQ's All in filter expressions. You can solve the task another way: build one filter per selected value and combine them with And:
var filterDefinitions = new List<FilterDefinition<DocumentEntity>>();
foreach (var criteria in searchCriterias)
{
    // One filter per value: some embedded criteria must have this category and contain the value.
    filterDefinitions
        .AddRange(criteria.Values
            .Select(value => new ExpressionFilterDefinition<DocumentEntity>(doc => doc.Criterias
                .Any(x => x.Category == criteria.Category && x.Values.Contains(value)))));
}
// And() requires every per-value filter to match.
var filter = Builders<DocumentEntity>.Filter.And(filterDefinitions);
return await GetCollection<DocumentEntity>().Find(filter).ToListAsync();
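To make the matching rule in the answer above concrete, here is a minimal in-memory sketch using LINQ to Objects only. It is not the driver's translation of the filter. DocumentEntity and CriteriaMatcher are hypothetical names introduced for illustration, and the Criterias property name simply follows the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the stored document and its criteria.
public class CriteriaEntity
{
    public string Category { get; set; }
    public IEnumerable<string> Values { get; set; }
}

public class DocumentEntity
{
    public string Name { get; set; }
    public List<CriteriaEntity> Criterias { get; set; }
}

public static class CriteriaMatcher
{
    // Flattens the user's selection into (category, value) pairs and keeps
    // only the documents that contain every pair, mirroring the And-ed filters.
    public static IEnumerable<DocumentEntity> Filter(
        IEnumerable<DocumentEntity> docs,
        IEnumerable<CriteriaEntity> searchCriterias)
    {
        var pairs = searchCriterias
            .SelectMany(c => c.Values.Select(v => (Category: c.Category, Value: v)))
            .ToList();

        return docs.Where(doc => pairs.All(p =>
            doc.Criterias.Any(x => x.Category == p.Category && x.Values.Contains(p.Value))));
    }
}
```

Because each selected value becomes its own condition, a document matches only if every selected value is present under its category, which is the behaviour the question asks for.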
I have a collection like this (I removed fields that are not related to the question)
{
_id:ObjectId('5dd7d946cd9c645f1cdc21ef'),
Versions: [
{
"Barcode" : "200830001128132700636"
},
{
"Barcode" : "200830001128132700637"
}
]
},
{
_id:ObjectId('5dd7d946cd9c645f1cdc21eg'),
Versions: [
{
"Barcode" : "200830001128132700638"
},
{
"Barcode" : "200830001128132700639"
}
]
}
I need to find the greatest (max) barcode in the whole collection.
I tried with a code like this:
var options = new FindOptions<Document>
{
Limit = 1,
Sort = Builders<Document>.Sort.Descending(d => d.Versions.Select(v => v.BarCode).Aggregate((v1, v2) => string.Compare(v1, v2) > 0 ? v1 : v2))
};
using var results = await _context.DocumentiItems.FindAsync(FilterDefinition<Document>.Empty, options);
But I get an ArgumentNullException; I think the driver is unable to translate the expression with the Aggregate call.
Can you suggest a better approach? If possible, I want to avoid BSON strings and use only lambda expressions.
The type of DocumentiItems is IMongoCollection<Document>.
This can be easily achieved with the AsQueryable() interface, like so:
var result = collection.AsQueryable()
.SelectMany(i => i.Versions)
.OrderByDescending(v => v.Barcode)
.Take(1)
.Single();
Here's a test program:
using MongoDB.Entities;
using MongoDB.Entities.Core;
using System;
using System.Linq;
namespace StackOverflow
{
public class Item : Entity
{
public Version[] Versions { get; set; }
}
public class Version
{
public string Barcode { get; set; }
}
public class Program
{
private static void Main(string[] args)
{
new DB("test", "localhost");
var result = DB.Queryable<Item>()
.SelectMany(i => i.Versions)
.OrderByDescending(v => v.Barcode)
.Take(1)
.Single();
Console.WriteLine($"max barcode: {result.Barcode}");
Console.Read();
}
}
}
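The same query shape can be tried without a database. The following is a minimal in-memory sketch, assuming LINQ to Objects rather than the MongoDB provider. BarcodeVersion, BarcodeItem and BarcodeQueries are hypothetical names. Note that string ordering only matches numeric ordering when all barcodes have the same length, as they do in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the stored item and its versions.
public class BarcodeVersion
{
    public string Barcode { get; set; }
}

public class BarcodeItem
{
    public BarcodeVersion[] Versions { get; set; }
}

public static class BarcodeQueries
{
    // Returns the greatest barcode across all items, or null when there are none.
    // Ordinal comparison is used so the result does not depend on culture.
    public static string MaxBarcode(IEnumerable<BarcodeItem> items)
    {
        return items
            .SelectMany(i => i.Versions)
            .Select(v => v.Barcode)
            .OrderByDescending(b => b, StringComparer.Ordinal)
            .FirstOrDefault();
    }
}
```

FirstOrDefault is used here instead of Take(1).Single() so that an empty collection yields null instead of throwing.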
Say I have the following class structures
public class EmailActivity {
public IEnumerable<MemberActivity> Activity { get; set; }
public string EmailAddress { get; set; }
}
public class MemberActivity {
public EmailAction? Action { get; set; }
public string Type { get; set; }
}
public enum EmailAction {
None = 0,
Open = 1,
Click = 2,
Bounce = 3
}
I wish to filter a list of EmailActivity objects based on the presence of a MemberActivity with a non-null EmailAction matching a provided list of EmailAction matches. I want to return just the EmailAddress property as a List<string>.
This is as far as I've got
List<EmailAction> activityTypes; // [ EmailAction.Open, EmailAction.Bounce ]
List<string> activityEmailAddresses =
emailActivity.Where(
member => member.Activity.Where(
activity => activityTypes.Contains(activity.Action)
)
)
.Select(member => member.EmailAddress)
.ToList();
However, I get the error message "CS1503 Argument 1: cannot convert from 'EmailAction?' to 'EmailAction'".
If I then change activityTypes to allow null values (List<EmailAction?>), I get the following: "CS1662 Cannot convert lambda expression to intended delegate type because some of the return types in the block are not implicitly convertible to the delegate return type".
The issue is that the nested .Where returns a sequence, but the parent .Where requires a bool result. How would I tackle this problem?
I realise I could do this with nested loops, but I'm trying to brush up my C# skills!
Using List.Contains is not ideal in terms of performance; a HashSet is a better option. Also, if you want to select the email address as soon as one of its activities matches a searched action, you can use Any:
var activityTypes = new HashSet<EmailAction>() { EmailAction.Open, EmailAction.Bounce };
List<string> activityEmailAddresses =
    emailActivity.Where(
        member => member.Activity.Any(
            // Skip null actions before looking them up in the set.
            activity => activity.Action.HasValue &&
                        activityTypes.Contains(activity.Action.Value)
        )
    )
    .Select(member => member.EmailAddress)
    .ToList();
Whether to use All or Any depends on whether every activity or at least one activity must match:
HashSet<EmailAction> activityTypes = new HashSet<EmailAction> { EmailAction.None };
var emailActivity = new List<EmailActivity>
{
new EmailActivity { Activity = new List<MemberActivity>{ new MemberActivity { Action = EmailAction.None } }, EmailAddress = "a" },
new EmailActivity { Activity = new List<MemberActivity>{ new MemberActivity { Action = EmailAction.Click } }, EmailAddress = "b" }
};
// Example with Any but All can be used as well
var activityEmailAddresses = emailActivity
.Where(x => x.Activity.Any(_ => _.Action.HasValue && activityTypes.Contains(_.Action.Value)))
.Select(x => x.EmailAddress)
.ToArray();
// Result is [ "a" ]
I have the following RavenDB Index:
public class RidesByPostcode : AbstractIndexCreationTask<Ride, RidesByPostcode.IndexEntry>
{
public class IndexEntry
{
public string PostcodeFrom { get; set; }
public string PostcodeTo { get; set; }
}
public RidesByPostcode()
{
Map = rides => from doc in rides
select new
{
doc.DistanceResult.PostcodeFrom,
doc.DistanceResult.PostcodeTo
};
StoreAllFields(FieldStorage.Yes);
}
}
I also have a list of strings representing postcodes, and I want to get all the Rides for which the PostcodeFrom is in the list of postcodes:
var postcodes = new List<string> { "postcode 1", "postcode 2" };
var rides = _database.Query<RidesByPostcode.IndexEntry, RidesByPostcode>()
.Where(x => postcodes.Contains(x.PostcodeFrom))
.OfType<Ride>()
.ToList();
But of course RavenDb says it cannot understand the .Contains expression.
How can I achieve such a query in RavenDb without having to call .ToList() before the where clause?
Ok, I found the answer: RavenDb's .In() extension method (see the "Where + In" section of the docs).
Apparently I was thinking from the outside in, instead of from the inside out :)
This is the final query:
var rides = _database.Query<RidesByPostcode.IndexEntry, RidesByPostcode>()
.Where(x => !x.IsAccepted && x.PostcodeFrom.In(postcodes))
.OfType<Ride>()
.ToList();
For a proof of concept I have loaded ~54 million records into mongodb. The goal is to investigate the query speed of mongodb.
I use the following class to store the data:
[BsonDiscriminator("Part", Required = true)]
public class Part
{
[BsonId]
public ObjectId Id { get; set; }
[BsonElement("pgc")]
public int PartGroupCode { get; set; }
[BsonElement("sc")]
public int SupplierCode { get; set; }
[BsonElement("ref")]
public string ReferenceNumber { get; set; }
[BsonElement("oem"), BsonIgnoreIfNull]
public List<OemReference> OemReferences { get; set; }
[BsonElement("alt"), BsonIgnoreIfNull]
public List<AltReference> AltReferences { get; set; }
[BsonElement("crs"), BsonIgnoreIfNull]
public List<CrossReference> CrossReferences { get; set; }
[BsonElement("old"), BsonIgnoreIfNull]
public List<FormerReference> FormerReferences { get; set; }
[BsonElement("sub"), BsonIgnoreIfNull]
public List<SubPartReference> SubPartReferences { get; set; }
}
And I created the following indexes:
Compound Index on ref, sc, pgc
Ascending Index on oem.refoem
Ascending Index on alt.refalt
Ascending Index on crs.refcrs
Ascending Index on old.refold
Ascending Index on sub.refsub
I perform the following queries to test the performance:
var searchValue = "345";
var start = DateTime.Now;
var result1 = collection.AsQueryable<Part>().OfType<Part>().Where(part => part.ReferenceNumber == searchValue);
long count = result1.Count();
var finish = DateTime.Now;
start = DateTime.Now;
var result2 = collection.AsQueryable<Part>().OfType<Part>().Where(part =>
part.ReferenceNumber.Equals(searchValue) ||
part.OemReferences.Any(oem => oem.ReferenceNumber.Equals(searchValue)) ||
part.AltReferences.Any(alt => alt.ReferenceNumber.Equals(searchValue)) ||
part.CrossReferences.Any(crs => crs.ReferenceNumber.Equals(searchValue)) ||
part.FormerReferences.Any(old => old.ReferenceNumber.Equals(searchValue))
);
count = result2.Count();
finish = DateTime.Now;
start = DateTime.Now;
var result3 = collection.AsQueryable<Part>().OfType<Part>().Where(part =>
part.ReferenceNumber.StartsWith(searchValue) ||
part.OemReferences.Any(oem => oem.ReferenceNumber.StartsWith(searchValue)) ||
part.AltReferences.Any(alt => alt.ReferenceNumber.StartsWith(searchValue)) ||
part.CrossReferences.Any(crs => crs.ReferenceNumber.StartsWith(searchValue)) ||
part.FormerReferences.Any(old => old.ReferenceNumber.StartsWith(searchValue))
);
count = result3.Count();
finish = DateTime.Now;
var regex = new Regex("^345"); //StartsWith regex
start = DateTime.Now;
var result4 = collection.AsQueryable<Part>().OfType<Part>().Where(part =>
regex.IsMatch(part.ReferenceNumber) ||
part.OemReferences.Any(oem => regex.IsMatch(oem.ReferenceNumber)) ||
part.AltReferences.Any(alt => regex.IsMatch(alt.ReferenceNumber)) ||
part.CrossReferences.Any(crs => regex.IsMatch(crs.ReferenceNumber)) ||
part.FormerReferences.Any(old => regex.IsMatch(old.ReferenceNumber))
);
count = result4.Count();
finish = DateTime.Now;
The results are not what I would have expected:
Search 1 on 345 results in: 3 records (00:00:00.3635937)
Search 2 on 345 results in: 58 records (00:00:00.0671566)
Search 3 on 345 results in: 6189 records (00:01:17.6638459)
Search 4 on 345 results in: 6189 records (00:01:17.0727802)
Why is the StartsWith query (3 and 4) so much slower?
The performance of the StartsWith query is make-or-break for this decision.
Did I create the wrong indexes? Any help is appreciated.
Using mongodb with the 10gen C# driver
UPDATE:
The way the query is translated from Linq to a MongoDB query is very important for the performance. I build the same query (like 3 and 4) again but with the Query object:
var query5 = Query.And(
Query.EQ("_t", "Part"),
Query.Or(
Query.Matches("ref", "^345"),
Query.Matches("oem.refoem", "^345"),
Query.Matches("alt.refalt", "^345"),
Query.Matches("crs.refcrs", "^345"),
Query.Matches("old.refold", "^345")));
start = DateTime.Now;
var result5 = collection.FindAs<Part>(query5);
count = result5.Count();
finish = DateTime.Now;
The result of this query is returned in 00:00:00.4522972
The query translated as
command: { count: "PSG", query: { _t: "Part", $or: [ { ref: /^345/ }, { oem.refoem: /^345/ }, { alt.refalt: /^345/ }, { crs.refcrs: /^345/ }, { old.refold: /^345/ } ] } }
Compared with Query 3 and 4 the difference is big:
command: { count: "PSG", query: { _t: "Part", $or: [ { ref: /^345/ }, { oem: { $elemMatch: { refoem: /^345/ } } }, { alt: { $elemMatch: { refalt: /^345/ } } }, { crs: { $elemMatch: { refcrs: /^345/ } } }, { old: { $elemMatch: { refold: /^345/ } } } ] } }
So why are queries 3 and 4 not using the indexes?
From the index documentation:
Every query, including update operations, uses one and only one index.
In other words, MongoDB doesn't support index intersection. Thus, creating a huge number of indexes is pointless unless there are queries that use this index and this index only. Also, make sure you're calling the correct Count() method here. If you call the LINQ-to-Objects extension (IEnumerable's Count() rather than MongoCursor's Count()), it will actually have to fetch and hydrate all objects.
It is probably easier to put these in a single multi-key index like this:
{
"References" : [ { id: new ObjectId("..."), "_t" : "OemReference", ... },
{ id: new ObjectId("..."), "_t" : "CrossReferences", ...} ],
...
}
where References.id is indexed. Now, a query db.foo.find({"References.id" : new ObjectId("...")}) will automatically search for any match in the array of references. Since I assume the different types of references must be distinguished, it makes sense to use a discriminator so the driver can support polymorphic deserialization. In C#, you'd declare this like
[BsonDiscriminator(Required=true)]
[BsonKnownTypes(typeof(OemReference), typeof(...), ...)]
class Reference { ... }
class OemReference : Reference { ... }
The driver will automatically serialize the type name in a field called _t. That behaviour can be adjusted to your needs, if required.
Also note that shortening the property names will decrease storage requirements, but won't affect index size.
(The title for this question isn't the best, but I'm unsure how else to word it!)
I'm working on a search form which contains a checklist of values. Basically, a checked item means 'include this type in the search'. Something like this:
Search for item: __________
Search in:
[ ] Fresh Foods
[ ] Frozen Foods
[ ] Beverages
[ ] Deli Counter
I have an object to represent this search:
class FoodSearchCriteria{
public string SearchString {get;set;}
public bool SearchFreshFoods {get;set;}
public bool SearchFrozenFoods {get;set;}
public bool SearchBeverages {get;set;}
public bool SearchDeliCounter {get;set;}
}
The only way I can think of doing this atm is like this:
public IList<FoodItem> FindFoodItems(FoodSearchCriteria criteria)
{
    // in reality, this is a fuzzy search not an exact match
    var matches = _DB.FoodItems.Where(x => x.FoodTitle == criteria.SearchString);
    var inCategories = new List<FoodItem>();
    if (criteria.SearchFreshFoods)
        inCategories.AddRange(matches.Where(x => x.Type == "Fresh Foods"));
    if (criteria.SearchFrozenFoods)
        inCategories.AddRange(matches.Where(x => x.Type == "Frozen Foods"));
    //etc etc
    return inCategories;
}
This feels like a code smell to me, what would be a better way to approach it?
Take a look at PredicateBuilder:
// Start from "false" so that each selected category can be OR-ed in.
var predicate = PredicateBuilder.False<FoodItem>();
if (criteria.SearchFreshFoods)
{
    predicate = predicate.Or(x => x.Type == "Fresh Foods");
}
if (criteria.SearchFrozenFoods)
{
    predicate = predicate.Or(x => x.Type == "Frozen Foods");
}
...
_DB.FoodItems.Where(predicate);
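For readers without the library at hand, the idea behind such a builder can be sketched in a few lines. This is a hypothetical minimal stand-in, not the actual PredicateBuilder source. SimplePredicateBuilder, FoodFilters and ForTypes are names invented here, and FoodItem is reduced to the two properties used in the question. It relies on Expression.Invoke, which some LINQ providers may not translate without extra support, so treat it as an in-memory illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class FoodItem
{
    public string FoodTitle { get; set; }
    public string Type { get; set; }
}

// Hypothetical minimal stand-in for a predicate builder.
public static class SimplePredicateBuilder
{
    // Starting point for an OR chain: matches nothing until a clause is added.
    public static Expression<Func<T, bool>> False<T>()
    {
        return item => false;
    }

    // Combines two predicates into a single expression tree joined by ||.
    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        // Invoke 'right' with the parameter of 'left' so both share one input.
        var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, invoked), left.Parameters);
    }
}

public static class FoodFilters
{
    // Builds "Type == t1 || Type == t2 || ..." for the given type names.
    public static Expression<Func<FoodItem, bool>> ForTypes(IEnumerable<string> types)
    {
        var predicate = SimplePredicateBuilder.False<FoodItem>();
        foreach (var type in types)
        {
            var captured = type; // copy so each clause captures its own value
            predicate = predicate.Or(x => x.Type == captured);
        }
        return predicate;
    }
}
```

With no types selected the predicate stays false and nothing matches, which is why the chain starts from False rather than True.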
Have you tried:
List<string> types = new List<string>();
if (criteria.SearchFreshFoods) { types.Add("Fresh Foods"); }
if (criteria.SearchFrozenFoods) { types.Add("Frozen Foods"); }
if (criteria.SearchBeverages) { types.Add("Beverages"); }
if (criteria.SearchDeliCounter) { types.Add("Deli Counter"); }
return _DB.FoodItems.Where(x => x.FoodTitle == criteria.SearchString &&
                                types.Contains(x.Type));
That means just one SQL query, which is handy.
You could certainly refactor the FoodSearchCriteria type to make it easier to build the list though...
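One possible shape for that refactoring is sketched below, under the assumption that the form can hand over the selected type names directly. SelectedTypes and the FoodSearch helper are hypothetical names, FoodItem is reduced to the two properties used in the question, and the query is run against an in-memory IQueryable rather than the real _DB context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FoodItem
{
    public string FoodTitle { get; set; }
    public string Type { get; set; }
}

// Hypothetical refactoring: one set of selected type names
// instead of one bool property per category.
public class FoodSearchCriteria
{
    public string SearchString { get; set; }
    public ISet<string> SelectedTypes { get; } = new HashSet<string>();
}

public static class FoodSearch
{
    public static IQueryable<FoodItem> FindFoodItems(
        IQueryable<FoodItem> foodItems, FoodSearchCriteria criteria)
    {
        // Copy to locals so the query only captures simple values.
        var searchString = criteria.SearchString;
        var types = criteria.SelectedTypes.ToList();

        return foodItems.Where(x => x.FoodTitle == searchString &&
                                    types.Contains(x.Type));
    }
}
```

Adding a new category then only requires a new checkbox value, not a new property and another if statement.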
I haven't had time to test this, but something like the following might work:
class SearchItem
{
    public string Name { get; set; }
    public bool IsSelected { get; set; }
}
class FoodSearchCriteria
{
    public string SearchText { get; set; }
    public IList<SearchItem> SearchItems { get; } = new List<SearchItem>();
}
public IList<FoodItem> FindFoodItems(FoodSearchCriteria criteria)
{
    // in reality, this is a fuzzy search not an exact match
    var matches = _DB.FoodItems.Where(x => x.FoodTitle == criteria.SearchText &&
        criteria.SearchItems.Any(si => si.IsSelected && si.Name == x.Type));
    return matches.ToList();
}