Indexing wish and trade lists in RavenDB - c#

I've tried many different strategies for indexing my data but can't seem to figure it out by myself.
I'm building a database over users and their games. The users can supply the database with games they own and would like to trade as well as a list of games they would like to have:
public class Member : EntityBase
{
public List<Game> TradeList { get; set; }
public List<Game> WishList { get; set; }
}
I'm trying to create an index that I can query in the form of "Give me a list of all games (with corresponding members) which have games in their TradeList matching my WishList as well as having games in their WishList matching my TradeList", excluding myself, of course.
I tried creating a MultiMapIndex:
public class TradingIndex : AbstractMultiMapIndexCreationTask<TradingIndex.Result>
{
public enum ListType
{
Wishlist,
Tradelist
}
public class Result
{
public string Game { get; set; }
public string Member { get; set; }
public ListType List { get; set; }
}
public TradingIndex()
{
AddMap<Member>(members => from member in members
from game in member.TradeList
select new Result()
{
Game = game.Id,
Member = member.Id,
List = ListType.Tradelist
});
AddMap<Member>(members => from member in members
from game in member.WishList
select new Result()
{
Game = game.Id,
Member = member.Id,
List = ListType.Wishlist
});
}
}
And then querying it like this:
db.Query<TradingIndex.Result, TradingIndex>()
.Where(g =>
(g.Game.In(gamesIWant) && g.List == TradingIndex.ListType.Tradelist)
&&
(g.Game.In(gamesITrade) && g.List == TradingIndex.ListType.Wishlist)
&&
g.Member != me.Id
)
But I can't get that to work. I've also looked at Map/Reduce, but the problem I have seems to be getting RavenDB to give me the correct result type.
I hope you get what I'm trying to do and can give me some hints on what to look into.

First, you'll need to make sure that you store the fields you are indexing. This is required so you can get the index results back, instead of the documents that matched the index.
Add this to the bottom of your index definition:
StoreAllFields(FieldStorage.Yes);
Or if you want to be more verbose, (perhaps your index is doing other things also):
Store(x => x.Game, FieldStorage.Yes);
Store(x => x.Member, FieldStorage.Yes);
Store(x => x.List, FieldStorage.Yes);
When you query this, you'll need to tell Raven to send you back the index entries, by using ProjectFromIndexFieldsInto as described here.
Next, you need to realize that you aren't creating any single index entry that will match your query. The multimap index is creating separate entries in the index for each map. If you want to combine them in your results, you'll need to use an intersection query.
Putting this together, your query should look like this:
var q = session.Query<TradingIndex.Result, TradingIndex>()
.Where(g => g.Game.In(gamesIWant) &&
g.List == TradingIndex.ListType.Tradelist &&
g.Member != me.Id)
.Intersect()
.Where(g => g.Game.In(gamesITrade) &&
g.List == TradingIndex.ListType.Wishlist &&
g.Member != me.Id)
.ProjectFromIndexFieldsInto<TradingIndex.Result>();
Full test in this GIST.
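To see why the original single Where clause could never match, here is a minimal in-memory sketch using plain LINQ to Objects rather than RavenDB. The Entry record, the FindPartners helper and the sample ids are hypothetical names made up for illustration, and the sketch assumes the intersection is taken per source document (here, per member):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ListType { Wishlist, Tradelist }

// One flattened entry per (member, game, list), mirroring TradingIndex.Result.
public record Entry(string Game, string Member, ListType List);

public static class IntersectSketch
{
    // Members (other than me) who trade something I want AND want something I trade.
    public static List<string> FindPartners(
        IEnumerable<Entry> entries, string me,
        ICollection<string> gamesIWant, ICollection<string> gamesITrade)
    {
        var others = entries.Where(e => e.Member != me).ToList();

        // No single entry is both a Tradelist and a Wishlist entry,
        // so each condition is evaluated against the entries on its own...
        var tradesWhatIWant = others
            .Where(e => e.List == ListType.Tradelist && gamesIWant.Contains(e.Game))
            .Select(e => e.Member);
        var wantsWhatITrade = others
            .Where(e => e.List == ListType.Wishlist && gamesITrade.Contains(e.Game))
            .Select(e => e.Member);

        // ...and the two result sets are then combined per member.
        return tradesWhatIWant.Intersect(wantsWhatITrade).OrderBy(m => m).ToList();
    }

    public static void Main()
    {
        var entries = new[]
        {
            new Entry("games/1", "members/ann", ListType.Tradelist),
            new Entry("games/2", "members/ann", ListType.Wishlist),
            new Entry("games/1", "members/bob", ListType.Tradelist),
            new Entry("games/3", "members/bob", ListType.Wishlist),
        };
        var partners = FindPartners(entries, "members/me",
            new[] { "games/1" }, new[] { "games/2" });
        Console.WriteLine(string.Join(", ", partners)); // members/ann
    }
}
```

Bob trades a game I want but does not want anything I trade, so only Ann survives the intersection.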

Related

EF Core 5 check if all ids from filter exists in related entities

I have two models:
public class Employee
{
public int Id { get; set; }
public IList<Skill> Skills { get; set; }
}
public class Skill
{
public int Id { get; set; }
}
And I have filter with list of skill ids, that employee should contain:
public class Filter
{
public IList<int> SkillIds { get; set; }
}
I want to write query to get all employees, that have all skills from filter.
I tried:
query.Where(e => filter.SkillIds.All(id => e.Skills.Any(skill => skill.Id == id)));
And:
query = query.Where(e => e.Skills
.Select(x => x.Id)
.Intersect(filter.SkillIds)
.Count() == filter.SkillIds.Count);
But as a result I get an exception saying that the query could not be translated.
Running a query like this on the SQL Server side is going to be difficult, if not impossible.
This is because, to make it work in SQL, you would be grouping each employee's skills into a single row, which would need a new column for every skill listed in the skills table.
SQL Server wasn't really made to handle grouping with an unknown set of columns passed into a query. Although this kind of query is technically possible, it's probably not easy to do through an ORM like EF Core.
It would be easier to do this on the .NET side using something like:
var employees = _context.Employees.Include(x => x.Skills).ToList(); // loads every employee with their skills into memory
var filter = someFilter;
var result = employees.Where(emp => filter.SkillIds.All(skillId => emp.Skills.Any(skill => skill.Id == skillId))).ToList();
This solution works:
foreach (int skillId in filter.SkillIds)
{
query = query.Where(e => e.Skills.Any(skill => skill.Id == skillId));
}
I am not sure about its performance, but it works pretty fast with a small amount of data.
I've also encountered this issue several times now, this is the query I've come up with that I found works best and does not result in an exception.
query.Where(e => e.Skills.Where(s => filter.SkillIds.Contains(s.Id)).Count() == filter.SkillIds.Count);
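The count comparison above only holds if the filter contains no repeated ids. A minimal sketch of the same idea with the ids de-duplicated first follows; it runs against an in-memory IQueryable, the WithAllSkills helper name is hypothetical, and whether a given EF Core version translates it to SQL is not verified here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Skill { public int Id { get; set; } }

public class Employee
{
    public int Id { get; set; }
    public IList<Skill> Skills { get; set; } = new List<Skill>();
}

public static class SkillFilterSketch
{
    // Keeps employees that hold every requested skill.
    public static IQueryable<Employee> WithAllSkills(
        IQueryable<Employee> query, IEnumerable<int> skillIds)
    {
        // De-duplicate first: a repeated id would inflate the expected count.
        var ids = skillIds.Distinct().ToList();
        var required = ids.Count;
        return query.Where(e =>
            e.Skills.Where(s => ids.Contains(s.Id)).Count() == required);
    }

    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { Id = 1, Skills = new List<Skill> {
                new Skill { Id = 1 }, new Skill { Id = 2 }, new Skill { Id = 3 } } },
            new Employee { Id = 2, Skills = new List<Skill> { new Skill { Id = 1 } } },
        }.AsQueryable();

        // The filter repeats id 2 on purpose.
        var matches = WithAllSkills(employees, new[] { 1, 2, 2 })
            .Select(e => e.Id).ToList();
        Console.WriteLine(string.Join(", ", matches)); // 1
    }
}
```

The sketch also assumes each employee holds a given skill at most once; duplicate rows on that side would break the count in the other direction.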

C# filter based on request api

I'm trying to filter the posts that are displayed in my application. Everything works as expected, but I have a small issue. A user can choose the education(s) and profession(s) he follows. How can I filter based on the arrays I get? I tried something like this, but it feels ugly. If I add more arrays to my Filter class, like Language[], it will get even messier. Is there an easier way?
public class Filter
{
public string[] Education { get; set; }
public string[] Profession { get; set; }
public int PageIndex { get; set; }
}
Example request:
public PaginatedResults<Post> FilterPosts(Filter filter)
{
// Both Education and Profession arrays are empty, we just return all the posts
if(filter.Profession.Any(prof => prof == null) && filter.Education.Any(study => study == null)) {
var posts1 = _dbContext.Posts.AsEnumerable();
return _searchService.Pagination<Post>(posts1, filter.PageIndex);
}
else
{
// Can this be simplified? Sometimes the Education array is empty and sometimes Profession array. User can choose
IEnumerable<Post> posts = null;
if(filter.Profession.Any(prof => prof == null))
{
posts = _dbContext.Posts.Where(post => filter.Education.Contains(post.Education)).AsEnumerable();
}
else if(filter.Education.Any(study => study == null))
{
posts = _dbContext.Posts.Where(post => filter.Profession.Contains(post.Profession)).AsEnumerable();
}
else
{
posts = _dbContext.Posts.Where(post => filter.Profession.Contains(post.Profession) && filter.Education.Contains(post.Education)).AsEnumerable();
}
return _searchService.Pagination<Post>(posts, filter.PageIndex);
}
}
There are probably quite a few ways you could approach this problem. Assuming you want to keep your approach (which I think is perfectly valid), you could try the following steps:
Leverage IQueryable
Assuming you use entity framework, I believe _dbContext.Posts implements IQueryable already. Since LINQ does not get executed immediately, we can build filtering conditions sequentially before enumerating the collection:
posts = _dbContext.Posts.Where(post => filter.Education.Contains(post.Education) && filter.Profession.Contains(post.Profession)).AsEnumerable();
// since you are implementing `AND` semantics for your filters, it is easy to break this down into a series of `.Where()` calls
posts = _dbContext.Posts.Where(post => filter.Education.Contains(post.Education))
.Where(post => filter.Profession.Contains(post.Profession))
.AsEnumerable(); // filters Posts by Education AND Profession and exposes the result as IEnumerable; functionally identical to the first statement
Invert boolean conditions and check if filters have values
This will allow you to add a .Where filter only when it's needed:
if (filter.Profession.Any()) // if Profession has elements
{
posts = posts.Where(post => filter.Profession.Contains(post.Profession)); // apply the respective filter to posts; you may want to compare only against meaningful search terms by first applying `.Where(i => !string.IsNullOrWhiteSpace(i))` to the filter array
}
if (filter.Education.Any()) // if Education has elements
{
posts = posts.Where(post => filter.Education.Contains(post.Education)); // apply the respective filter to posts
}
Then, to put it all together
public PaginatedResults<Post> FilterPosts(Filter filter)
{
IQueryable<Post> posts = _dbContext.Posts;
if (filter.Profession.Any()) posts = posts.Where(post => filter.Profession.Contains(post.Profession));
if (filter.Education.Any()) posts = posts.Where(post => filter.Education.Contains(post.Education));
return _searchService.Pagination<Post>(posts.AsEnumerable(), filter.PageIndex);
}
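The question's own checks (Any(prof => prof == null)) suggest an unset filter may arrive as an array holding a single null rather than as an empty array. A minimal sketch of the same conditional Where chain that treats null or blank entries as "no filter" follows; it runs against an in-memory IQueryable, and the Clean and Apply helpers are hypothetical names, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public string Education { get; set; }
    public string Profession { get; set; }
}

public class Filter
{
    public string[] Education { get; set; }
    public string[] Profession { get; set; }
}

public static class PostFilterSketch
{
    // Drops null/blank entries so an array like [null] counts as "no filter".
    private static List<string> Clean(string[] values) =>
        (values ?? Array.Empty<string>())
            .Where(v => !string.IsNullOrWhiteSpace(v))
            .ToList();

    // Adds one Where per filter that actually has values (AND semantics).
    public static IQueryable<Post> Apply(IQueryable<Post> posts, Filter filter)
    {
        var educations = Clean(filter.Education);
        var professions = Clean(filter.Profession);
        if (educations.Count > 0) posts = posts.Where(p => educations.Contains(p.Education));
        if (professions.Count > 0) posts = posts.Where(p => professions.Contains(p.Profession));
        return posts;
    }

    public static void Main()
    {
        var posts = new List<Post>
        {
            new Post { Education = "CS", Profession = "Dev" },
            new Post { Education = "Math", Profession = "Dev" },
            new Post { Education = "CS", Profession = "QA" },
        }.AsQueryable();

        var onlyEducation = new Filter { Education = new[] { "CS" }, Profession = new string[] { null } };
        Console.WriteLine(Apply(posts, onlyEducation).Count()); // 2
    }
}
```

Adding another filter such as Language[] then costs one more Clean call and one more conditional Where, instead of doubling the number of branches.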

Orchard CMS ContentQuery with N-to-N relation

In my project I've implemented N-to-N relation between records using this tutorial on OrchardProject web-site. I have 2 parts: MaterialPart & CategoryPart and association record.
Material part
public class MaterialPartRecord : ContentPartRecord {
public MaterialPartRecord() {
Categories = new List<ContentMaterialCategoryRecord>();
}
}
public class MaterialPart : ContentPart<MaterialPartRecord> {
public IEnumerable<CategoryPartRecord> Categories {
get { return Record.Categories.Select(cmcr => cmcr.CategoryPartRecord); }
}
}
CategoryPartRecord
public class CategoryPartRecord : ContentPartRecord {
...
}
public class CategoryPart : ContentPart<CategoryPartRecord> {
...
}
association record:
public class ContentMaterialCategoryRecord {
public virtual int Id { get; set; }
public virtual MaterialPartRecord MaterialPartRecord { get; set; }
public virtual CategoryPartRecord CategoryPartRecord { get; set; }
}
Now I need to select MaterialItems which are linked to certain category. So far I have this method to extract them. It works but I'm not sure that it is correct way to do this.
public IEnumerable<MaterialPart> GetMaterialsByCategory(int catId) {
var cs = new CategoriesService(_oServices);
CategoryPartRecord cat = cs.GetItem(catId).Record;
return _oServices.ContentManager
.Query(VersionOptions.Latest, _contentType)
.Join<CommonPartRecord>()
.OrderByDescending(cpr => cpr.PublishedUtc)
.List()
.Where(ci => ci.IsPublished())
.Select(ci => ci.As<MaterialPart>())
.Where(mp => mp.Categories.Contains(cat)); // < ---- ?
}
So my question is: what is correct way to select materials for required category, which produces optimal SQL query, as we simply need to inner join associated record table with required CategoryPartRecord_Id field value.
Thanks!
In the case of M:N with a pairing object, we can use QueryOver and a subquery. The biggest benefit is that we receive a plain set of material items, which we can use for paging (Take(), Skip()).
var session = ... // get current session
CategoryPartRecord category = null;
ContentMaterialCategoryRecord pair = null;
MaterialPartRecord material = null;
var subquery = QueryOver.Of<ContentMaterialCategoryRecord>(() => pair)
// now we will join Categories to be able to filter whatever property
.JoinQueryOver(() => pair.CategoryPartRecord, () => category)
// here is the filter
// there could be IN, >= <= ...
.Where(() => category.Id == 1)
// or, to match several categories:
// .WhereRestrictionOn(() => category.Id).IsIn(new[] {1, 2, 3})
...
// now we will return IDs of the Material we are interested in
.Select(x => pair.MaterialPartRecord.Id);
// finally the clean query over the Materials...
var materials = session.QueryOver<MaterialPartRecord>(() => material)
.WithSubquery
.WhereProperty(() => material.Id)
.In(subquery)
// paging
.Take(10)
.Skip(10)
.List<MaterialPartRecord>();
So this produces efficient SQL: one subselect and a clean select from the material table.
NOTE: similar things can be done with LINQ, but QueryOver is the most native NHibernate way, I'd say. Either way, the principle remains the same: a subquery to filter by category and a main query to load materials, all in a single SQL SELECT call.
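The reason the subquery shape pages cleanly can be shown with plain LINQ to Objects. In this minimal sketch, Material and MaterialCategory are simplified, hypothetical stand-ins for the Orchard records, and nothing here touches NHibernate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Material(int Id, string Title);
public record MaterialCategory(int MaterialId, int CategoryId);

public static class SubquerySketch
{
    // A join yields one row per matching link, so a material can repeat.
    public static List<int> ViaJoin(
        IEnumerable<Material> materials, IEnumerable<MaterialCategory> pairs,
        ICollection<int> categoryIds) =>
        (from m in materials
         join p in pairs on m.Id equals p.MaterialId
         where categoryIds.Contains(p.CategoryId)
         orderby m.Id
         select m.Id).ToList();

    // "Id IN (subquery)" yields each material at most once, so paging is safe.
    public static List<int> ViaSubquery(
        IEnumerable<Material> materials, IEnumerable<MaterialCategory> pairs,
        ICollection<int> categoryIds, int skip, int take)
    {
        var materialIds = pairs
            .Where(p => categoryIds.Contains(p.CategoryId))
            .Select(p => p.MaterialId);
        return materials
            .Where(m => materialIds.Contains(m.Id))
            .OrderBy(m => m.Id)
            .Skip(skip).Take(take)
            .Select(m => m.Id)
            .ToList();
    }

    public static void Main()
    {
        var materials = new[] { new Material(1, "A"), new Material(2, "B"), new Material(3, "C") };
        var pairs = new[]
        {
            new MaterialCategory(1, 1), new MaterialCategory(1, 2),
            new MaterialCategory(2, 2), new MaterialCategory(3, 3),
        };
        var wanted = new[] { 1, 2 };
        Console.WriteLine(string.Join(", ", ViaJoin(materials, pairs, wanted)));               // 1, 1, 2
        Console.WriteLine(string.Join(", ", ViaSubquery(materials, pairs, wanted, 0, 10)));    // 1, 2
    }
}
```

With the join, a page of ten rows may contain fewer than ten distinct materials; with the subquery, every row on the page is a different material.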

Class with multiple List Properties of same type but with different restrictions

Here's my problem: I have a class that have 2 list properties of the same class type (but with some different restriction as on how to be filled), let's say:
public class Team
{
[Key]
public int IDTeam { get; set; }
public string TeamName { get; set; }
public List<Programmer> Members { get; set; }
public List<Programmer> Leaders { get; set; }
public void LoadLists(MyProjectDBContext db)
{
this.Members = db.Programmers.Where(p => p.IDTeam == this.IDTeam
&& (p.Experience == "" || p.Experience == null)).ToList();
this.Leaders = db.Programmers.Where(p => p.IDTeam == this.IDTeam
&& (p.Experience != null && p.Experience != "")).ToList();
}
}
public class Programmer
{
[Key]
public int IDProgrammer { get; set; }
[ForeignKey("Team")]
public int IDTeam { get; set; }
public virtual Team Team { get; set; }
public string Name { get; set; }
public string Experience { get; set; }
}
At some point, I need to take a list of Teams, with their members and leaders, and for this I would assume something like:
return db.Teams
.Include(m => m.Members.Where(p => p.Experience == "" || p.Experience == null))
.Include(l => l.Leaders.Where(p => p.Experience != null && p.Experience != ""))
.OrderBy(t => t.TeamName)
.ToList();
And, of course, in this case I would be assuming it wrong (cause it's not working at all).
Any ideas on how to achieve that?
EDIT: To clarify a bit more, the 2 list properties of the team class should be filled according to:
1 - Members attribute - Should include all related programmers with no experience (programmer.Experience == null or "");
2 - Leaders attribute - Should include all related programmers with any experience (programmer.Experience != null and != "");
EDIT 2: Here's the MyProjectDbContext declaration:
public class MyProjectDBContext : DbContext
{
public DbSet<Team> Teams { get; set; }
public DbSet<Programmer> Programmers { get; set; }
}
You are talking about Entity Framework (LINQ to Entities), right? If so, Include() is a LINQ to Entities method that includes a sub-relation in the result set. I think you should place the Where() outside of the Include().
On this topic you'll find some examples on how to use the Include() method.
So I suggest to add the Include()'s first to include the relations "Members" and "Leaders" and then apply your Where-Statement (can be done with one Where()).
return db.Teams
.Include("Members")
.Include("Leaders")
.Where(t => t.Members.Any(m => string.IsNullOrWhiteSpace(m.Experience)) ... )
What is unclear to me is your where criteria and your use case as a whole, since you are talking about getting a list of Teams with Leaders and Members. My example above will return a list of Teams that match the Where() statement. You can loop through it, and within that loop you can list each team's members and leaders, if that is the use case.
An alternative is something like this:
return db.Programmers
.Where(p => string.IsNullOrWhiteSpace(p.Experience))
.GroupBy(p => p.Team);
This gets you a list of programmers with no experience, grouped by Team. You can loop over the groups (Teams) and, within each, over its members. Each team appears only once in the result, as the key of its group.
Hope this helps. If you need some more detailed code samples it would help to understand your requirements better. So maybe you can say a few more words on what you expect from the query.
Update:
Just read your edits, which sound interesting. I don't think you can do this all in one LINQ to Entities statement. Personally, I would do it in the getters of the Members and Leaders properties, each running its own query (as a read-only property). For good performance with large amounts of data, I would even do it with SQL views in the database itself. But this depends a little on the context in which "Members" and "Leaders" are used (how frequently, etc.).
Update 2:
To get a table of teams with sublists for members and leaders in a single query, I would query "Programmers" and group them, nested, by Team and Experience. The result is a list of groups (= Teams) containing groups (experienced/non-experienced) with Programmers in them. The final table can then be built with three nested foreach statements. See here for some grouping examples (see the example "GroupBy - Nested").
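A minimal in-memory sketch of that nested grouping follows. It uses LINQ to Objects over a simplified, hypothetical Programmer record, so it shows the shape of the result rather than what Entity Framework would translate to SQL:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Programmer(int IDTeam, string Name, string Experience);

public static class NestedGroupingSketch
{
    // team id -> (has experience? -> programmer names)
    public static Dictionary<int, Dictionary<bool, List<string>>> GroupByTeamAndExperience(
        IEnumerable<Programmer> programmers) =>
        programmers
            .GroupBy(p => p.IDTeam)
            .ToDictionary(
                team => team.Key,
                team => team
                    // false = member (no experience), true = leader
                    .GroupBy(p => !string.IsNullOrEmpty(p.Experience))
                    .ToDictionary(g => g.Key, g => g.Select(p => p.Name).ToList()));

    public static void Main()
    {
        var programmers = new[]
        {
            new Programmer(1, "Ann", "5 years"),
            new Programmer(1, "Bob", null),
            new Programmer(1, "Cid", ""),
            new Programmer(2, "Dee", "2 years"),
        };

        // Three nested loops: teams, experience groups, programmers.
        foreach (var team in GroupByTeamAndExperience(programmers))
            foreach (var group in team.Value)
                foreach (var name in group.Value)
                    Console.WriteLine($"Team {team.Key} {(group.Key ? "leader" : "member")}: {name}");
    }
}
```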
Whenever you fetch entities, they will be stored in the context -- regardless of the form they are "selected" in. That means you can fetch the teams along with all the necessary related entities into an anonymous type, like this:
var teams =
(from team in db.Teams
select new {
team,
relatedProgrammers = team.Programmers.Where(
[query that gets all leaders OR members])
}).ToList().Select(x => x.team);
It looks like we're throwing away the relatedProgrammers field here, but those Programmer entities are still in memory. So, when you execute this:
foreach (var team in teams) team.LoadLists(db);
...it will populate the lists from the programmers that were already fetched, without querying the database again (assuming db is the same context instance as above).
Note: I haven't tested this myself. It's based on a similar technique shown in this answer.
EDIT - Actually, it looks like your "leaders" and "members" cover all programmers associated with a team, so you should be able to just do Teams.Include(t => t.Programmers) and then LoadLists.

RavenDB index for nested query

I'm pretty new to RavenDB and am struggling to find a solution to the following:
I have a collection called ServiceCalls that look like this:
public class ServiceCall
{
public int ID { get; set; }
public string IncidentNumber { get; set; }
public string Category { get; set; }
public string SubCategory { get; set; }
public DateTime ReportedDateTime { get; set; }
public string Block { get; set; }
public decimal Latitude { get; set; }
public decimal Longitude { get; set; }
}
I have an index named ServiceCalls/CallsByCategory that looks like this:
Map = docs => from doc in docs
select new
{
Category = doc.Category,
CategoryCount = 1,
ServiceCalls = doc,
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
CategoryCount = g.Count(),
ServiceCalls = g.Select(i => i.ServiceCalls)
};
So the output is:
public class ServiceCallsByCategory
{
public string Category { get; set; }
public int CategoryCount { get; set; }
public IEnumerable<ServiceCall> ServiceCalls { get; set; }
}
Using this query, everything works as it should:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory") select i
Where I am absolutely lost is writing an index that would allow me to query by ReportedDateTime. Something that would allow me to do this:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory")
where i.ServiceCalls.Any(x=>x.ReportedDateTime >= new DateTime(2012,10,1))
select i
Any guidance would be MUCH appreciated.
A few things,
You can't have a .Count() method in your reduce clause. If you look closely, you will find your counts are wrong. As of build 2151, this will actually throw an exception. Instead, you want CategoryCount = g.Sum(x => x.CategoryCount)
You always want the structure of the map to match the structure of the reduce. If you're going to build a list of things, then you should map a single element array of each thing, and use .SelectMany() in the reduce step. The way you have it now only works due to a quirk that will probably be fixed at some point.
By building the result as a list of ServiceCalls, you are copying the entire document into the index storage. Not only is that inefficient, but it's unnecessary. You would do better keeping a list of just the ids. Raven has an .Include() method that you can use if you need to retrieve the full document. The main advantage here is that you are guaranteed to have the most current data for each item you get back, even if your index results are still stale.
Putting all three together, the correct index would be:
public class ServiceCallsByCategory
{
public string Category { get; set; }
public int CategoryCount { get; set; }
public int[] ServiceCallIds { get; set; }
}
public class ServiceCalls_CallsByCategory : AbstractIndexCreationTask<ServiceCall, ServiceCallsByCategory>
{
public ServiceCalls_CallsByCategory()
{
Map = docs => from doc in docs
select new {
Category = doc.Category,
CategoryCount = 1,
ServiceCallIds = new[] { doc.ID },
};
Reduce = results => from result in results
group result by result.Category
into g
select new {
Category = g.Key,
CategoryCount = g.Sum(x => x.CategoryCount),
ServiceCallIds = g.SelectMany(i => i.ServiceCallIds)
};
}
}
Querying it with includes, would look like this:
var q = session.Query<ServiceCallsByCategory, ServiceCalls_CallsByCategory>()
.Include<ServiceCallsByCategory, ServiceCall>(x => x.ServiceCallIds);
When you need a document, you still load it with session.Load<ServiceCall>(id) but Raven will not have to make a round trip back to the server to get it.
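To see why Count() in a reduce gives wrong totals while Sum() does not, here is a minimal in-memory simulation. It assumes the reduce function may be applied more than once, first to batches of mapped entries and then to the partial results; the Row record, the ReduceInBatches helper and the batch size are hypothetical and only stand in for whatever Raven does internally:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Row(string Category, int CategoryCount);

public static class ReduceSketch
{
    // Counts the rows in the group: only right if each row is one document.
    public static IEnumerable<Row> ReduceWithCount(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.Category)
            .Select(g => new Row(g.Key, g.Count()));

    // Adds up the counts carried by the rows: right for partial results too.
    public static IEnumerable<Row> ReduceWithSum(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.Category)
            .Select(g => new Row(g.Key, g.Sum(x => x.CategoryCount)));

    // Reduces each batch separately, then reduces the partial results again.
    public static List<Row> ReduceInBatches(
        IReadOnlyList<Row> mapped, int batchSize,
        Func<IEnumerable<Row>, IEnumerable<Row>> reduce)
    {
        var partials = mapped
            .Select((row, index) => (row, index))
            .GroupBy(x => x.index / batchSize, x => x.row)
            .SelectMany(batch => reduce(batch))
            .ToList();
        return reduce(partials).ToList();
    }

    public static void Main()
    {
        // Five mapped entries in the same category, each with CategoryCount = 1.
        var mapped = Enumerable.Repeat(new Row("Fire", 1), 5).ToList();

        Console.WriteLine(ReduceInBatches(mapped, 3, ReduceWithCount).Single().CategoryCount); // 2
        Console.WriteLine(ReduceInBatches(mapped, 3, ReduceWithSum).Single().CategoryCount);   // 5
    }
}
```

In the second pass, Count() sees two partial rows and reports 2, whereas Sum() adds the 3 and 2 those rows carry and reports 5.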
NOW - that doesn't address your question about how to filter the results by date. For that, you really need to think about what you are trying to accomplish. All of the above would assume that you really want every service call shown for each category at once. Most of the time, that's not going to be practical because you want to paginate results. You probably DON'T want to even use what I've described above. I am making some grand assumptions here, but most of the time one would filter by category, not group by it.
Let's say you had an index that just counts the categories (the above index without the list of service calls). You might use that to display an overview screen. But you wouldn't be interested in the documents that were in each category until you clicked one and drilled into a details screen. At that point, you know which category you're in, and you can filter by it and reduce to a date range without a static index:
var q = session.Query<ServiceCall>().Where(x=> x.Category == category && x.ReportedDateTime >= datetime)
If I am wrong and you really DO need to show all documents from all categories, grouped by category, and filtered by date, then you are going to have to adopt an advanced technique like the one I described in this other StackOverflow answer. If this is really what you need, let me know in comments and I'll see if I can write it for you. You will need Raven 2.0 to make it work.
Also - be very careful about what you are storing for ReportedDateTime. If you are going to be doing any comparisons at all, you need to understand the difference between calendar time and instantaneous time. Calendar time has quirks like daylight savings transitions, time zone differences, and more. Instantaneous time tracks the moment something happened, regardless of who's asking. You probably want instantaneous time for your usage, which means either using a UTC DateTime, or switching to DateTimeOffset which will let you represent instantaneous time without losing the local contextual value.
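The difference shows up as soon as two values from different offsets are compared. This is a minimal sketch using only the .NET base types, with hypothetical helper names and sample times; it does not involve RavenDB at all:

```csharp
using System;

public static class InstantSketch
{
    // DateTime comparison looks at the clock reading only and ignores Kind.
    public static bool NaiveIsOnOrAfter(DateTime reported, DateTime cutoff) =>
        reported >= cutoff;

    // DateTimeOffset comparison is done on the underlying UTC instant.
    public static bool InstantIsOnOrAfter(DateTimeOffset reported, DateTimeOffset cutoff) =>
        reported >= cutoff;

    public static void Main()
    {
        // A call reported at 02:30 wall-clock time in a UTC+2 zone, i.e. 00:30 UTC.
        var reportedLocal = new DateTime(2012, 10, 1, 2, 30, 0, DateTimeKind.Unspecified);
        var reportedInstant = new DateTimeOffset(2012, 10, 1, 2, 30, 0, TimeSpan.FromHours(2));

        // Cutoff: 01:00 UTC on the same day.
        var cutoffUtc = new DateTime(2012, 10, 1, 1, 0, 0, DateTimeKind.Utc);
        var cutoffInstant = new DateTimeOffset(2012, 10, 1, 1, 0, 0, TimeSpan.Zero);

        Console.WriteLine(NaiveIsOnOrAfter(reportedLocal, cutoffUtc));          // True (02:30 vs 01:00 on the clock)
        Console.WriteLine(InstantIsOnOrAfter(reportedInstant, cutoffInstant));  // False (00:30 UTC is before 01:00 UTC)
        Console.WriteLine(reportedInstant.DateTime.Hour);                       // 2 (local wall-clock value is kept)
    }
}
```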
Update
I experimented with building an index that would use the technique I described to let you have all results in your category groups but still filter by date. Unfortunately, it's just not possible. You would have to have all ServiceCalls grouped together in the original document and express it in the Map. It doesn't work the same way at all if you have to Reduce first. So you really should just consider a simple query for ServiceCalls once you are in a specific Category.
Could you add ReportedDateTime to the Map and aggregate it in the Reduce? If you only care about the max per category, something like this should be sufficient.
Map = docs => from doc in docs
select new
{
Category = doc.Category,
CategoryCount = 1,
ServiceCalls = doc,
ReportedDateTime = doc.ReportedDateTime
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
CategoryCount = g.Sum(x => x.CategoryCount),
ServiceCalls = g.Select(i => i.ServiceCalls),
ReportedDateTime = g.Max(rdt => rdt.ReportedDateTime)
};
You could then query it just based on the aggregated ReportedDateTime:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory")
where i.ReportedDateTime >= new DateTime(2012,10,1)
select i
