We have an object with nested properties that we want to make easily searchable. This has been simple enough to achieve, but we also want to aggregate information based on multiple fields. In terms of the domain, we have multiple deals that share the same details except for the seller. We need to consolidate these into a single result and show the seller options on the following page. However, we still need to be able to filter on the seller on the initial page.
We attempted something like the code below to collect multiple sellers into a single row, but the result contains duplicates and the index takes forever to build.
Map = deals => deals.Select(deal => new
{
    Id = deal.ProductId,
    deal.ContractLength,
    Provider = deal.Provider.Id,
    Amount = deal.Amount
});

Reduce = deals => deals.GroupBy(result => new
{
    result.ProductId,
    result.ContractLength,
    result.Amount
}).Select(result => new
{
    result.Key.ProductId,
    result.Key.ContractLength,
    Provider = result.Select(x => x.Provider).Distinct(),
    result.Key.Amount
});
I'm not sure this is the best way to handle this problem, but I'm fairly new to Raven and struggling for ideas. If we keep the index simple and group on the client side, then we can't keep paging consistent.
Any ideas?
You are grouping on the document id (deal.Id), so you'll never actually generate a reduction across multiple documents.
I don't think that is intended.
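For what it's worth, the consolidation itself (one row per product/length/amount, with the distinct sellers collected) can be sanity-checked with plain LINQ over in-memory data before wiring it into an index. This is a sketch with made-up document ids and values, not the index definition itself:

```csharp
using System;
using System.Linq;

var deals = new[]
{
    new { ProductId = "products/1", ContractLength = 12, Amount = 100m, Provider = "sellers/1" },
    new { ProductId = "products/1", ContractLength = 12, Amount = 100m, Provider = "sellers/2" },
    new { ProductId = "products/2", ContractLength = 24, Amount = 200m, Provider = "sellers/1" },
};

// Group on the shared detail fields (not the document id) and collect
// the distinct sellers for each group.
var consolidated = deals
    .GroupBy(d => new { d.ProductId, d.ContractLength, d.Amount })
    .Select(g => new
    {
        g.Key.ProductId,
        g.Key.ContractLength,
        g.Key.Amount,
        Providers = g.Select(d => d.Provider).Distinct().ToList()
    })
    .ToList();

foreach (var c in consolidated)
    Console.WriteLine($"{c.ProductId}: {string.Join(", ", c.Providers)}");
```

The key point is that the fields emitted by the map and the fields grouped on in the reduce must line up, otherwise the reduce never merges entries from different documents.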
Say I needed to do a whole bunch of queries from various tables like so
var weights = db.weights.Where(x => ids.Contains(x.ItemId)).Select(x => x.weight).ToList();
var heights = db.heights.Where(x => ids.Contains(x.ItemId)).Select(x => x.height).ToList();
var lengths = db.lengths.Where(x => ids.Contains(x.ItemId)).Select(x => x.length).ToList();
var widths = db.widths.Where( x => ids.Contains(x.ItemId)).Select(x => x.width ).ToList();
Okay, it's really not that stupid in reality, but it's just to illustrate the question. Basically, that array "ids" gets sent to the database four times in this example. I was thinking I could save some bandwidth by sending it just once. Is it possible to do that? Sorta like:
db.SetTempVariable("ids", ids);
var weights = db.weights.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.weight).ToList();
var heights = db.heights.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.height).ToList();
var lengths = db.lengths.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.length).ToList();
var widths = db.widths.Where( x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.width ).ToList();
db.DeleteTempVariable("ids");
I'm just imagining the possible syntax here. In essence, SetTempVariable would send the data to the database, and db.TempVariable["ids"] would be a dummy object for use in expressions that really only contains a reference to the previously sent data; the database would magically understand this and reuse the list of ids I sent it instead of me sending it again and again.
So, how can I do that?
Well, this is more a database design problem than anything. A properly designed database would have one table that contains weights, heights, lengths and widths for every item (or "id" as you call it), so one query on the item returns everything at once.
I'm reluctant to suggest band-aid fixes for the broken database design you're using, because you really should just fix that, but you'll see a large performance improvement if you open a transaction first and run all four of your queries inside it. Or just join the tables on id (they appear to share it), and then your four queries become one.
To answer your actual question (and again, you're barking up the wrong tree here): that's what temp tables are for. You can upload your data to a temp table and then join it against your other table(s).
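The join suggestion, sketched with in-memory tuples standing in for the four tables (the entity shapes here are invented, so treat this as an outline rather than your actual schema):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the four tables; in EF these would be DbSets.
var weights = new List<(int ItemId, double Weight)> { (1, 10.0), (2, 20.0) };
var heights = new List<(int ItemId, double Height)> { (1, 1.5), (2, 1.8) };
var lengths = new List<(int ItemId, double Length)> { (1, 3.0), (2, 4.0) };
var widths  = new List<(int ItemId, double Width)>  { (1, 0.5), (2, 0.7) };

var ids = new List<int> { 1, 2 };

// One query instead of four: filter once, join on ItemId, and
// project all dimensions together.
var dims = weights
    .Where(w => ids.Contains(w.ItemId))
    .Join(heights, w => w.ItemId, h => h.ItemId,
        (w, h) => new { w.ItemId, w.Weight, h.Height })
    .Join(lengths, wh => wh.ItemId, l => l.ItemId,
        (wh, l) => new { wh.ItemId, wh.Weight, wh.Height, l.Length })
    .Join(widths, whl => whl.ItemId, x => x.ItemId,
        (whl, x) => new { whl.ItemId, whl.Weight, whl.Height, whl.Length, x.Width })
    .ToList();

foreach (var d in dims)
    Console.WriteLine($"{d.ItemId}: {d.Weight} {d.Height} {d.Length} {d.Width}");
```

Against a real provider the whole chain translates to a single SQL statement, so the ids list crosses the wire once.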
In my ASP.NET Core 2.1 web app, I have three models:
Profile, which has many Invoices,
Invoices, which have many Invoice Statuses.
I retrieve them from the DB, e.g.
_context.Invoices
    .Include(st => st.InvoiceStatuses)
    .FirstOrDefault(iv => iv.Id == invoiceId);
or sometimes
_context.Invoices
    .Include(pr => pr.Profile)
    .Include(st => st.InvoiceStatuses)
    .FirstOrDefault(iv => iv.Id == invoiceId);
From this I expect to get a specific invoice and all related InvoiceStatuses in the order in which they were created (essentially DB index order).
Most of the time this is indeed the case.
However, occasionally, after I add a new Invoice record and an initial invoice status, a few of the invoices have their related InvoiceStatuses list in a random / unexpected order, e.g. Ids 10, 12, 18, 16.
I can get around this by breaking it down into two queries, one for the invoice and one for its statuses, but I was hoping someone could give some insight into what might be happening.
It would be easier if the problem happened consistently, but it only shows up after you delete a record (sometimes it needs to be a couple of records); you can then add multiple records before the problem potentially appears again.
I get the same problem when returning all invoices with .ToList() and including each one's related data, but I was trying to focus on the simplest scenario first.
I have not turned on lazy loading or used the virtual keyword, but I'm not sure whether that matters.
To continue from my comment....
First/FirstOrDefault should always be paired with an OrderBy clause unless you truly don't care which row you get; without an ORDER BY, the database makes no ordering guarantee.
Ordering in general is a display and business logic concern. Entities are a view of data.
In cases where you want to display data in order, you should consider composing view models for the data to display, then use .Select() with the applicable children in the appropriate order. For instance, say I want to select an invoice and list its statuses in the order they were added (assumed to be the auto-increment Id order):
var invoice = _context.Invoices.OrderBy(x => x.Id)
    .Select(x => new InvoiceViewModel
    {
        Id = x.Id,
        // ... fields the view needs to know about
        InvoiceStatuses = x.InvoiceStatuses.OrderBy(s => s.Id)
            .Select(s => s.StatusText)
            .ToList()
    }).FirstOrDefault();
So something like that would use the Invoice OrderBy to find the first applicable invoice (by Id order), then select the fields we care about into a view model. For the invoice statuses, it orders them by their Id and selects the StatusText to provide the view with a list of statuses as strings. Alternatively you could select an InvoiceStatusViewModel to return the status text, status Id, etc., depending on what your view wanted.
If, instead, you are selecting the data to be consumed on the spot for some business logic, you don't need to declare the view model classes; simply use anonymous types:
var invoice = _context.Invoices.OrderBy(x => x.Id)
    .Select(x => new
    {
        x.Id,
        // ... fields the caller needs to know about
        InvoiceStatuses = x.InvoiceStatuses.OrderBy(s => s.Id)
            .Select(s => new
            {
                s.Id,
                s.StatusText
            })
            .ToList()
    }).FirstOrDefault();
This gives you the data you might need to consume, in order, but as anonymous types you cannot return this data outside of the function scope such as to a view.
The technique of using .Select() to reduce results leads to more efficient queries: you can use aggregate methods such as Max, Min, Sum, and Any so that, rather than returning everything and then writing logic to iterate over it, you compose queries that run faster and return less data over the wire.
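For example, a projection with aggregates might look like the following. This is a minimal sketch with an in-memory list standing in for the DbSet and invented property names (Lines as a list of line amounts); a real provider would translate the aggregates to SQL instead of pulling every child row back:

```csharp
using System;
using System.Linq;

// Stand-in data; in EF this would be _context.Invoices.
var invoices = new[]
{
    new { Id = 1, Lines = new[] { 10m, 20m } },
    new { Id = 2, Lines = new[] { 5m } },
};

// Aggregates computed inside the projection, so only the summary
// values come back, not the full Lines collections.
var summaries = invoices
    .Select(i => new
    {
        i.Id,
        Total = i.Lines.Sum(),
        HasLines = i.Lines.Any(),
        Largest = i.Lines.Max()
    })
    .ToList();

foreach (var s in summaries)
    Console.WriteLine($"{s.Id}: total={s.Total}, largest={s.Largest}");
```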
I'm new to Orchard and this must be something involving how the underlying data is stored.
The joining with CommonPart seems fast enough, like this:
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
    .ForVersion(VersionOptions.Published)
    .Join<CommonPartRecord>().List().ToList();
That runs fairly fast. But whenever I try accessing some field in CommonPart, it runs extremely slowly, like this:
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
    .ForVersion(VersionOptions.Published)
    .Join<CommonPartRecord>().List()
    // access some field from CommonPart
    .Select(e => new {
        User = e.As<CommonPart>().Owner.UserName
    }).ToList();
The total data set is only about 1,200 items, and it takes about 5 seconds; it cannot be that slow. A simple SQL query running in the background should take about 0.5 seconds or even less.
I've tried investigating Orchard's source code but found nothing that could be the issue. Everything seems to go into a black box at the point where IContent is accessed. I hope someone here can give me some suggestions for diagnosing and solving this hard issue. Thanks!
Update:
I've tried debugging a bit and seen that the following method is hit inside the DefaultContentManager:
ContentItem New(string contentType) { ... }
Well, that's really interesting: the query is just asking for data without modifying, inserting or updating anything, yet that method being hit shows that something's wrong here.
Update:
Following @Bertrand Le Roy's comment, I've tried the following code with QueryHints, but it does not look like it changes anything:
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
    .ForVersion(VersionOptions.Published)
    .Join<CommonPartRecord>()
    .WithQueryHints(new QueryHints().ExpandParts<CommonPart>())
    .List()
    // access some field from CommonPart
    .Select(e => new {
        User = e.As<CommonPart>().Owner.UserName
    }).ToList();
and this (without .Join)
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
    .ForVersion(VersionOptions.Published)
    .WithQueryHints(new QueryHints().ExpandParts<CommonPart>())
    .List()
    // access some field from CommonPart
    .Select(e => new {
        User = e.As<CommonPart>().Owner.UserName
    }).ToList();
Accessing the Owner property from your Select causes the lazy loader in CommonPartHandler to ask the content manager to load the user content item: _contentManager.Get<IUser>(part.Record.OwnerId). This happens once per content item in your query results, so it produces a select N+1, where N = 1200 according to your question.
There are at least two ways of avoiding that:
You can use HQL and craft a query that gives you everything you need up front in one operation.
You can make a first content manager query to get the set of owner ids, and then make a second content manager query for those ids, getting everything you need with a total of 2 queries instead of 1201.
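The second option is the standard batching fix for select N+1. Stripped of the Orchard APIs, the shape is as follows; the dictionaries here stand in for the content manager and user store, so this is purely illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for content items (query 1's results) and the user store.
var items = new[]
{
    new { Id = 1, OwnerId = 10 },
    new { Id = 2, OwnerId = 11 },
    new { Id = 3, OwnerId = 10 },
};
var usersTable = new Dictionary<int, string> { [10] = "alice", [11] = "bob" };

// Step 1: collect the distinct owner ids from the first result set.
var ownerIds = items.Select(i => i.OwnerId).Distinct().ToList();

// Step 2: fetch all owners at once and index them by id.
var owners = ownerIds.ToDictionary(id => id, id => usersTable[id]);

// Attaching the owner is now a dictionary lookup, not a round-trip per item.
var result = items
    .Select(i => new { i.Id, User = owners[i.OwnerId] })
    .ToList();
```

Two round-trips total, regardless of how many items come back.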
I have a query with a lot of includes, and I'm wondering if I can do Takes on some of the includes.
For example, here's one of my queries, with the (illegal) Take illustrating what I want to do.
var primaryLocation = context.Locations
    .Include("PhoneNumbers")
    .Include("Invoices").Take(50)
    .Include("Invoices.Items")
    .Include("Schedules")
    .Include("Staffs")
    .SingleOrDefault(d => d.Id == locationId);
Currently the only way I can think to do it would be like so:
var primaryLocation = context.Locations
    .Include("Invoices")
    .Include("Etc")
    .SingleOrDefault(d => d.Id == locationId);

primaryLocation.Invoices = primaryLocation.Invoices.Take(50).ToList();
I'd prefer not to do it that way, since it means pulling back the entire invoice list from the database, which I don't need.
Is there a handy way to build the Take into my query?
It seems like you have two conflicting criteria for what you're doing. I'm guessing here, but you didn't leave us all that much to go on.
Since your primaryLocation.Invoices = primaryLocation.Invoices.Take(50).ToList(); statement only makes use of one of your includes, I'm assuming you're doing more with primaryLocation than what you've shown us. This leads me to believe that you want primaryLocation to include all of the stuff. And then you seem not to want more than those 50, so it's not all of the stuff after all... To me this is a contradiction. If you require all, you should include it all.
If you want your 50-invoice selection specifically, you could get those separately in their own query. I use NHibernate myself, so I'm not sure of the syntax for futures in Entity Framework, but if you want to ask for multiple things with only one round-trip to the server, in NHibernate you can turn a series of queries into futures to allow this. I expect Entity Framework has something similar.
In short, what I'm suggesting is that if you want primaryLocation to include all of your data, then that's what you'll get, and if you're after more specific information with filters like Take, then you might want to query more specifically.
Use projection instead of blindly calling Include if you don't want everything:
var primaryLocation = context.Locations
    .Select(location => new {
        Id = location.Id,
        Name = location.Name,
        // ... other properties needed on the front end
        RecentInvoices = location.Invoices
            // really should sort if you're only taking 50
            .OrderByDescending(invoice => invoice.CreatedAt)
            .Take(50),
        AllPhoneNumbers = location.PhoneNumbers
    })
    .SingleOrDefault(location => location.Id == locationId);
You could use projection to get just the invoice information you need too, I just didn't want to over-complicate the example.
Using this method you get exactly the data you want without adding confusion. It also allows you to name your properties (such as RecentInvoices above) to add more meaning.
Ok, there is probably a better way of doing this, and if there is, please let me know!
I have a List of Users (an Entity Framework entity within my system). Each User object has certain properties. I need to rank all users based on these properties, one by one (who made the most sales, etc.).
At the moment I'm thinking the best way to do this is to reorder the list into a new list using LINQ's OrderBy extension method on the property I'm ranking on. Once this is done I should be able to get the position of each user within this newly ordered list, which will indicate their rank. (I know it's going to get more complex when users have the same value for a given property.)
The question is, how?
Thanks!
var users = new List<User> { ... };

var userRanks = users
    .OrderBy(user => user.Prop1)
    .ThenBy(user => user.Prop2)
    .Select((user, index) => new { User = user, Rank = index + 1 });
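On the tie question raised above: Select's index gives every user a distinct position, so equal values still receive different ranks. If you want ties to share a rank (dense ranking), one way is to group by the ranking value first. A minimal sketch, using an invented Sales property in place of whatever you're ranking on:

```csharp
using System;
using System.Linq;

var sales = new[]
{
    new { Name = "a", Sales = 30 },
    new { Name = "b", Sales = 50 },
    new { Name = "c", Sales = 30 },
};

// Group users by the ranking value, order the groups best-first, and
// give every member of a group the same (dense) rank.
var ranked = sales
    .GroupBy(u => u.Sales)
    .OrderByDescending(g => g.Key)
    .SelectMany((g, i) => g.Select(u => new { u.Name, Rank = i + 1 }))
    .ToList();

// b gets rank 1; a and c tie at rank 2.
foreach (var r in ranked)
    Console.WriteLine($"{r.Name}: {r.Rank}");
```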