Querying by near, sorting, then paging - c#

I'm using the geospatial "near" search in MongoDB (using the C# driver) to return homes within 25 miles of a given lat/long. This returns the homes sorted by proximity to the lat/long and works great.
However, I want to add in sorting (on other fields such as home price) and paging and here is where I'm getting stuck. To work correctly, it would need to figure out which homes were within 25 miles of the lat/long, then sort those results (let's say based on price), and then take a "page" of 10 results.
Below is what I have so far. The problem is that it takes a page of results (based on the proximity sort) and then sorts only that page of 10 results by the field I set in "SetSortOrder", rather than sorting the entire set of homes near the lat/long. Each page of 10 results is therefore only sorted within itself.
var coordinates = find.GetCoordinates();
var near = Query.Near("Coordinates", coordinates.Latitude,
coordinates.Longitude,
find.GetRadiansAway(), false);
var query = Collection().Find(near);
query.Skip = find.GetSkip();
query.Limit = find.GetLimit();
query.SetSortOrder(new string[] { "Price" });
var results = query.ToArray();

This is the expected behavior, because $near returns results sorted by distance by default. The sorting is done internally by the $near operator, so you can't change it.
db.places.find( { loc : { $near : [50,50] } } )
The above query finds the closest points to (50,50) and returns them sorted by distance (there is no need for an additional sort parameter).
So in your example, Price is a second sort field that orders the data within the result already sorted by distance.
The workaround is to load the entire result of Query.Near and then sort it on the client by whatever field you want.
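The client-side workaround could look roughly like the sketch below. It assumes the complete $near result has already been loaded into memory (for example, the question's query run without Skip and Limit). The Home class, its Price property, and the HomePaging helper are hypothetical names used only for illustration; they are not part of the driver.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type; only Price matters for this sketch.
public class Home
{
    public string Address { get; set; }
    public decimal Price { get; set; }
}

public static class HomePaging
{
    // Sorts the complete set of nearby homes by price, then cuts out one page.
    // 'nearbyHomes' is assumed to be the entire $near result, not a single page.
    public static List<Home> GetPage(IEnumerable<Home> nearbyHomes, int skip, int limit)
    {
        return nearbyHomes
            .OrderBy(h => h.Price)   // sort everything first
            .Skip(skip)              // then skip the earlier pages
            .Take(limit)             // and keep exactly one page
            .ToList();
    }
}
```

With the question's helpers this would be called as HomePaging.GetPage(allNearbyHomes, find.GetSkip(), find.GetLimit()). Keep in mind that loading every home within the radius may be expensive when the radius covers a lot of documents.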

Related

Sitecore query, Is it possible to calculate sum of a Integer field

I would like to know if it's possible to do a mathematical operation (e.g. Sum) in a Sitecore fast query, or in any other way.
I have hundreds of items with a field 'Money spend' of data type 'Integer'. I want to know the fastest way to calculate the sum of this field for a specific person/user.
Here is what I am doing, I am using fast query to get the items and then calculating the sum.
var searchStr = "{30218229-CFA8-4BC3-9F01-01E3E6469E51}";
var query = string.Format("fast:/sitecore/content/Intranet/User/Detail/*[#Active ='1']//*[#Profile Id=\"%{0}%\"]", searchStr);
var items = Sitecore.Context.Database.SelectItems(query);
//Calculate sum
var sum = items.Aggregate(0, (x, y) => x + GeneralHelper.ConvertToInt16(y["Money spend"]));
I want to know how I can make the sum calculation process fast?
I think the best way is to use indexes (as Mark already mentioned):
Create a custom index for your users and include your "money spend" value in it (and also the 'active' and 'profile id' fields, as you are querying on those). Make sure the "money spend" field is "stored".
Create a custom class deriving from SearchResultItem that exposes the "money spend" field as a property.
Use the ContentSearch API to query your users as you did with the fast query (the index will be faster) and use the custom class you just created to gather the results.
Use LINQ to calculate what you need.

Get Specific Range of List Items (LINQ)

I have this block of code I'm working with :
// get the collection of libraries from the injected repository
librarySearchResults = _librarySearchRepository.GetLibraries(searchTerm);
// map the collection into a collection of LibrarySearchResultsViewModel view models
libraryModel.LibrarySearchResults =
    librarySearchResults.Select(
        library =>
            new LibrarySearchResultsViewModel
            {
                Name = library.Name,
                Consortium = library.Consortium,
                Distance = library.Distance,
                NavigateUrl = _librarySearchRepository.GetUrlFromBranchId(library.BranchID),
                BranchID = library.BranchID
            }).ToList();
All this does is take the results of GetLibraries(searchTerm), which returns a list of LibrarySearchResult objects, and maps them over to a list of LibrarySearchResultsViewModel's.
While this works well for small result sets, once I get up into the 1,000's, it really starts to drag, taking about 12 seconds before it finishes the conversion.
My question :
Since I'm using paging here, I really only need to display a fraction of the data that's being returned in a large result set. Is there a way to utilize something like Take() or GetRange(), so that the conversion only happens for the records I need to display? Say out of 1,000 records, I only want to get records 20 through 40, and convert them over to the view models.
I'm all for any suggestions on improving or refactoring this code as well.
Use Skip and Take:
// take records from 20 to 40
var records = librarySearchResults.Skip(20).Take(20);
You can easily paginate it (you'll need page and pageSize).
On the other hand, you're calling ToList there. Consider keeping the result as an IEnumerable; converting to a list can eat up a lot of time, especially for a large data set.
You can use Skip() and Take() together to enable paging.
var idx = 0; // zero-based page index; set this based on which page you're currently generating
librarySearchResults.Skip(idx * numitems).Take(numitems).Select(lib => ...);
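Because Skip and Take run before Select, the mapping function is only invoked for the rows on the requested page. The sketch below illustrates that ordering with a small generic helper; Paging and MapPage are hypothetical names, and the page index is assumed to be zero-based.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Returns one page of mapped results.
    // 'page' is zero-based, so page 1 with pageSize 20 yields items 21 through 40.
    public static List<TResult> MapPage<TSource, TResult>(
        IEnumerable<TSource> source, int page, int pageSize, Func<TSource, TResult> map)
    {
        return source
            .Skip(page * pageSize)  // drop the earlier pages
            .Take(pageSize)         // keep only the current page
            .Select(map)            // map runs only for the kept items
            .ToList();              // materialize just this page
    }
}
```

In the question's code this might be used as Paging.MapPage(librarySearchResults, 1, 20, library => new LibrarySearchResultsViewModel { ... }). If the per-item mapping is the slow part, it then runs 20 times instead of once per search result.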

Is there any way to loop through my sql results and store certain name/value pairs elsewhere in C#?

I have a large result set coming from a pretty complex SQL query. Among the values are a string which represents a location (that will later help me determine the page location that the value came from), an int which is a priority number calculated for each row based on other values from the row, and another string which contains a value I must remember for display later.
The problem is that the sql query is so complex (it has UNIONS, JOINS, and complex calculations with aliases) that I can't logically fit anything else into it without messing with the way it works.
Suffice it to say, though, after the query is done and the calculations performed, I need something that perhaps aggregate functions might solve, but that IS NOT an option, as all the columns do not come from other aggregate functions.
I have been wracking my brain for days now as to how I can iterate through the results, store a pair of values in a list (or two separate lists tied together somehow) where one value is the sum of all the priority values for each location and the other value is a distinct location value (i.e., as the results are looped through, it will not create another list item with the same location value that has been used before, HOWEVER, it does still need the sum of all of the other priority values from locations that ARE identical). Also, the results need to be ordered by priority in Descending order (hence the problem with using two lists).
EXAMPLE:
EDIT: I forgot, the preserved value should be the value from the row with the highest priority from the sql query.
If I had the following results:
location    priority    value
--------------------------------------------------------------------------------
page1       1           some text!
page2       3           more text!
page2       4           even more text!
page3       3           text again
page3       1           text
page3       1           still more text!
page4       6           text
If I was able to do what I wanted I would be able to achieve something like this after iteration (and in this order):
location    priority    value
--------------------------------------------------------------------------------
page2       7           even more text!
page4       6           text
page3       5           text again
page1       1           some text!
I have done research after research after research but absolutely nothing really even gets close to solving this dilemma.
Is what I'm asking too tough for even the powerful C# language?
THINGS I HAVE CONSIDERED:
Looping through the sql results and checking each location for repeats, adding together all priority values as I go, and storing these two plus value in two or three separate lists.
Why I still need help
I can't use a foreach because the logic didn't pan out, and I can't use a for loop because I can't access an IEnumerable (or whatever type it is that stores what's returned from Database.Open.Query()) by index (this makes sense, of course). Also, I need to sort on priority, but can't get one list out of sync with the others.
Using LINQ to select and store what I need
Why I still need help
I don't know LINQ (at all!) mainly because I don't understand lambda expressions (no matter HOW MUCH I read up about it).
Using an instantiated class to store the name/value pairs
Why I still need help
Not only do I expect sorting on this sort of thing to be impossible, but while I do know how to use .cs files in my C#.net web pages within the WebMatrix environment, I have mainly only ever used static classes and would also need a little refresher course on constructors and how to set this up appropriately.
Somehow fitting this functionality into the already sizeable and complex SQL query
Why I still need help
While this is probably where I would ideally like this functionality to be, I stress again that this IS NOT AN OPTION. I have tried using aggregate functions, but only get an error saying how not all the other columns come from aggregate functions.
Making another query based on values from the first query's result set
Why I still need help
I can't select distinct results based on only one column (i.e., location) alone.
Assuming I could get the loop logic correct, storing the values in a 3 dimensional array
Why I still need help
I can't declare the array, because I do not know all of its dimensions before I need to use it.
Your post has amazed me in a number of ways, such as saying you 'mostly use static classes' and expecting that working with an instantiated class/object would be 'impossible'. Those are really strange things to say. I can only respond with a quote from Charles Babbage:
I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Anyways.. As you say you find lambdas hard, let's trace the problem in the classic 'manual' way.
Let's assume you have a list of ROWS that contains LOCATIONS and PRIORITIES.
List<DataRow> rows = .... ; // datatable, sqldatareader, whatever
You say you need:
a list of unique locations
a "list" of locations paired up with summed-up priorities
Let's start with the first objective.
To gather a list of unique 'values', a HashSet is just perfect:
HashSet<string> locations = new HashSet<string>();
foreach(var row in rows)
    locations.Add( (string)row["LOCATION"] );
Well, that's all. After that, the locations HashSet holds only the unique locations. "Add" never produces duplicate elements: the HashSet checks and "uniquifies" all values that are put into it. One small catch is that a HashSet has no [index] operator, so you have to enumerate it to get the values:
foreach(string loc in locations)
{
Console.WriteLine(loc);
}
or convert/rewrite it to a list:
List<string> locList = new List<string>(locations);
Console.WriteLine(locList[2]); // of course, assuming there were at least three..
Let's get to the second objective.
To gather a list of values related to some thing behaving like a "logical key", a Dictionary<Key,Val> may be useful. It allows you to store/associate a "value" with some "key", ie:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
double d = dict["mamma"]; // d == 123.45
dict["mamma"] += 101; // possible!
double e = dict["mamma"]; // e == 224.45
However, it happily throws an exception when you try to read from an unknown key:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
double d = dict["daddy"]; // throws KeyNotFoundException
dict["daddy"] += 101; // would throw too! It tries to read the old/current value first
So one has to be very careful with "keys" that the dictionary does not yet know. Fortunately, you can always ask the dictionary whether it already knows a key:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
bool knowIt = dict.ContainsKey("daddy"); // == false
So you can easily check-and-initialize-when-unknown:
Dictionary<string, double> dict = new Dictionary<string, double>();
bool knowIt = dict.ContainsKey("daddy"); // == false
if( !knowIt )
dict["daddy"] = 5;
dict["daddy"] += 101; // now 106
So.. let's try summing up the priorities location-wise:
Dictionary<string, double> prioSums = new Dictionary<string, double>();
foreach(var row in rows)
{
    string location = (string)row["LOCATION"];
    // Convert.ToDouble also copes with an integer column, where a direct (double) cast would fail
    double priority = Convert.ToDouble(row["PRIORITY"]);
    if( ! prioSums.ContainsKey(location) )
        // make sure that the dictionary knows the location
        prioSums[location] = 0.0;
    prioSums[location] += priority;
}
And, really, that's all. Now the prioSums will know all locations and all sums of priorities:
var sss = prioSums["NewYork"]; // 9123, assuming NewYork was some location
However, it would be quite useless to have to hardcode all locations. Hence, you can also ask the dictionary which keys it currently knows:
foreach(string key in prioSums.Keys)
    Console.WriteLine(key);
and you can immediately use it:
foreach(string key in prioSums.Keys)
{
Console.WriteLine(key);
Console.WriteLine(prioSums[key]);
}
That should print all locations with all their sums.
You might have already noticed an interesting thing: the dictionary can tell you which keys it has remembered. Hence, you do not need the HashSet from the first objective. Simply by summing up the priorities inside the Dictionary, you get the unique list of locations for free: just ask the dictionary for its keys.
EDIT:
I noticed you had a few more requirements (like sorting in descending order or finding the highest-priority value), but I think I'll leave them for now. If you understand how I used a dictionary to collect the priorities, then you can easily build a similar Dictionary<string,string> to collect the highest-ranking value for each location. And the 'descending order' is very easy to get if you take the values out of the dictionary and sort them as, e.g., a List. So I'll skip that for now. This text has already got far too long, I think :)
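For completeness, the remaining two requirements could be sketched in the same loop-and-dictionary style, as shown below. The Row class is a hypothetical stand-in for one row of the SQL result, and the priority is assumed to be an integer as in the question's sample data. The only lambda is the comparison passed to Sort, which says "order b before a when b has the higher priority".

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for one row of the query result.
public class Row
{
    public string Location;
    public int Priority;
    public string Value;
}

public static class PrioritySummary
{
    public static List<Row> Summarize(IEnumerable<Row> rows)
    {
        var sums = new Dictionary<string, int>(); // location -> summed priority
        var best = new Dictionary<string, Row>(); // location -> row with the highest priority

        foreach (var row in rows)
        {
            if (!sums.ContainsKey(row.Location))
            {
                // first time we see this location: initialize both dictionaries
                sums[row.Location] = 0;
                best[row.Location] = row;
            }
            sums[row.Location] += row.Priority;
            if (row.Priority > best[row.Location].Priority)
                best[row.Location] = row;
        }

        // build one output row per unique location
        var result = new List<Row>();
        foreach (string location in sums.Keys)
        {
            result.Add(new Row
            {
                Location = location,
                Priority = sums[location],
                Value = best[location].Value
            });
        }

        // sort by summed priority, highest first
        result.Sort((a, b) => b.Priority.CompareTo(a.Priority));
        return result;
    }
}
```

Keeping the whole row in the second dictionary, rather than only its value string, makes it possible to compare priorities when a later row for the same location turns up.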
LINQ is really the tool to use for this kind of problem.
Suppose you have a variable pages which is an IEnumerable<Page>, where Page is a class with properties location, priority and value. You could then do:
var query = from page in pages
            group page by page.location into grp
            select new { location = grp.Key,
                         priority = grp.Sum(page => page.priority),
                         value = grp.OrderByDescending(page => page.priority)
                                    .First().value
                       };
You say you don't understand LINQ, so let me try to explain this statement.
The rows are grouped by location, which results in 4 groups of pages, with page.location as the key:
location    priority    value
--------------------------------------
page1       1           some text!
page2       3           more text!
            4           even more text!
page3       1           text
            1           still more text!
            3           text again
page4       6           text
The select loops through these 4 groups and for each group it creates an anonymous type with 3 properties:
location: the key of the group
priority: the sum of priorities in one group
value: the first value in one group when its pages are sorted by priority in descending order.
The lambda expressions are a way to express which property should be used by a LINQ function like Sum. In short, they say "transform page to page.priority": page => page.priority.
You want these new rows in descending order of priority, so finally you can do
var result = query.OrderByDescending(x => x.priority).ToList();
The x is just an arbitrary placeholder representing one item in the collection in hand, query (likewise in the query above page could have been any word or character).
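The same grouping can also be written in method syntax and run end to end against the question's sample rows. The following is only a sketch: the Page class mirrors the one assumed in the answer, PageSummary and SampleRows are hypothetical names, and a named Page is returned instead of an anonymous type so the result can leave the method.

```csharp
using System.Collections.Generic;
using System.Linq;

// Mirrors the Page class assumed in the answer (property names kept as written there).
public class Page
{
    public string location { get; set; }
    public int priority { get; set; }
    public string value { get; set; }
}

public static class PageSummary
{
    // One output item per location: summed priority plus the value of the
    // highest-priority row, ordered by summed priority, highest first.
    public static List<Page> Summarize(IEnumerable<Page> pages)
    {
        return pages
            .GroupBy(page => page.location)
            .Select(grp => new Page
            {
                location = grp.Key,
                priority = grp.Sum(page => page.priority),
                value = grp.OrderByDescending(page => page.priority).First().value
            })
            .OrderByDescending(x => x.priority)
            .ToList();
    }

    // The sample rows from the question.
    public static List<Page> SampleRows()
    {
        return new List<Page>
        {
            new Page { location = "page1", priority = 1, value = "some text!" },
            new Page { location = "page2", priority = 3, value = "more text!" },
            new Page { location = "page2", priority = 4, value = "even more text!" },
            new Page { location = "page3", priority = 3, value = "text again" },
            new Page { location = "page3", priority = 1, value = "text" },
            new Page { location = "page3", priority = 1, value = "still more text!" },
            new Page { location = "page4", priority = 6, value = "text" }
        };
    }
}
```

Calling PageSummary.Summarize(PageSummary.SampleRows()) should yield the four rows in the order the question asks for: page2 (7), page4 (6), page3 (5), page1 (1).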

Limit Number of Results being returned in a List from Linq

I'm using Linq/EF4.1 to pull some results from a database and would like to limit the results to the (X) most recent results. Where X is a number set by the user.
Is there a way to do this?
I'm currently passing them back as a List, if that helps with limiting the result set. While I could limit this by looping until I hit X, I'd just as soon not pass the extra data around.
Just in case it is relevant...
C# MVC3 project running from a SQL Server database.
Use the Take function
int numberOfRecords = 10; // read from user
var recentItems = listOfItems.OrderByDescending(x => x.CreatedDate).Take(numberOfRecords);
This assumes listOfItems is a List of your entity objects and CreatedDate is a field holding the creation date (used here to order descending so that the most recent items come first).
The Take() function returns a specified number of contiguous elements from the start of a sequence.
http://msdn.microsoft.com/en-us/library/bb503062.aspx
results = results.OrderByDescending(x=>x.Date).Take(10);
The OrderByDescending(...) will sort items by your date/time property (or whatever logic you want to use to get the most recent) and Take(...) will limit the result to the first x items (the first being the most recent, thanks to the ordering).
Edit: To return some rows not starting at the first row, use Skip():
results = results.OrderByDescending(x=>x.Date).Skip(50).Take(10);
Use Take(), before converting to a List. This way EF can optimize the query it creates and only return the data you need.

NHibernate - How do I use a sum projection on paged results

I'm trying to use paging in conjunction with a sum projection to get a sum of the values in a column for just the page of results I'm interested in. I'm using .NET, C# and NHibernate 3.1
I have an ICriteria to start with which is related to all rows from the associated db table.
I'm then doing the following to get a version with the first page (say, 10 items out of 40):
ICriteria recordsCriteria = CriteriaTransformer.Clone(criteria);
recordsCriteria.SetFirstResult(0);
recordsCriteria.SetMaxResults(10);
I'm using this ICriteria for something else so I then create two further clones:
ICriteria totalAggCriteria = CriteriaTransformer.Clone(criteria);
ICriteria pageAggCriteria = CriteriaTransformer.Clone(recordsCriteria);
If I take a look inside these two new ones the first has 40 items in and the second has 10 - exactly what I want.
Let's say the objects coming back from the DB have a column called "ColA" and it's of type Int32.
From this, I want the sum of all 40 ColA values and the sum of the first 10 ColA values.
To get the sum of all 40 ColA values, I do the following:
totalAggCriteria.SetProjection(NHibernate.Criterion.Projections.Sum("ColA"));
var totalSum = totalAggCriteria.UniqueResult();
The value in totalSum is correct.
To get the sum of the first 10 ColA values, I'm trying the following:
pageAggCriteria.SetProjection(NHibernate.Criterion.Projections.Sum("ColA"));
var pageSum = pageAggCriteria.UniqueResult();
However, this gives me the same value as the previous one - for all 40 ColA values.
I've also tried the following but it gives the same result:
pageAggCriteria.SetProjection(NHibernate.Criterion.Projections.Sum(column));
pageAggCriteria.SetFirstResult(firstResult.Value);
pageAggCriteria.SetMaxResults(pageSize.Value);
pageSum = pageAggCriteria.UniqueResult();
And also:
pageAggCriteria.SetFirstResult(firstResult.Value);
pageAggCriteria.SetMaxResults(pageSize.Value);
pageAggCriteria.SetProjection(NHibernate.Criterion.Projections.Sum(column));
pageSum = pageAggCriteria.UniqueResult();
Can anyone give an idea on where I'm going wrong and how I can actually get the sum of the ColA values in the first 10 results?
Thanks
It is probably easiest to do that sum client side. The aggregate function operates on the whole table. What you are trying to do is run the aggregate function against the paged result, which I don't think is possible with NH.
In other words, you want select sum(colA) from (select top 10 ...), but that criteria will give you select top 10 sum(colA) from ...
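Summing client side could be as small as the sketch below. It assumes the page of entities has already been loaded, for example through the paged criteria from the question. The Record class, its ColA property, and the PageSums helper are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with the Int32 column from the question.
public class Record
{
    public int ColA { get; set; }
}

public static class PageSums
{
    // Sums ColA over rows that are already in memory.
    // 'pageRows' is assumed to contain only the current page.
    public static int SumColA(IEnumerable<Record> pageRows)
    {
        return pageRows.Sum(r => r.ColA);
    }
}
```

Assuming the entity is mapped as Record, the page could be obtained with recordsCriteria.List<Record>() and passed to PageSums.SumColA, while the total over all rows still comes from the Projections.Sum query that already works.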
