Get Specific Range of List Items (LINQ)

I have this block of code I'm working with:
// get the collection of libraries from the injected repository
librarySearchResults = _librarySearchRepository.GetLibraries(searchTerm);
// map the collection into a collection of LibrarySearchResultsViewModel view models
libraryModel.LibrarySearchResults =
    librarySearchResults.Select(
        library =>
            new LibrarySearchResultsViewModel
            {
                Name = library.Name,
                Consortium = library.Consortium,
                Distance = library.Distance,
                NavigateUrl = _librarySearchRepository.GetUrlFromBranchId(library.BranchID),
                BranchID = library.BranchID
            }).ToList();
All this does is take the results of GetLibraries(searchTerm), which returns a list of LibrarySearchResult objects, and map them to a list of LibrarySearchResultsViewModel objects.
While this works well for small result sets, once I get into the thousands of records it really starts to drag, taking about 12 seconds to finish the conversion.
My question:
Since I'm using paging here, I only need to display a fraction of the data returned in a large result set. Is there a way to use something like Take() or GetRange() so that the conversion only happens for the records I need to display? Say, out of 1,000 records, I only want to get records 20 through 40 and convert them to view models.
I'm open to any suggestions on improving or refactoring this code as well.

Use Skip and Take:
// skip the first 20 records, then take the next 20
var records = librarySearchResults.Skip(20).Take(20);
You can easily turn this into pagination (you'll need a page number and a page size).
Also, you're calling ToList there. Consider working with IEnumerable instead: materializing a list forces every element to be converted up front, which can take a long time for a large data set.

You can use Skip() and Take() together to enable paging.
int idx = 0; // zero-based page index; set this based on which page you're currently generating
librarySearchResults.Skip(idx * numitems).Take(numitems).Select(lib => ...);
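Putting the two answers together, a minimal sketch might look like the following. The Page extension method, the 1-based page number, and the integer stand-in for the library list are all assumptions made for illustration; they are not part of the original code. The point it tries to show is that paging before the Select means the conversion lambda only runs for the items on the requested page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Hypothetical helper: returns the items of one page. The page number is 1-based here.
    public static IEnumerable<T> Page<T>(this IEnumerable<T> source, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return source.Skip((page - 1) * pageSize).Take(pageSize);
    }

    public static void Main()
    {
        int conversions = 0;
        // Stand-in for the result of GetLibraries(searchTerm).
        IEnumerable<int> libraries = Enumerable.Range(1, 1000);

        // Page first, then project: the Select lambda runs only for the 20 items on the page.
        List<string> viewModels = libraries
            .Page(page: 2, pageSize: 20)
            .Select(id => { conversions++; return "Library " + id; })
            .ToList();

        Console.WriteLine(viewModels.First()); // Library 21
        Console.WriteLine(conversions);        // 20
    }
}
```

In the original code, the expensive part of each conversion may well be the GetUrlFromBranchId call, so limiting how many times the projection runs is likely to matter more than whether the final result is a list.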

Related

Faster way to get distinct values in LINQ?

I have a web part in SharePoint, and I am trying to populate a drop-down control with the unique/distinct values from a particular field in a list.
Unfortunately, due to the nature of the system, it is a text field, so there is no other definitive source for the data values (if it were a choice field, I could get the field definition and read the values from there). I am also using the chosen value of the drop-down in a subsequent CAML query, so the values must match exactly what is present on the list items. Currently the list has approx. 4K items, and it is growing slowly and will continue to do so.
It's also part of a sandbox solution, so it is restricted by the user code service time limit, and it's timing out more often than not. In my dev environment I stepped through the code in the debugger, and the line of LINQ where I actually get the distinct values seems to be the most time-consuming. When I commented out the call to this method entirely, the timeouts stopped, so I am fairly certain this is where the problem is.
Here's my code:
private void AddUniqueValues(SPList list, SPField filterField, DropDownList dropDownControl)
{
    SPQuery query = new SPQuery();
    query.ViewFields = string.Format("<FieldRef Name='{0}' />", filterField.InternalName);
    query.ViewFieldsOnly = true;
    SPListItemCollection results = list.GetItems(query); // retrieves ~4K items
    List<string> uniqueValues = results.Cast<SPListItem>().Select(item => item[filterField.Id].ToString()).Distinct().ToList(); // this takes too long with 4K items
    uniqueValues.Sort();
    dropDownControl.Items.AddRange(uniqueValues.Select(itm => new ListItem(itm)).ToArray());
}
As far as I am aware, there's no way to get "distinct" values directly in a CAML query, so how can I do this more quickly? Is there a way to restructure the LINQ to run faster?
Is there an easy/fast way to do this from the client side? (REST would be preferred, but I'd do JSOM if necessary).
Thought I'd add some extra information here since I did some further testing and found some interesting results.
First, to address the questions of whether the Cast() and Select() are needed: yes, they are.
SPListItemCollection is IEnumerable but not IEnumerable<T>, so we need the cast just to be able to use LINQ at all.
Once it's cast to IEnumerable<SPListItem>, each SPListItem is a fairly complex object, and I am looking for distinct values of just one property of that object. Using Distinct() directly on the IEnumerable<SPListItem> yields... all of them. So I have to Select() just the single values I want to compare.
So yes, the Cast() and Select() are absolutely necessary.
As noted in the comments by M.kazem Akhgary, in my original line of code, calling ToString() every time (for 4K items) did add some time. But in testing some other variations:
// original
List<string> uniqueValues = results.Cast<SPListItem>().Select(item => item[filterField.Id].ToString()).Distinct().ToList();
// hash set alternative
HashSet<object> items = new HashSet<object>(results.Cast<SPListItem>().Select(itm => itm[filterField.Id]));
// don't call ToString(), just deal with base objects
List<object> obs = results.Cast<SPListItem>().Select(itm => itm[filterField.Id]).Distinct().ToList();
// alternate LINQ syntax from Pieter_Daems answer, seems to remove the Cast()
var things = (from SPListItem item in results select item[filterField.Id]).Distinct().ToList();
I found that all of those methods took multiple tens of seconds to complete. Strangely, the DataTable/DataView method from Pieter_Daems' answer, to which I added a bit to extract the values I wanted:
DataTable dt = results2.GetDataTable();
DataView vw = new DataView(dt);
DataTable udt = vw.ToTable(true, filterField.InternalName);
List<string> rowValues = new List<string>();
foreach (DataRow row in udt.Rows)
{
    rowValues.Add(row[filterField.InternalName].ToString());
}
rowValues.Sort();
took only 1-2 seconds!
In the end, I am going with Thriggle's answer, because it deals nicely with SharePoint's 5000 item list view threshold, which I will probably be dealing with some day, and it is only marginally slower (2-3 seconds) than the DataTable method. Still much, much faster than all the LINQ.
Interesting to note, though, that the fastest way to get distinct values from a particular field from a SPListItemCollection seems to be the DataTable/DataView conversion method.

You're potentially introducing a significant delay by retrieving all items first before checking for distinctness.
An alternative approach would be to perform multiple CAML queries against SharePoint; this would result in one query per unique value (plus one final query that returns no results).
1. Make sure your list has column indexing applied to the field whose values you want to enumerate.
2. In your initial CAML query, sort by the field you want to enumerate and impose a row limit of one item.
3. Get the value of the field from the item returned by that query and add it to your collection of unique values.
4. Query the list again, sorting by the field and imposing a row limit of 1, but this time add a filter condition such that it only retrieves items where the field value is greater than the field value you just detected.
5. Add the value of the field in the returned item to your collection of unique values.
6. Repeat steps 4 and 5 until the query returns an empty result set, at which point your collection of unique values should contain all current values of the field (assuming more haven't been added since you started).
Will this be any faster? That depends on your data, and how frequently duplicate values occur.
If you have 4000 items and only 5 unique values, you'll be able to gather those 5 values in only 6 lightweight CAML queries, returning a total of 5 items. This makes a lot more sense than querying for all 4000 items and enumerating through them one at a time to look for unique values.
On the other hand, if you have 4000 items and 3000 unique values, you're looking at querying the list 3001 times. This might well be slower than retrieving all the items in a single query and using post-processing to find the unique values.
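The loop described in the steps above can be sketched without any SharePoint dependency. In this sketch, queryNext is a hypothetical delegate standing in for the sorted, row-limit-1 CAML query; in a real web part it would wrap list.GetItems with an OrderBy and a greater-than filter on the indexed field. The in-memory list and the sample names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctBySeek
{
    // queryNext(previous) is assumed to return the smallest field value strictly greater
    // than previous (or the smallest value overall when previous is null), or null when
    // no such item exists. Each call represents one lightweight query.
    public static List<string> Collect(Func<string, string> queryNext, out int queryCount)
    {
        var unique = new List<string>();
        queryCount = 0;
        string current = null;
        while (true)
        {
            queryCount++;
            string next = queryNext(current);
            if (next == null) break; // empty result set: every value has been seen
            unique.Add(next);
            current = next;
        }
        return unique;
    }

    public static void Main()
    {
        // 4000 items that contain only 5 distinct values.
        string[] names = { "Ann", "Bob", "Cid", "Dee", "Eve" };
        List<string> items = Enumerable.Range(0, 4000).Select(i => names[i % 5]).ToList();

        // In-memory stand-in for "sort by the field, filter greater-than, row limit 1".
        Func<string, string> queryNext = previous => items
            .Where(v => previous == null || string.CompareOrdinal(v, previous) > 0)
            .OrderBy(v => v, StringComparer.Ordinal)
            .FirstOrDefault();

        List<string> unique = Collect(queryNext, out int queries);
        Console.WriteLine(string.Join(", ", unique)); // Ann, Bob, Cid, Dee, Eve
        Console.WriteLine(queries);                   // 6
    }
}
```

The query count printed at the end matches the reasoning above: one query per unique value plus one final query that returns nothing.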

var distinctItems = (from SPListItem item in items select item["EmployeeName"]).Distinct().ToArray();
Or convert your results to DataView and do something like:
SPList oList = SPContext.Current.Web.Lists["ListName"];
SPQuery query = new SPQuery();
query.Query = "<OrderBy><FieldRef Name='Name' /></OrderBy>";
DataTable dtcamltest = oList.GetItems(query).GetDataTable();
DataView dtview = new DataView(dtcamltest);
DataTable dtdistinct = dtview.ToTable(true, "Name");
Source: https://sharepoint.stackexchange.com/questions/77988/caml-query-on-sharepoint-list-without-duplicates

Duplicate maybe?
.Distinct is an O(n) call.
You can't get any faster than that.
This being said, maybe you want to check if you need the cast + select for getting uniques - I'd try a HashSet.

C# - Concatenate an in memory IList and IQueryable?

Suppose I have a List containing one string value. Suppose I also have an IQueryable that contains several strings from a database. I want to be able to concatenate these two containers into one list and then be able to call methods such as .Skip or .Take on the list. I want to be able to do this in such a way that when I combine the two containers I don't load all of the DB data into memory (only after I call .Skip and .Take). Basically, I want to do something like this (pseudocode):
IQueryable<string> someQuery = myEntities.GetDBQuery(); // Gets "test2", "test3"
IList<string> inMemoryList = new List<string>();
inMemoryList.Add("test");
// Can I do something like this without loading DB data into memory? finalList should contain all 3 strings.
IEnumerable<string> finalList = inMemoryList.Union(someQuery);
// At this point it is fine to load the filtered query into memory.
foreach (string myString in finalList.Skip(100).Take(200))
{
    // Do work...
}
How can I achieve this?

If I understand correctly, you are trying to query data, part of which comes from memory and part from the database, like this:
//the following code will not compile, just for example
var dbQuery = BuildDbQuery();
var list = BuildListInMemory();
var myQuery = (dbQuery + list).OrderBy(aa).Skip(bb).Take(cc).Select(dd);
//and you don't want to load all records into memory by dbQuery
//because you only need some of them
The short answer is no, you can't. Consider the .OrderBy method: all the data has to be in the same "place", otherwise the code can't sort it. So the code loads all the records that dbQuery returns into memory (now they are in the same place) and then sorts all of them, including those in list. That will probably cause a memory issue when dbQuery returns thousands of rows.
HOW TO RESOLVE
Pass the data in list to the database (as parameters of dbQuery) so that the query happens in the database. This is easy if your list has only a few items.
If list has so many records that passing them would make dbQuery too complex, you can query twice, once against dbQuery and once against list. For example, say you have 10,000 users in the database and 1,000 users in your in-memory list, and you want the 10 youngest users. You don't need to load all 10,000 users into memory and then find the youngest 10. Instead, find the 10 youngest through dbQuery and load only those into memory (ResultA), find the 10 youngest in the in-memory list (ResultB), and then compare ResultA with ResultB to pick the overall 10 youngest.
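The "query twice, then compare" idea can be sketched as follows. The User class, its Age property, and the generated sample data are assumptions for illustration, and AsQueryable() over an in-memory sequence stands in for a real database query; with a real provider the OrderBy and Take on dbQuery would be translated to SQL so that only a handful of rows are loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the real User type is not shown in the question.
public class User
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class TopNMerge
{
    public static List<User> Youngest(IQueryable<User> dbQuery, IEnumerable<User> memoryUsers, int count)
    {
        // ResultA: at most `count` rows come back from the database side.
        List<User> resultA = dbQuery.OrderBy(u => u.Age).Take(count).ToList();
        // ResultB: the same selection over the in-memory list.
        List<User> resultB = memoryUsers.OrderBy(u => u.Age).Take(count).ToList();
        // The overall youngest `count` users must be among these 2 * count candidates.
        return resultA.Concat(resultB).OrderBy(u => u.Age).Take(count).ToList();
    }

    public static void Main()
    {
        IQueryable<User> dbQuery = Enumerable.Range(0, 10000)
            .Select(i => new User { Name = "db" + i, Age = 18 + i % 60 })
            .AsQueryable();
        List<User> memoryUsers = Enumerable.Range(0, 1000)
            .Select(i => new User { Name = "mem" + i, Age = 16 + i % 60 })
            .ToList();

        List<User> youngest = Youngest(dbQuery, memoryUsers, 10);
        Console.WriteLine(youngest.Count);           // 10
        Console.WriteLine(youngest.Max(u => u.Age)); // 16
    }
}
```

This works for "top N by some key" because any element in the final top N must also be in the top N of the source it came from. It does not generalize directly to arbitrary Skip offsets, where more candidates from each source are needed.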

I entirely agree with Danny's answer: you need to find a way to get the in-memory user list into the database in order to achieve what you want. As for the example you asked for in your comment, it is difficult to give one without knowing the data structure of your User object. However, assuming you can connect the dots, here is my suggested approach:
Create a temporary table with the same structure as your regular user table in the database and insert all your in-memory users into it.
Write a query that unions the temporary table and the regular table; since both are identical in structure, that should be easy.
Return the result to your application and perform standard LINQ operations on it.
If you want exact code that you can use as is, you will have to provide your User object structure (fields, types, etc. in the database) so that I can write it.

You specify that your query and your list are both sequences of strings. someQuery can be performed completely on the database side (not in-memory)
Let's make your sequences less generic:
IQueryable<string> someQuery = ...
IList<string> myList = ...
You also specify that myList contains only one element.
string myOneAndOnlyString = myList.Single();
As your list is in-memory, this has to be performed in-memory. But because the list has only one element, this won't take any time.
The query that you request:
IQueryable<string> correctQuery = someQuery
    .Where(item => item.Equals(myOneAndOnlyString))
    .Skip(skipCount)
    .Take(takeCount);
Use your SQL server profiler to check the used SQL and see that the request is completely performed in one SQL statement.
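Note that the Where-based query above filters the database strings rather than combining them with the list. If the goal is to page over "the list followed by the query results", one possible sketch is to split the Skip/Take window between the two sources. This is an alternative technique, not what the answer above does, and it rests on several assumptions: the combined sequence is a plain concatenation with the in-memory items first, no duplicates are removed (unlike Union), and the database portion is ordered by the string value. The Page helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatPaging
{
    // Pages over "head followed by tail" without enumerating all of tail.
    public static List<string> Page(IList<string> head, IQueryable<string> tail, int skip, int take)
    {
        // Part of the requested window that falls inside the in-memory list.
        List<string> page = head.Skip(skip).Take(take).ToList();

        int remaining = take - page.Count;
        if (remaining > 0)
        {
            // Offset into the query once the in-memory items have been accounted for.
            int tailSkip = Math.Max(0, skip - head.Count);
            // Many providers expect an explicit ordering before Skip, hence the OrderBy.
            page.AddRange(tail.OrderBy(s => s).Skip(tailSkip).Take(remaining));
        }
        return page;
    }

    public static void Main()
    {
        IList<string> inMemoryList = new List<string> { "test" };
        // Stand-in for the database query.
        IQueryable<string> someQuery = new[] { "test3", "test2" }.AsQueryable();

        Console.WriteLine(string.Join(", ", Page(inMemoryList, someQuery, 0, 2))); // test, test2
        Console.WriteLine(string.Join(", ", Page(inMemoryList, someQuery, 1, 5))); // test2, test3
    }
}
```

With a real provider, only the Skip/Take slice of the query would be requested from the database, so the full result set is never loaded just to reach the requested page.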

Is there a wildcard for the .Take method in LINQ?

I am trying to create a method using LINQ that takes X amount of products from the DB, so I am using the .Take method for that.
The thing is, in some situations I need to take all the products, so is there a wildcard I can give to .Take, or some other method that would bring me all the products in the DB?
Also, what happens if I do a .Take(50) and there are only 10 products in the DB?
My code looks something like:
var ratingsToPick = context.RatingAndProducts
    .ToList()
    .OrderByDescending(c => c.WeightedRating)
    .Take(pAmmount);

You could split it into a separate call based on a flag:
IQueryable<RatingAndProducts> ratingsToPick = context.RatingAndProducts
    .OrderByDescending(c => c.WeightedRating);
if (!takeAll)
    ratingsToPick = ratingsToPick.Take(pAmmount);
var results = ratingsToPick.ToList();
If you don't include the Take, it will simply take everything.
Note that you need to type the query variable explicitly as IQueryable<MyType> (or IEnumerable<MyType> for an in-memory sequence), because OrderByDescending returns an IOrderedQueryable (IOrderedEnumerable in memory) and the result of Take can't be assigned back to a variable of that type. (Or you can simply work around this as appropriate based on your actual code.)
Also, as @Rene147 pointed out, you should move your ToList to the end. Otherwise it retrieves all items from the database every time, and the OrderByDescending and Take then operate on a List<> of objects in memory instead of being performed as a database query, which I assume is unintended.
Regarding your second question, about performing a Take(50) when only 10 entries are available: that might depend on your database provider, but in my experience they tend to be smart enough not to throw exceptions and will simply give you however many items are available. (I would suggest a quick test to make sure for your specific case.)

Your current solution always takes all products from the database, because you are calling ToList() first. After loading all products from the database, you take the first N in memory. In order to conditionally load only the first N products, you need to build the query before executing it:
int? countToTake = 50;
// typed as IQueryable so that the result of Take can be assigned back;
// the element type name is assumed to match the entity behind context.RatingAndProducts
IQueryable<RatingAndProducts> ratingsToPick = context.RatingAndProducts
    .OrderByDescending(c => c.WeightedRating);
// conditionally take only first results
if (countToTake.HasValue)
    ratingsToPick = ratingsToPick.Take(countToTake.Value);
var result = ratingsToPick.ToList(); // execute query
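Both answers can be tried out without a database. The sketch below uses plain integers as stand-in ratings and AsQueryable() in place of a real context, so the names and data are illustrative assumptions only. It shows the nullable count acting as the "take everything" switch, and it shows that in LINQ to Objects a Take larger than the sequence simply returns what is there; database providers are expected to behave the same way, but that is worth confirming for your own provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConditionalTake
{
    // countToTake == null means "no limit".
    public static List<int> TopRatings(IQueryable<int> ratings, int? countToTake)
    {
        IQueryable<int> query = ratings.OrderByDescending(r => r);
        if (countToTake.HasValue)
            query = query.Take(countToTake.Value);
        return query.ToList(); // the query is executed here
    }

    public static void Main()
    {
        // Ten stand-in ratings.
        IQueryable<int> ratings = new[] { 3, 9, 1, 7, 5, 8, 2, 6, 4, 10 }.AsQueryable();

        Console.WriteLine(TopRatings(ratings, null).Count);              // 10
        Console.WriteLine(TopRatings(ratings, 50).Count);                // 10
        Console.WriteLine(string.Join(", ", TopRatings(ratings, 3)));    // 10, 9, 8
    }
}
```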

Limit Number of Results being returned in a List from Linq

I'm using Linq/EF4.1 to pull some results from a database and would like to limit the results to the (X) most recent results. Where X is a number set by the user.
Is there a way to do this?
I'm currently passing them back as a List, in case that helps with limiting the result set. While I could limit this by looping until I hit X, I'd just as soon not pass the extra data around.
Just in case it is relevant...
C# MVC3 project running from a SQL Server database.

Use the Take function:
int numberOfrecords = 10; // read from user
var recentItems = listOfItems.OrderByDescending(x => x.CreatedDate).Take(numberOfrecords);
This assumes listOfItems is a List of your entity objects and CreatedDate is a field that holds the creation date (used here to order descending so that the most recent items come first).
The Take() function returns a specified number of contiguous elements from the start of a sequence.
http://msdn.microsoft.com/en-us/library/bb503062.aspx

results = results.OrderByDescending(x=>x.Date).Take(10);
The OrderByDescending(...) will sort items by your date/time property (or whatever logic you want to use to determine the most recent) and Take(...) will limit the result to the first x items (the first being the most recent, thanks to the ordering).
Edit: To return some rows not starting at the first row, use Skip():
results = results.OrderByDescending(x=>x.Date).Skip(50).Take(10);

Use Take(), before converting to a List. This way EF can optimize the query it creates and only return the data you need.

Querying by near, sorting, then paging

I'm using the geospatial "near" search in MongoDB (using the C# driver) to return homes within 25 miles of a given lat/long. This returns the homes sorted by proximity to the lat/long and works great.
However, I want to add in sorting (on other fields such as home price) and paging and here is where I'm getting stuck. To work correctly, it would need to figure out which homes were within 25 miles of the lat/long, then sort those results (let's say based on price), and then take a "page" of 10 results.
Below is what I have so far. The issue is that it takes a page of results (based on the proximity sort) and then sorts that page of 10 results by what I set in SetSortOrder, rather than sorting the entire set of results near the lat/long, so each page of 10 results is only sorted within itself.
var coordinates = find.GetCoordinates();
var near = Query.Near("Coordinates", coordinates.Latitude,
                      coordinates.Longitude,
                      find.GetRadiansAway(), false);
var query = Collection().Find(near);
query.Skip = find.GetSkip();
query.Limit = find.GetLimit();
query.SetSortOrder(new string[] { "Price" });
var results = query.ToArray();

This is the expected behavior, because $near returns results sorted by distance by default. The sorting is done internally by the $near operator, so you can't change it.
db.places.find( { loc : { $near : [50,50] } } )
The above query finds the closest points to (50,50) and returns them sorted by distance (there is no need for an additional sort parameter).
So in your example, Price is a second sort field that sorts data within the result already sorted by distance.
The workaround is to load the entire result of Query.Near and then sort it on the client by whatever field you want.
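The client-side workaround can be sketched with LINQ once the near results have been loaded. The Home class, its properties, and the sample data are assumptions for illustration; nearResults stands for the full array returned by the Query.Near search with no Skip or Limit applied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type; the real home class is not shown in the question.
public class Home
{
    public string Id { get; set; }
    public decimal Price { get; set; }
    public double MilesAway { get; set; }
}

public static class NearThenSort
{
    // Sorts the complete near result by price first, then cuts out one page.
    public static List<Home> PageByPrice(IEnumerable<Home> nearResults, int skip, int limit)
    {
        return nearResults
            .OrderBy(h => h.Price)
            .Skip(skip)
            .Take(limit)
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for the unpaged near result, which arrives sorted by distance.
        var nearResults = new List<Home>
        {
            new Home { Id = "A", Price = 300000m, MilesAway = 1.0 },
            new Home { Id = "B", Price = 150000m, MilesAway = 2.5 },
            new Home { Id = "C", Price = 450000m, MilesAway = 4.0 },
            new Home { Id = "D", Price = 200000m, MilesAway = 9.0 },
        };

        List<Home> firstPage = PageByPrice(nearResults, skip: 0, limit: 2);
        Console.WriteLine(string.Join(", ", firstPage.Select(h => h.Id))); // B, D
    }
}
```

The trade-off is that every home within the radius is transferred to the client before paging, so this is only reasonable when the number of matches within 25 miles stays moderate.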
