Get first item matching condition in large SharePoint list - c#

So I have a SP list with about 100k items (voucher codes) in it.
Each of them has columns for State (Active/Used), Value (10, 20, 30), Group (Normal, Special) and Code (random alphanumeric). Each column is indexed.
I can't use CAML to get the next active code for a certain group and value, because each of the criteria on its own would return more than 5,000 items (the list view threshold).
So what would be the most efficient way to retrieve the next code?
As the list is continuously growing, loading all items with SPListItemCollectionPosition is not really an option. Isn't there a better way?
It should work on-premises as well as in SharePoint Online.
Thank you

If your Code is a sequential number that increases with every newly entered item, you have two options:
Create a helper list, named something like CodeCounter, that stores the last created Code in a column. Then create a workflow on the main list that starts on item creation. The workflow reads the last created Code from the helper list, updates it to Code + 1 and, optionally, writes the new code to the item in the main list.
Use an event handler (a farm solution; the query will likely perform better here).


Millions of rows in the database, only so much needed

Problem summary:
C# (MVC), entity framework 5.0 and Oracle.
I have a couple of million rows in a view which joins two tables.
I need to populate dropdownlists with filter possibilities.
The options in these dropdownlists should reflect the actual contents
of the view for that column, distinct.
I want to update the dropdownlists whenever you select something, so
that the new options reflect the filtered content, preventing you
from choosing something that would give 0 results.
It's slow.
Question: what's the right way of getting these dropdownlists populated?
Now for more detail.
-- Goal of the page --
The user is presented with some dropdownlists that filter the data in a grid below. The grid represents a view (see "Database") where the results are filtered.
Each dropdownlist represents a filter for a column of the view. Once something is selected, the rest of the page updates. The other dropdownlists now contain the possible values for their corresponding columns that comply with the filter that was just applied in the first dropdownlist.
Once the user has selected a couple of filters, he/she presses the search button and the grid below the dropdownlists updates.
-- Database --
I have a view that selects almost all columns from two tables, nothing fancy there. Like this:
SELECT tbl1.blabla, tbl2.blabla etc etc
FROM table1 tbl1, table2 tbl2
WHERE tbl1.bvz_id = tbl2.id AND tbl1.einddatum IS NULL;
There is a total of 22 columns: 13 VARCHARs (mostly small, 1 to 20 characters, but one of them has a size of 2000!), 6 DATEs and 3 NUMBERs (one of them size 38 and one of them 15,2).
There are a couple of indexes on the tables, among which are the relevant IDs for the WHERE clause.
Important thing to know: I cannot change the database. Maybe set an index here and there, but nothing major.
-- Entity Framework --
I created a database-first EDMX in my solution and also mapped the view. There are also classes for both tables, but I need data from both of them, so I don't know if I need them. The problem with selecting from either table on its own is that you can't apply half of the filtering, but maybe there are smart ways I haven't thought of yet.
-- View --
My view is strongly bound to a viewModel. In there I have an IEnumerable for each dropdownlist. The getter for each of these gets its data from a single IEnumerable called NameOfViewObjects. Like this:
public string SelectedColumn1 { get; set; }
private IEnumerable<SelectListItem> column1Options;
public IEnumerable<SelectListItem> Column1Options
{
    get
    {
        if (column1Options == null)
        {
            column1Options = NameOfViewObjects.Select(item => item.Column1).Distinct()
                .Select(item => new SelectListItem
                {
                    Value = item,
                    Text = item,
                    Selected = item.Equals(SelectedColumn1, StringComparison.InvariantCultureIgnoreCase)
                });
        }
        return column1Options;
    }
}
The two solutions I've tried are:
- 1 -
Selecting all the columns I need for the dropdownlists in a LINQ query (the 2000-character varchar is not one of them, and there are only 2 date columns), doing a distinct on them and putting the results into a HashSet. Then I set NameOfViewObjects to point to this HashSet. I have to wait about 2 minutes for that to complete, but after that, populating the dropdownlists is almost instant (maybe a second for each of them).
model.Beslissingen = new HashSet<NameOfViewObject>(dbBes.NameOfViewObject
    .DistinctBy(item => new
    {
        item.VarcharColumn1,
        item.DateColumn1,
        item.DateColumn2,
        item.VarcharColumn2,
        item.VarcharColumn3,
        item.VarcharColumn4,
        item.VarcharColumn5,
        item.VarcharColumn6,
        item.VarcharColumn7,
        item.VarcharColumn8
    })
);
The big problem here is that the NameOfViewObject object is probably quite large, and even though the distinct leaves fewer than 100,000 results, it still uses over 500 MB of memory. This is unacceptable, because there will be a lot of users using this screen (a lot would be... 10 max, 5 on average, simultaneously).
- 2 -
The other solution is to use the same LINQ query and point NameOfViewObjects to the IQueryable it produces. This means that every time the view wants to bind a dropdownlist to an IEnumerable, it fires a query that finds the distinct values for that column in a table with millions of rows, where the column it's getting the values from is most likely not indexed. This takes around 1 minute for each dropdownlist (I have 10), so that takes ages.
Don't forget: I need to update the dropdownlists every time one of them has its selection changed.
-- Question --
So I'm probably going about this the wrong way. Maybe one of these solutions should be combined with indexing all of the columns I use, or maybe I should store the data in memory in a more compact form. There must be someone out there who has done this before and figured out something smart. Can you please tell me what would be the best way to handle a situation like this?
Acceptable performance:
having to wait for a while (2 minutes) while the page loads, but everything is fast after that
having to wait for a couple of seconds every time a dropdownlist changes
the page does not use more than 500 MB of memory
Of course you should have indexes on all columns and combinations in WHERE clauses. No index means table scan and O(N) query times. Those cannot scale under any circumstance.
You do not need millions of entries in a drop down. You need to be smarter about filtering the database down to manageable numbers of entries.
I'd take a page from Google. Their type ahead helps narrow down the entire Internet graph into groups of 25 or 50 per page, with the most likely at the top. Maybe you could manage that, too.
Perhaps a better answer is something like a search engine. If you were a Java developer you might try Lucene/SOLR and indexing. I don't know what the .NET equivalent is.
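The type-ahead idea above can be sketched roughly as follows. This is only an in-memory illustration: the TypeAhead class, the Suggest method and the default page size of 25 are names and values made up for this example, and in the real application the same filter would have to run in the database rather than over a loaded collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library: narrows a column's values
// down to one small page of suggestions for the text typed so far.
public static class TypeAhead
{
    public static List<string> Suggest(IEnumerable<string> values, string prefix, int pageSize = 25)
    {
        return values
            // keep only values that start with the typed text, ignoring case
            .Where(v => v != null && v.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            // collapse duplicates that differ only by case (first occurrence wins)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderBy(v => v, StringComparer.OrdinalIgnoreCase)
            // never return more than one page of options
            .Take(pageSize)
            .ToList();
    }
}
```

The point of the design is that the dropdown never has to hold more than one page of values, no matter how many distinct values the column contains.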
The first thing to check is your database: make sure you have the right indexes and entity relations in place.
Next, if you want to build your filter options dynamically, you need to run the query with the existing filters to find out what the next filter can be. There are several ways to do this.
First, you can query the data and extract the values from the result. This has a huge load time and wastes time returning data you don't want (unless you are live-updating the results with the filter and don't have paging, in which case you might as well get all the data and use LINQ to Objects to filter).
A second option is to run parallel queries, one per filter, that return the possible values: filter A = all possible values of A in the data, filter B = all possible values of B when filtered by A, filter C = all possible values of C when filtered by A and B, and so on. This is better than the first option, but not by much.
Another option is to use aggregates to speed things up: you run parallel queries as above, but instead of returning the data you return how many records match. Aggregate queries are usually quicker, so this should cut your load time considerably, but you are still repeatedly querying a huge dataset, so it won't be exactly nippy.
You can tweak this further by using EXISTS to return just a 0 or 1.
In this case you would start from a table with all possible filter values and remove the ones for which the parallel query finds no rows.
The next option, which will be the fastest by a mile, is to cache the filters in the database in a separate table.
You can then query that table and say: from Cache, where filter = ABC, select D. The problem with this is maintaining the cache, which you would have to do in the database as part of the save functions, triggers, etc.
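As a rough illustration of the "possible values under the other filters" idea, here is a minimal in-memory sketch. The Row class, its three columns and the FilterOptions helper are hypothetical names invented for this example; a null selection is assumed to mean "no filter on that column". Against a real database the same shape of query would be issued through an IQueryable or plain SQL instead of over a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for one record of the view.
public class Row
{
    public string Country { get; set; }
    public string City { get; set; }
    public string Status { get; set; }
}

public static class FilterOptions
{
    // Applies every non-null selection; null means "not filtered on this column".
    private static IEnumerable<Row> Apply(IEnumerable<Row> rows, string country, string city, string status)
    {
        if (country != null) rows = rows.Where(r => r.Country == country);
        if (city != null) rows = rows.Where(r => r.City == city);
        if (status != null) rows = rows.Where(r => r.Status == status);
        return rows;
    }

    // Options for City: filter by the OTHER columns only, then take the distinct values.
    public static List<string> CityOptions(IEnumerable<Row> rows, string country, string status)
    {
        return Apply(rows, country, null, status).Select(r => r.City).Distinct().OrderBy(c => c).ToList();
    }

    // Options for Status, computed the same way.
    public static List<string> StatusOptions(IEnumerable<Row> rows, string country, string city)
    {
        return Apply(rows, country, city, null).Select(r => r.Status).Distinct().OrderBy(s => s).ToList();
    }

    // The EXISTS variant: only answers whether a combination has any rows at all.
    public static bool HasResults(IEnumerable<Row> rows, string country, string city, string status)
    {
        return Apply(rows, country, city, status).Any();
    }
}
```

Each dropdown ignores its own current selection when computing its options, so the user can still change a choice they have already made.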
Another solution that can be added in addition to the previous suggestions is to use the /*+ result_cache */ hint, if your version of Oracle supports it (Oracle version 11g or later). If the output of the query is small enough for a drop-down list, then when a user enters criteria that matches the same criteria another user used, the results are returned in a few milliseconds instead of a few seconds or minutes. Result cache is wonderful for queries that return a small set of rows out of millions.
select /*+ result_cache */ item_desc from some_table where item_id ...
The result cache is automatically flushed when inserts, updates or deletes occur on the underlying database tables.
I've done something 'kind of' similar in the past - if you can add a table to the database then I'd explore introducing a 'scratchpad' type table where results are temporarily stored as the user refines their search. Since multiple users could be working simultaneously the table would have to have an additional column for identifying the user.
I'd think you'd see some performance benefit since all processing is kept server-side and your app would simply be pulling data from this table. Since you're adding this table you would also have total control over it.
Essentially I'd imagine the program flow would go something like:
User selects some filters and clicks 'Search'.
Server populates scratchpad table with results from that search.
App populates results grid from scratchpad table.
User further refines search and clicks 'Search'.
Server removes/adds rows to scratchpad table as necessary.
App populates results grid from scratchpad table.
And so on.
Rather than having all the users' results in one 'scratchpad' table, you could possibly explore having temporary 'scratchpad' tables per user.

How to update a single item in a Sitecore index?

In my Sitecore content tree there are a few thousand items, and I just want to alter a few of them programmatically. Instead of rebuilding the entire Lucene index, which takes a long time, I want to update the index entries for each item I'm altering in real time. I tried
item.Database.Indexes.UpdateItem(item);
but it is obsolete and tells me to use SearchManager.
Can anyone guide me how to update index entries for a given item?
PS: I'm altering items from desktop application, not the website.
Try to execute one of the HistoryEngine.RegisterItem... methods, e.g:
item.Database.Engines.HistoryEngine.RegisterItemSaved(item, new ItemChanges(item));
item.Database.Engines.HistoryEngine.RegisterItemCreated(item);
item.Database.Engines.HistoryEngine.RegisterItemMoved(item, oldParentId);
Actually, there is no update operation on indexes, so feel free to delete the entry and add it again.

Article/news ordering by order id

I'm having a little trouble coming up with a scheme for ordering, and changing the order of, items in an article/news management system.
Here goes:
I have a News object model as follows:
class News {
    int id;
    string Title;
    string Content;
    string OrderId;
    // trimmed
}
I have CRUD for the object model, and a list as follows:
Id  Title  Order
1.  Foo    -+
2.  Bar    -+
3.  Glah   -+
What I want to do is this: when the user clicks - for the first news item, I want to swap the order ids of items 1 and 2, and of course update the display as well.
How do I do this on the server side? Say the order id of item 1 is 1: how do I find the first item that has a higher order id than this one?
Or take 2. Bar: when I click -, how do I swap its order id with the first one? Or when I click +, how do I swap the order id of this news item with that of 3. Glah?
Is there a better way of doing this?
Some UIs also let the user drag and drop. Any pointers on that?
When the user changes the location of an item, the server needs to know which item was changed and what its new position is. In the code below, I'm using a List to figure out the new positions. In this sample code, newPosition is the new zero-based position and selectedArticle is the article that was moved.
List<Article> articles = LoadSortedArticles();
articles.Remove(selectedArticle);
articles.Insert(newPosition, selectedArticle);
UpdateArticles(articles);
After running, the article's index within the list tells you its new position. That applies to all articles in the list and works the same for a drag/drop UI. I don't know entity framework, but if you can map the orderId field in the database to the index of the article in the list, then you should be good to go.
I hope this is at least able to give you some ideas. Maybe someone else can give a solution specific to entity framework, but I think putting the articles into a sorted list and letting the list do the work might be the easiest way.
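A minimal, self-contained sketch of the list approach might look like this. The Article class and the ArticleOrdering.Move helper are hypothetical, OrderId is assumed to be a one-based integer, and loading and saving (LoadSortedArticles and UpdateArticles above) are left out because they depend on the data layer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical article type; OrderId is assumed to be numeric here.
public class Article
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int OrderId { get; set; }
}

public static class ArticleOrdering
{
    // Moves 'selected' to the zero-based 'newPosition' and renumbers every
    // article so that OrderId always equals its index in the list plus one.
    public static void Move(List<Article> sortedArticles, Article selected, int newPosition)
    {
        sortedArticles.Remove(selected);
        sortedArticles.Insert(newPosition, selected);
        for (int i = 0; i < sortedArticles.Count; i++)
        {
            sortedArticles[i].OrderId = i + 1;
        }
    }
}
```

Renumbering the whole list is simple but touches every row; for a short news list that is usually acceptable, and it also repairs any gaps or duplicates in the stored order ids.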
I think the best approach here would be to implement a Swap method that takes two News objects and swaps their OrderId, which I am assuming is numeric.
So the idea would be that you would pass this method the object that was clicked on, as well as either the one above (if the user clicked +) or the one below (if they clicked -). If you do it this way then you avoid the task of trying to find the next or previous item.
In the case of drag and drop, the task is a little different. You would first need to retrieve all the items between (and including) the item being dragged and the drop target, ordered by OrderId. From there you swap the dragged item with the next or previous item (depending on direction), and continue doing so until you have swapped it with the drop target.
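The swap approach could be sketched along these lines. The News class here is a trimmed-down, hypothetical version of the one in the question with OrderId assumed to be an int, and NewsOrdering, Swap and DragTo are names made up for this example. DragTo assumes both items are present in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down news type with a numeric OrderId.
public class News
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int OrderId { get; set; }
}

public static class NewsOrdering
{
    // The + and - buttons: exchange the order ids of two neighbouring items.
    public static void Swap(News a, News b)
    {
        int temp = a.OrderId;
        a.OrderId = b.OrderId;
        b.OrderId = temp;
    }

    // Drag and drop: swap the dragged item with its neighbour, one step at a
    // time, until it has taken the place of the drop target.
    public static void DragTo(List<News> items, News dragged, News target)
    {
        List<News> ordered = items.OrderBy(n => n.OrderId).ToList();
        int from = ordered.IndexOf(dragged);
        int to = ordered.IndexOf(target);
        int step = from < to ? 1 : -1;
        for (int i = from; i != to; i += step)
        {
            Swap(ordered[i], ordered[i + step]);
            // keep the working list sorted by OrderId for the next step
            News moved = ordered[i];
            ordered[i] = ordered[i + step];
            ordered[i + step] = moved;
        }
    }
}
```

Because only order ids are exchanged, the set of order ids in use never changes, which keeps the existing numbering intact even if it has gaps.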

Check if a number is missing from a non-sorted league

I have a datagridview displayed on the screen. The user can change the end column called "Pareto" to any integer he/she likes. But there is a catch, all records are in a league.
The user also may wish to simply leave a number out and come back to it, so not allowing for a missing number isn't possible.
Basically I need a method that checks if a column has a missing number from a non-sorted row, and also stores each number that's missing into an array/list for me to output.
Normally I like these little logic questions, but after the loops I've coded today my brain is burnt out... so any input would be great!
I originally thought about first getting the number of records in the grid via a count (no problem),
then using it as the range from 1 to "maxCount" and finding all the missing numbers. It sounded so simple until I remembered that the column is not sorted.
Many thanks
The real workflow isn't clear to me, but...
The fact that the DataGridView is not sorted is purely a UI matter. Your data can be sorted and bound to the DataGridView via a DataView or any other intermediate view layer.
Once the user enters a value, you can run a binary search (the fastest search available on a sorted collection) to find the value of interest.
EDIT
If you need to find the numbers missing from a set of numbers in the range 1..500, first keep all the numbers sorted in the data model, as described above:
List<int> sortedAlreadyAvailableNumbers ...
and then:
var missingNumbers = Enumerable.Range(1, 500).Except(sortedAlreadyAvailableNumbers );
Hope this helps.
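Put together as a runnable sketch, this could look like the following. The Pareto class and the FindMissing method are names made up for this example, and the upper bound is passed in rather than fixed at 500. Note that Except does not actually need the input to be sorted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Pareto
{
    // Returns every number in 1..max that does not occur in 'values'.
    // 'values' may be unsorted and may contain duplicates.
    public static List<int> FindMissing(IEnumerable<int> values, int max)
    {
        return Enumerable.Range(1, max).Except(values).ToList();
    }
}
```

For the grid in the question, values would be the integers read from the "Pareto" column and max would be the row count (or whatever upper bound the league uses).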

Sharepoint Coding standards

I just came across this table:
Please let me know what the difference between "poor" and "better" is for the last 5 items.
The reason for all of this is quite simple. When you write SPList.Items.Count to get the total number of items, SPList.Items returns the collection of all items in the list.
You don't want all the items; loading them can be an expensive operation.
By writing SPList.ItemCount, you make sure you only read a number from the database, and not all items.
Essentially, this is true for all items in the list - you should generally avoid using the entire Collection objects (i.e. SPList.Items or SPFolder.Files) when you can. Similarly, if you use them more than once, you should cache them using a local variable.
Here's an example using indexers. Suppose I have a Guid, and want to get an item.
SPListItem item = list.Items[guid];
Looks innocent enough, but it is actually the same as:
SPListItemCollection items = list.Items;
SPListItem item = items[guid];
The point is - SharePoint (and C#, really) doesn't know what you're going to do next, or how you're going to use the collection. The moment you've written .Items you have already triggered a slow operation.
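To make the cost visible without a SharePoint server, here is a small sketch using a stand-in class. FakeList is purely hypothetical and is not a SharePoint type; it only imitates the behaviour described above, where every access to Items builds the collection again while ItemCount just reads a number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a list object, NOT a SharePoint class.
public class FakeList
{
    private readonly List<string> stored;

    // Counts how many times the full collection has been built.
    public int LoadCount { get; private set; }

    public FakeList(IEnumerable<string> items)
    {
        stored = items.ToList();
    }

    // Cheap: only reads a number, never builds the collection.
    public int ItemCount
    {
        get { return stored.Count; }
    }

    // Expensive: every access builds a fresh copy of all items.
    public List<string> Items
    {
        get
        {
            LoadCount++;
            return new List<string>(stored);
        }
    }
}
```

With this stand-in, reading list.Items[0] and then list.Items[1] builds the collection twice, whereas storing list.Items in a local variable first and indexing into that variable builds it only once.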
