Walking through an SQLite Table - c#

I would like to implement or use functionality that allows stepping through a Table in SQLite.
If I have a Table Products that has 100k rows, I would like to retrieve perhaps 10k rows at a time. Something similar to how a webpage would list data and have a < Previous .. Next > link to walk through the data.
Are there select statements that can make this simple? I have seen and tried using ROWID in conjunction with LIMIT, which seems OK as long as the data isn't ordered.
// This seems to work if the rows are not ordered.
SELECT * FROM Products WHERE ROWID BETWEEN x AND y;

Are you looking for OFFSET and LIMIT? SQLite supports these. You can then use ORDER BY, which SQLite also supports.
EDIT: To elaborate, you can do something like:
SELECT * FROM Products ORDER BY name LIMIT 10000 OFFSET 10000;
This fetches the second 10k page from the table, sorted by name. Watch out for performance with large OFFSET values: the database still has to walk past all the skipped rows, so later pages get slower.
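From C# the same LIMIT/OFFSET query can be parameterised so the page size and page number are passed in. This is only a minimal sketch, assuming the Microsoft.Data.Sqlite provider and a name column on Products; any ADO.NET SQLite provider works the same way.

using Microsoft.Data.Sqlite;
using System.Collections.Generic;

public static class ProductPager
{
    public static List<string> GetProductPage(int pageIndex, int pageSize)
    {
        var names = new List<string>();
        using (var conn = new SqliteConnection("Data Source=products.db"))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                // ORDER BY makes the pages stable; LIMIT/OFFSET picks the slice.
                cmd.CommandText =
                    "SELECT name FROM Products ORDER BY name LIMIT $limit OFFSET $offset";
                cmd.Parameters.AddWithValue("$limit", pageSize);
                cmd.Parameters.AddWithValue("$offset", pageIndex * pageSize);

                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        names.Add(reader.GetString(0));
            }
        }
        return names;
    }
}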

Related

Millions of rows in the database, only so much needed

Problem summary:
C# (MVC), entity framework 5.0 and Oracle.
I have a couple of million rows in a view which joins two tables.
I need to populate dropdownlists with filter possibilities.
The options in these dropdownlists should reflect the actual contents
of the view for that column, distinct.
I want to update the dropdownlists whenever you select something, so
that the new options reflect the filtered content, preventing you
from choosing something that would give 0 results.
It's slow.
Question: what's the right way of getting these dropdownlists populated?
Now for more detail.
-- Goal of the page --
The user is presented with some dropdownlists that filter the data in a grid below. The grid represents a view (see "Database") where the results are filtered.
Each dropdownlist represents a filter for a column of the view. Once something is selected, the rest of the page updates. The other dropdownlists then contain the possible values for their corresponding columns that comply with the filter that was just applied in the first dropdownlist.
Once the user has selected a couple of filters, he/she presses the search button and the grid below the dropdownlists updates.
-- Database --
I have a view that selects almost all columns from two tables, nothing fancy there. Like this:
SELECT tbl1.blabla, tbl2.blabla etc etc
FROM table1 tbl1, table2 tbl2
WHERE tbl1.bvz_id = tbl2.id AND tbl1.einddatum IS NULL;
There are 22 columns in total: 13 VARCHARs (mostly small, 1 - 20, though one of them has a size of 2000!), 6 DATEs and 3 NUMBERs (one of them of size 38 and one of them 15,2).
There are a couple of indexes on the tables, among which the relevant ID's for the WHERE clause.
Important thing to know: I cannot change the database. Maybe set an index here and there, but nothing major.
-- Entity Framework --
I created a database-first EDMX in my solution and also mapped the view. There are also classes for both tables, but since I need data from both of them I don't know if I need those. The problem with selecting from either table on its own is that you can't apply half of the filtering, but maybe there are smart ways I haven't thought of yet.
-- View --
My view is strongly typed against a viewmodel. In there I have an IEnumerable<SelectListItem> for each dropdownlist. The getter for each of these pulls its data from a single IEnumerable called NameOfViewObjects. Like this:
public string SelectedColumn1 { get; set; }

private IEnumerable<SelectListItem> column1Options;
public IEnumerable<SelectListItem> Column1Options
{
    get
    {
        if (column1Options == null)
        {
            column1Options = NameOfViewObjects.Select(item => item.Column1).Distinct()
                .Select(item => new SelectListItem
                {
                    Value = item,
                    Text = item,
                    Selected = item.Equals(SelectedColumn1, StringComparison.InvariantCultureIgnoreCase)
                });
        }
        return column1Options;
    }
}
The two solutions I've tried are:
- 1 -
Selecting, in a LINQ query, all the columns I need for the dropdownlists (the 2000-character varchar is not one of them and there are only 2 date columns), doing a distinct on them and putting the results into a HashSet. Then I point NameOfViewObjects at this HashSet. I have to wait about 2 minutes for that to complete, but after that, populating the dropdownlists is almost instant (maybe a second for each of them).
model.Beslissingen = new HashSet<NameOfViewObject>(dbBes.NameOfViewObject
    .DistinctBy(item => new
    {
        item.VarcharColumn1,
        item.DateColumn1,
        item.DateColumn2,
        item.VarcharColumn2,
        item.VarcharColumn3,
        item.VarcharColumn4,
        item.VarcharColumn5,
        item.VarcharColumn6,
        item.VarcharColumn7,
        item.VarcharColumn8
    }));
The big problem here is that the NameOfViewObject objects are probably quite large, and even with the distinct, which brings it down to fewer than 100,000 results, this still uses over 500 MB of memory. That is unacceptable, because a lot of users will be using this screen (a lot being... 10 max, 5 on average, simultaneously).
- 2 -
The other solution is to use the same LINQ query and point NameOfViewObjects at the IQueryable it produces. This means that every time the view wants to bind a dropdownlist to an IEnumerable, it fires a query that finds the distinct values for that column in a table with millions of rows, where most likely the column it's getting the values from is not indexed. This takes around 1 minute for each dropdownlist (I have 10), so that takes ages.
Don't forget: I need to update the dropdownlists every time one of them has its selection changed.
-- Question --
So I'm probably going at this the wrong way. Maybe one of these solutions should be combined with indexing all of the columns I use, or maybe I should store the data in memory in another way so that it only takes a little. But there must be someone out there who has done this before and figured out something smart. Can you please tell me what would be the best way to handle a situation like this?
Acceptable performance:
- having to wait for a while (2 minutes) while the page loads, but everything is fast after that
- having to wait for a couple of seconds every time a dropdownlist changes
- the page does not use more than 500 MB of memory
Of course you should have indexes on all columns and column combinations that appear in your WHERE clauses. No index means a table scan and O(N) query times, and that cannot scale under any circumstance.
You do not need millions of entries in a drop down. You need to be smarter about filtering the database down to manageable numbers of entries.
I'd take a page from Google. Their type ahead helps narrow down the entire Internet graph into groups of 25 or 50 per page, with the most likely at the top. Maybe you could manage that, too.
Perhaps a better answer is something like a search engine. If you were a Java developer you might try Lucene/SOLR and indexing. I don't know what the .NET equivalent is.
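To illustrate the "narrow it down" idea, a type-ahead endpoint could return only the distinct values matching what the user has typed so far, capped at a small number. This is a hedged sketch only; the context class, entity and column names (MyDbContext, NameOfViewObject, VarcharColumn1) are assumptions carried over from the question.

using System.Collections.Generic;
using System.Linq;

public static class TypeAhead
{
    public static List<string> GetColumn1Suggestions(string typedSoFar)
    {
        using (var db = new MyDbContext())   // hypothetical EF context
        {
            // Let the database do the narrowing: distinct matches for the prefix, at most 50 of them.
            return db.NameOfViewObject
                     .Where(v => v.VarcharColumn1.StartsWith(typedSoFar))
                     .Select(v => v.VarcharColumn1)
                     .Distinct()
                     .OrderBy(v => v)
                     .Take(50)
                     .ToList();
        }
    }
}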
First, check your DB: make sure you have the right indexes and entity relations in place.
Next, if you want to dynamically build your filter options, you need to run the query with the existing filters applied in order to work out what the next filter's options can be. There are several ways to do this.
Firstly, you can query the data and extract the values from what comes back. This has a huge load time and wastes time returning data you don't want (unless you are live-updating the results with the filter and don't have paging, in which case you might as well just get all the data and use LINQ to Objects to filter).
A second option is to run a parallel query for each filter that returns its possible values: filter A = all possible values of A in the data, filter B = all possible values of B when filtered by A, C = all possible values of C when filtered by A & B, and so on. This is better than the first option, but not by much.
Another option is to use aggregates to speed things up, i.e. you run the same parallel queries as above, but instead of returning the data you return how many records would be returned. Aggregate functions are always quicker, so this cuts the load time dramatically, but you are still repeatedly querying a huge dataset, so it won't be exactly nippy.
You can tweak this further using EXISTS to just return a 0 or 1.
In that case you would look at a table of all possible filters and remove the ones the parallel query found no values for.
The next option, which will be the fastest by a mile, is to cache the filters in the DB in a separate table.
Then you can query that and say "from Cache, where filter = ABC, select D". The problem with this is maintaining the cache, which you would have to do in the DB as part of the save functions, triggers, etc.
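As a rough illustration of the "parallel query per filter" option, each dropdown can be filled by a query that applies the filters already chosen and selects only the distinct values of its own column. Entity and column names (MyDbContext, NameOfViewObject, VarcharColumn1, VarcharColumn2) are assumptions based on the question.

using System.Collections.Generic;
using System.Linq;

public static class FilterOptions
{
    // Returns the possible values for the second dropdown, given what is already selected in the first.
    public static List<string> GetColumn2Options(MyDbContext db, string selectedColumn1)
    {
        IQueryable<NameOfViewObject> q = db.NameOfViewObject;

        if (!string.IsNullOrEmpty(selectedColumn1))
            q = q.Where(v => v.VarcharColumn1 == selectedColumn1);

        // Only the distinct values travel over the wire, not whole view rows.
        return q.Select(v => v.VarcharColumn2)
                .Distinct()
                .OrderBy(v => v)
                .ToList();
    }
}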
Another solution that can be added in addition to the previous suggestions is to use the /*+ result_cache */ hint, if your version of Oracle supports it (Oracle version 11g or later). If the output of the query is small enough for a drop-down list, then when a user enters criteria that matches the same criteria another user used, the results are returned in a few milliseconds instead of a few seconds or minutes. Result cache is wonderful for queries that return a small set of rows out of millions.
select /*+ result_cache */ item_desc from some_table where item_id ...
The result cache is automatically flushed when any insert/updates/deletes occur on the database tables.
I've done something 'kind of' similar in the past - if you can add a table to the database then I'd explore introducing a 'scratchpad' type table where results are temporarily stored as the user refines their search. Since multiple users could be working simultaneously the table would have to have an additional column for identifying the user.
I'd think you'd see some performance benefit since all processing is kept server-side and your app would simply be pulling data from this table. Since you're adding this table you would also have total control over it.
Essentially I'd imagine the program flow would go something like:
User selects some filters and clicks 'Search'.
Server populates scratchpad table with results from that search.
App populates results grid from scratchpad table.
User further refines search and clicks 'Search'.
Server removes/adds rows to scratchpad table as necessary.
App populates results grid from scratchpad table.
And so on.
Rather than having all the users results in one 'scratchpad' table you could possibly explore having temporary 'scratchpad' tables per user.
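A hedged sketch of the scratchpad idea using EF's raw-SQL helper; the table, columns and view name are invented for illustration and would have to match whatever table you actually add.

public static class Scratchpad
{
    // Rebuilds this user's scratchpad rows from the big view, entirely on the database server.
    public static void Refresh(MyDbContext db, int userId, string filterA)
    {
        // Throw away the user's previous results...
        db.Database.ExecuteSqlCommand(
            "DELETE FROM filter_scratchpad WHERE user_id = {0}", userId);

        // ...and store the new ones without pulling them across the network.
        db.Database.ExecuteSqlCommand(
            @"INSERT INTO filter_scratchpad (user_id, col_a, col_b)
              SELECT {0}, v.col_a, v.col_b
              FROM the_big_view v
              WHERE v.col_a = {1}", userId, filterA);

        // The grid and the dropdowns then read (and page) from filter_scratchpad for this user,
        // e.g. via a mapped entity or Database.SqlQuery<T>.
    }
}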

SQL query by time too slow

I have a database of 700,000 entries with a full-text search table. Each row has a time of day associated with it. I need to page through the records 100 rows at a time, efficiently. I am doing this by tracking the end of day.
It is taking much too long to execute (15 seconds)
Here's an example query:
SELECT *
FROM Objects o, FTSObjects f
WHERE f.rowid = o.AutoIncID AND
o.TimeStamp > '2012-07-11 14:24:16.582' AND
o.TimeStamp <= '2012-07-12 04:00:00.000' AND
o.Name='GPSHistory'
ORDER BY o.TimeStamp
LIMIT 100
The timestamp field is indexed.
I think this is because the ORDER BY is sorting all the records returned before the LIMIT is applied, but I am not sure.
Suggestions?
The BEST way is to get a good DBA to look at the plan that is generated and make sure it's the most optimal plan (e.g. make sure there are no table scans in it, which can happen if the optimizer uses bad statistics).
Here's some things that may help:
Add an index on Objects.Name - possibly even a compound index on Name and TimeStamp (see the sketch after this list).
Add an index on rowid in FTSObjects if it doesn't already exist
UPDATE STATISTICS on the Timestamp index periodically (ideally after large updates or daily if updates are continuous)
Rebuild your clustered index (if you have one). This would help if your clustered index is on a field that does not get sequential inserts (e.g. a char field where inserts are in random places)
Don't select * if you don't need to - that increases I/O time
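A minimal sketch of the compound index from the first suggestion, assuming SQLite (the question uses rowid and LIMIT) and the Microsoft.Data.Sqlite provider; the index name and connection string are made up.

using Microsoft.Data.Sqlite;

public static class Indexing
{
    public static void EnsureObjectsIndex(string connectionString)
    {
        using (var conn = new SqliteConnection(connectionString))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                // Name first (equality filter), then TimeStamp (range filter + ORDER BY).
                cmd.CommandText =
                    "CREATE INDEX IF NOT EXISTS idx_objects_name_timestamp ON Objects (Name, TimeStamp);";
                cmd.ExecuteNonQuery();
            }
        }
    }
}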
You might also try casting the literal strings to DATETIME. I think SQL does this implicitly anyway (as opposed to casting the column data to a string, which would not use the index on the datetime column):
SELECT *
FROM Objects o, FTSObjects f
WHERE f.rowid = o.AutoIncID AND
o.TimeStamp > CONVERT(DATETIME,'2012-07-11 14:24:16.582') AND
o.TimeStamp <= CONVERT(DATETIME,'2012-07-12 04:00:00.000') AND
o.Name='GPSHistory'
ORDER BY o.TimeStamp
LIMIT 100
Yes, the ORDER BY is processed before the LIMIT, but that's the correct functionality. Paging wouldn't actually work otherwise. But some ideas for optimization.
Don't SELECT * if it's not absolutely necessary. I suspect it isn't: if you're paging results, the user is almost certainly not looking at every field in both tables.
Create a covered index on AutoIncID, TimeStamp to keep it from reading the data page. Add Name to that index if it comes from Objects.
Create a covered index on rowid, Name, if Name comes from FTSObjects.
If the returned fields can be limited, consider adding those fields to the covered indexes if it's only a couple fields. You don't want the index to get too big because then it will affect write times.

What is the best optimization technique for a wildcard search through 100,000 records in sql table

I am working on an ASP.NET MVC application. This application is used by 200 users. These users constantly (every 5 minutes) search for an item from a list of 100,000 items (this list is going to grow by 1-2% every month). The list of 100,000 items is stored in a SQL Server table.
The search is a wildcard search
eg:
Select itemCode, itemName, ItemDesc
from tblItems
Where itemName like '%SearchWord%'
The searching needs to be really fast, since the main business relies on searching and selecting the item.
I would like to know how to get the best performance. The search results have to come up instantaneously.
What I have tried -
I tried pre-loading all 100,000 records into memcache and then reading from the memcache, to avoid hitting SQL Server for every search.
This takes a lot of time. Every time a user searches for an item, we retrieve the 100,000 records from the memcache and then do the search; this takes almost 2-3 times longer than direct SQL searches.
I tried a direct search on the SQL Server table, but limiting the results to only 50 records at a time (using TOP 50).
This seems to be OK, but still nowhere near the performance we are seeking.
I would like to hear the possible solutions and links to any articles/code.
Thanks in advance
Run SQL Profiler and do a tuning profile. This will make recommendations on indexes to execute against your database.
Also, a query such as the following would be worth a try.
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER ( ORDER BY ColumnA) AS RowNumber, itemCode, itemName, ItemDesc
FROM tblItems
WHERE itemName LIKE '%FooBar%'
) AS RowResults
WHERE RowNumber >= 1 AND RowNumber < 50
ORDER BY RowNumber
EDIT: Updated query to reflect your real scenario.
How about having a search without the leading wildcard as your primary search....
Where itemName like 'SearchWord%'
and then have a "More Results" button that loads
Where itemName like '%SearchWord%'
(alternatively exclude results from the first result set)
Where itemName not like 'SearchWord%' and itemName like '%SearchWord%'
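A hedged sketch of that two-phase approach in ADO.NET; the connection string is a placeholder and the column names come from the question. The prefix search can use an index on itemName, while the infix search only runs when the user explicitly asks for more.

using System.Collections.Generic;
using System.Data.SqlClient;

public static class ItemSearch
{
    public static List<string> Search(string searchWord, bool includeInfixMatches)
    {
        var results = new List<string>();
        using (var conn = new SqlConnection("Server=.;Database=Items;Integrated Security=true"))
        using (var cmd = conn.CreateCommand())
        {
            // First pass: index-friendly prefix match. Second pass: the remaining infix matches.
            cmd.CommandText = includeInfixMatches
                ? @"SELECT TOP 50 itemName FROM tblItems
                    WHERE itemName LIKE '%' + @word + '%' AND itemName NOT LIKE @word + '%'"
                : @"SELECT TOP 50 itemName FROM tblItems
                    WHERE itemName LIKE @word + '%'";
            cmd.Parameters.AddWithValue("@word", searchWord);

            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    results.Add(reader.GetString(0));
        }
        return results;
    }
}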
A weird alternative that might work, though it depends on several assumptions. Sorry it's not fully explained, but I am using an iPad so it's hard to type. (And yes, this approach has been used in high-transaction commercial systems.)
This assumes:
That your query is CPU-constrained, not I/O-constrained
That itemName is not too long, so it can hold all letters and numbers
That searchword, in total, contains enough selective characters and isn't just highly common characters
That your selection predicates are constrained by a %LIKE%
The basic idea is to expand your query to help the optimiser know which rows need the like scanning.
Step 1. Set up your table
Create an additional 26 or 36 columns, one for each letter/digit. When I've done this for real it has always been a separate table, but putting it on the source table should be OK for a small volume like 100k. Let's call the columns trig_a, trig_b, etc.
Create a trigger for each insert/edit/delete and put a 1 or 0 into the trig_a field depending on whether the row contains an 'a'; do this for all 26/36 columns. The trigger to do this is complex, but possible (at least using Oracle). If you get stuck I'm sure SO'ers can create it, or I can dig it out.
At this point, we have a series of columns that indicate whether a field contains a given letter/digit.
Step 2. Helping your query
With this extra info, we are in the position to help the optimiser. Add the following to your query
Select ... Where .... And
((trig_a > 0) or (searchword not like '%a%')) and
((trig_b > 0) or (searchword not like '%b%')) and
... Repeat for all columns monitored...
If the optimiser behaves, it can use the (hopefully) lower cost field>0 predicates to reduce the like predicates evaluated.
Notes.
You may need to force the optimiser to scan the trig_? fields first.
Indexes can help on the trig_? fields, especially if they are in the source table.
I haven't shown how to handle upper/lower case; don't forget to handle this.
You might find that doing just a few letters is all you need.
This technique doesn't offer performance gains for every use of LIKE, so it isn't a general-purpose technique for everywhere you use a LIKE.
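As a hedged sketch of Step 2, the extra predicates could also be generated in C# from the search word rather than written by hand; the table, column and parameter names here are assumptions taken from the question.

using System.Text;

public static class TrigPredicates
{
    // Builds the SELECT with one "trig_x > 0" guard per letter/digit present in the search word.
    public static string BuildQuery(string searchWord)
    {
        var sql = new StringBuilder(
            "SELECT itemCode, itemName, ItemDesc FROM tblItems WHERE itemName LIKE :pattern");

        string lowered = searchWord.ToLowerInvariant();
        foreach (char c in "abcdefghijklmnopqrstuvwxyz0123456789")
        {
            // Only rows flagged as containing this character can possibly match the LIKE.
            if (lowered.IndexOf(c) >= 0)
                sql.Append(" AND trig_").Append(c).Append(" > 0");
        }
        return sql.ToString();
    }
}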

Update Multiple Records of Two Fields

I have a table called driver and I want to update the drivers' position fields (pos_x and pos_y) with random numbers. What I did was first select the data from the table (to see how many drivers I have), then update their positions, then select the data again. Is there another way to do this?
If you create a class to hold the driver information, then you can eliminate the last step (selecting the data again).
The steps would be:
1) Read the data into a List.
2) Update the values in the List.
3) Write the data from the List to the database.
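A minimal sketch of those three steps with plain ADO.NET, assuming SQL Server, an integer id key column and a placeholder connection string; adapt the names to the real schema.

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public class DriverPosition
{
    public int Id;
    public int PosX;
    public int PosY;
}

public static class DriverPositions
{
    public static void RandomizePositions(string connectionString)
    {
        var rng = new Random();
        var drivers = new List<DriverPosition>();

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1) Read the data into a list.
            using (var cmd = new SqlCommand("SELECT id FROM driver", conn))
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    drivers.Add(new DriverPosition { Id = reader.GetInt32(0) });

            // 2) Update the values in the list.
            foreach (var d in drivers)
            {
                d.PosX = rng.Next(1000);
                d.PosY = rng.Next(1000);
            }

            // 3) Write the data from the list back to the database.
            foreach (var d in drivers)
            {
                using (var cmd = new SqlCommand(
                    "UPDATE driver SET pos_x = @x, pos_y = @y WHERE id = @id", conn))
                {
                    cmd.Parameters.AddWithValue("@x", d.PosX);
                    cmd.Parameters.AddWithValue("@y", d.PosY);
                    cmd.Parameters.AddWithValue("@id", d.Id);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}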
I want to update the drivers' position fields (pos_x and pos_y) with random numbers
You can do this quite easily just using SQL.
UPDATE driver
SET pos_x = ABS(CHECKSUM(NEWID())) % 1000
  , pos_y = ABS(CHECKSUM(NEWID())) % 1000
As this is all done on the SQL server it means you won't incur a network overhead shipping data back and forth. Of course you will need to select the result to work with afterwards.
Why ABS-CHECKSUM-NEWID? I tried it with T-SQL's RAND() function with less than satisfactory results: RAND() without a per-row seed is evaluated once per statement, so every row gets the same value, whereas NEWID() is evaluated for each row.

Efficient Custom Paging in ASP.NET 2.0 while sorting

I've got a web app in ASP.NET 2.0 in which I need to do paging. My method of data access is to pull a DataSet out of a database call, convert it to a List<Foo> (where Foo is the type I'm pulling out of the DB) and bind my GridView to that. My reason for this is that I didn't want to use string indexers on DataTables all through my application, and it lets me separate display logic from the database by implementing the display logic as properties on my classes. It also means I'm doing sorting in .NET instead of SQL.
To implement paging, then, I need to pull all Foo out of the database, sort the list, then take what I want out of the full list to display:
List<Foo> myFoo = MyDB.GetFoos();
myFoo.Sort(new Foo.FooComparer());

List<Foo> toDisplay = new List<Foo>();
for (int i = pageIndex * pageSize; i < (pageIndex + 1) * pageSize && i < myFoo.Count; i++)
{
    toDisplay.Add(myFoo[i]);
}
// bind grid
With enough elements, this becomes a source of delay; on my development machine connecting to the test database, it takes almost 0.5 seconds to bind one grid on the screen when pulling 5000 records from the DB.
To solve this problem, am I going to have to move all my display logic to SQL so the sorting and paging can take place there, or is there a better way?
Also, does Linq to SQL solve this? If I'm sorting by a custom property implemented in my .NET class and then using .Skip(pageIndex * pageSize).Take(pageSize), will it convert that to SQL as noted in this question?
Yes - I would recommend you move your record selection to SQL (sorting and paging) - the classic way to perform paging in SQL is to use a CTE. I'll find you a good example and update my answer. There's a good example here http://softscenario.blogspot.com/2007/11/sql-2005-server-side-paging-using-cte.html - I googled for "sql paging cte".
Linq to SQL will use the Row_Number approach as described in the article and in general it will have performant paging queries to the database.
However there are limits to the amount of data that SQL can page for you and still be performant.
If you have a table with millions of rows in it, the paging functions need to limit the amount of data queried and subsequently paged with the Row_Number approach.
Let's just say you want to page this query :
Select column1, column2, column3 from table1 where column1 > 100
Now let's say that returns 1,000,000 rows. SQL Server still has to run its paging routine over a million rows. That will take a few seconds to page out the result set of the initial query. And it has to do this for every query.
To make sure the performance is maintained, you need to limit the amount of records returned that SQL will page.
Select TOP 10000 column1, column2, column3 from table1 where column1 > 100
Now, even though 1 million records match the query, only 10,000 will be paged, which brings things back to sub-second responses. In this scenario the user should be told that the query they ran was too broad and that they need to narrow the search criteria, because not all possible results were returned.
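To tie this back to the original question about LINQ to SQL: a hedged sketch of database-side paging, assuming a generated MyDataContext with a Foos table whose Name and Column1 are mapped columns. The order-by must be a mapped column; a .NET-only computed property cannot be translated, so sorting on it would pull the whole table back into memory. Whether the TOP-style cap folds into the same SQL statement depends on how the provider composes the operators.

using System.Collections.Generic;
using System.Linq;

public static class FooPaging
{
    public static List<Foo> GetPage(MyDataContext db, int pageIndex, int pageSize)
    {
        return db.Foos
                 .Where(f => f.Column1 > 100)
                 .OrderBy(f => f.Name)        // translated to the ROW_NUMBER paging pattern
                 .Take(10000)                 // cap the candidate set, as suggested above
                 .Skip(pageIndex * pageSize)
                 .Take(pageSize)
                 .ToList();
    }
}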
