SQL query by time too slow - c#

I have a database of 700,000 entries with a full-text search table. Each row has a time of day associated with it. I need to page through the records 100 rows at a time, efficiently. I am doing this by tracking the end of the day.
It is taking much too long to execute (15 seconds).
Here's an example query:
SELECT *
FROM Objects o, FTSObjects f
WHERE f.rowid = o.AutoIncID AND
o.TimeStamp > '2012-07-11 14:24:16.582' AND
o.TimeStamp <= '2012-07-12 04:00:00.000' AND
o.Name='GPSHistory'
ORDER BY o.TimeStamp
LIMIT 100
The timestamp field is indexed.
I think this is because the ORDER BY clause is sorting all the records returned and then applying the limit, but I am not sure.
Suggestions?

The BEST way is to get a good DBA to look at the plan that is generated and make sure it's an optimal plan (e.g. make sure there are no table scans in it, which can happen if the optimizer uses bad statistics).
Here are some things that may help:
Add an index on Objects.Name - possibly even a compound index on Name and TimeStamp.
Add an index on rowid in FTSObjects if it doesn't already exist.
UPDATE STATISTICS on the Timestamp index periodically (ideally after large updates or daily if updates are continuous)
Rebuild your clustered index (if you have one). This would help if your clustered index is on a field that does not get sequential inserts (e.g. a char field where inserts are in random places)
Don't select * if you don't need to - that increases I/O time
You might also try explicitly casting the strings to DATETIME. I think SQL does this implicitly anyway (rather than casting the column to a string, which would not use the index on the datetime column), but it's worth ruling out:
SELECT *
FROM Objects o, FTSObjects f
WHERE f.rowid = o.AutoIncID AND
o.TimeStamp > CONVERT(DATETIME,'2012-07-11 14:24:16.582') AND
o.TimeStamp <= CONVERT(DATETIME,'2012-07-12 04:00:00.000') AND
o.Name='GPSHistory'
ORDER BY o.TimeStamp
LIMIT 100
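For illustration, a minimal sketch of the compound-index suggestion, assuming this is SQLite (the rowid/LIMIT/FTS syntax suggests it); the index name is made up:

-- Lets the engine filter on Name and walk TimeStamp in order, so the
-- ORDER BY ... LIMIT 100 can stop after 100 rows instead of sorting everything.
CREATE INDEX IF NOT EXISTS idx_objects_name_timestamp ON Objects (Name, TimeStamp);

(If FTSObjects is an FTS virtual table, its rowid is already the key, so the join side usually needs no extra index.)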

Yes, the ORDER BY is processed before the LIMIT, but that's the correct behaviour; paging wouldn't actually work otherwise. Some ideas for optimization, though:
Don't SELECT * if it's not absolutely necessary. I feel like it's probably not because if you're paging results it's almost certainly not every field in both tables the user is looking at.
Create a covered index on AutoIncID, TimeStamp to keep it from reading the data page. Add Name to that index if it comes from Objects.
Create a covered index on rowid, Name, if Name comes from FTSObjects.
If the returned fields can be limited, consider adding those fields to the covered indexes if it's only a couple fields. You don't want the index to get too big because then it will affect write times.
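As a rough sketch of the covering-index idea (again assuming SQLite, where "covering" simply means every column the query touches is part of the index key; the Description column is a made-up placeholder for whatever few fields the page actually shows, and putting Name first keeps the TimeStamp range scan in order):

-- All referenced columns live in the index, so the data pages are never read.
CREATE INDEX IF NOT EXISTS idx_objects_covering
    ON Objects (Name, TimeStamp, AutoIncID, Description);

SELECT o.TimeStamp, o.AutoIncID, o.Description
FROM Objects o
JOIN FTSObjects f ON f.rowid = o.AutoIncID
WHERE o.Name = 'GPSHistory'
  AND o.TimeStamp > '2012-07-11 14:24:16.582'
  AND o.TimeStamp <= '2012-07-12 04:00:00.000'
ORDER BY o.TimeStamp
LIMIT 100;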

Related

Manage live stream viewer huge log

I've developed a .NET Core application for live streaming, which has a lot of functionality. One of those features is to show our clients how many people were watching in every 5-minute interval.
Right now I'm saving, in a SQL Server database, a log row for each viewer with ViewerID and TimeStamp at 5-minute intervals. It seems to be a bad approach, since in the first couple of days I've reached 100k rows in that table. I need that data, because we have a "Time Peek Chart" that shows how many people, and who, was watching in each 5-minute interval.
Anyway, does anyone have a suggestion for how I can handle this? I was thinking about a .txt file with the same data, but it also seems that I/O on the server could be a problem...
I also thought about a NoSQL database, maybe using an existing MongoDB-as-a-service like scalegrid.io or mlab.com.
Can someone help me with this, please? Thanks in advance!
I presume this is related to one of your previous questions, 'Filter SQL GROUP by a filter that is not in GROUP', and is an expansion of the question asked in its comments: 'how to make this better?'.
This answer below is definitely not the only way to do this - but I think it's a good start.
As you're using SQL Server for the initial data storage (minute-by-minute) I would suggest continuing to use SQL Server for the next stage of data storage. I think you'd need a compelling argument to use something else for the next stage, as you then need to maintain both of them (e.g., keeping software up-to-date, backups, etc), as well as having all the fun of transferring data properly between the two pieces of software.
My suggested approach is to keep the most detailed/granular data that you need, but no more.
In the previous question, you were keeping data by the minute, then calculating up to the 5-minute bracket. In this answer I'd summarise (and store) the data for the 5-minute brackets then discard your minute-by-minute data once it has been summarised.
For example, you could have a table called 'StreamViewerHistory' that has the Viewer's ID and a timestamp (much like the original table).
This only has 1 row per viewer per 5 minute interval. You could make the timestamp field a smalldatetime (as you don't care about seconds) or even have it as an ID value pointing to another table that references each timeframe. I think smalldatetime is easier to start with.
Depending exactly on how it's used, I would suggest having the Primary Key (or at least the Clustered index) being the timestamp before the ViewerID - this means new rows get added to the end. It also assumes that most queries of data are filtered by timeframes first (e.g., last week's worth of data).
I would consider having an index on ViewerId then the timestamp, for when people want to view an individual's history.
e.g.,
CREATE TABLE [dbo].[StreamViewerHistory](
[TrackDate] smalldatetime NOT NULL,
[StreamViewerID] int NOT NULL,
CONSTRAINT [PK_StreamViewerHistory] PRIMARY KEY CLUSTERED
(
[TrackDate] ASC,
[StreamViewerID] ASC
)
)
GO
CREATE NONCLUSTERED INDEX [IX_StreamViewerHistory_StreamViewerID] ON [dbo].[StreamViewerHistory]
(
[StreamViewerID] ASC,
[TrackDate] ASC
)
GO
Now, on some sort of interval (either as part of your ping process, or a separate process run regularly) interrogate the data in your source table LiveStreamViewerTracks, crunch the data as per the previous question, and save the results in this new table. Then delete the rows from LiveStreamViewerTracks to keep it smaller and usable. Ensure you delete the relevant rows only though (e.g., the ones that have been processed).
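As a rough sketch of that crunch-and-delete step (the LiveStreamViewerTracks column names and the 5-minute rounding are assumptions, since that table isn't shown here):

DECLARE @ProcessedUpTo datetime = DATEADD(MINUTE, -5, GETDATE()); -- e.g. everything before the current bracket

BEGIN TRANSACTION;

-- Summarise: one row per viewer per 5-minute bracket, rounded down to the bracket start.
INSERT INTO StreamViewerHistory (TrackDate, StreamViewerID)
SELECT DISTINCT
       DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, t.TimeStamp) / 5) * 5, 0),
       t.ViewerID
FROM LiveStreamViewerTracks t
WHERE t.TimeStamp < @ProcessedUpTo
  AND NOT EXISTS (SELECT 1 FROM StreamViewerHistory h
                  WHERE h.TrackDate = DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, t.TimeStamp) / 5) * 5, 0)
                    AND h.StreamViewerID = t.ViewerID);

-- Then delete only the detailed rows that have just been summarised.
DELETE FROM LiveStreamViewerTracks WHERE TimeStamp < @ProcessedUpTo;

COMMIT;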
The advantage of the above process is that the data in this new table is very usable by SQL Server. Whenever you need a graph (e.g., of the last 14 days) it doesn't need to read the whole table - instead it just starts at the relevant day and only reads the relevant rows. Make sure your queries are SARGable though, e.g.,
-- This is SARGable and can use the index
SELECT TrackDate, StreamViewerID
FROM StreamViewerHistory
WHERE TrackDate >= '20201001'
-- These are non-SARGable and will read the whole table
SELECT TrackDate, StreamViewerID
FROM StreamViewerHistory
WHERE CAST(TrackDate as date) >= '20201001'
SELECT TrackDate, StreamViewerID
FROM StreamViewerHistory
WHERE DATEDIFF(day, TrackDate, '20201001') <= 0
Typically, if you want counts of users for every 5 minutes within a given timeframe, you'd have something like
SELECT TrackDate, COUNT(*) AS NumViewers
FROM StreamViewerHistory
WHERE TrackDate >= '20201001 00:00:00' AND TrackDate < '20201002 00:00:00'
GROUP BY TrackDate
This should be good enough for quite a while. If your views etc. do slow down a lot, you could consider other things to help, e.g., further calculations/other reporting tables: for instance, a table with TrackDate and NumViewers, with one row per TrackDate. This would be very fast when reporting the overall number of viewers, but will not allow you to drill down to a specific viewer.
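For example, such a reporting table might look like this (a sketch only; the table name and the @From/@To range of brackets being processed are made up):

CREATE TABLE [dbo].[StreamViewerCounts](
[TrackDate] smalldatetime NOT NULL,
[NumViewers] int NOT NULL,
CONSTRAINT [PK_StreamViewerCounts] PRIMARY KEY CLUSTERED ([TrackDate] ASC)
)
GO

-- Populated from the 5-minute history, e.g. as part of the same summarising job.
DECLARE @From smalldatetime = '20201001', @To smalldatetime = '20201002';
INSERT INTO StreamViewerCounts (TrackDate, NumViewers)
SELECT TrackDate, COUNT(*)
FROM StreamViewerHistory
WHERE TrackDate >= @From AND TrackDate < @To
GROUP BY TrackDate;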

What is the best optimization technique for a wildcard search through 100,000 records in sql table

I am working on an ASP.NET MVC application. This application is used by 200 users. These
users constantly (every 5 mins) search for an item from the list of 100,000 items (this list is going to increase every month by 1-2 %). This list of 100,000 items are stored in a SQL Server table.
The search is a wildcard search
eg:
Select itemCode, itemName, ItemDesc
from tblItems
Where itemName like '%SearchWord%'
The searching needs to be really fast, since the main business relies on searching for and selecting the item.
I would like to know how to get the best performance. The search results have to come up instantaneously.
What I have tried -
I tried pre-loading the entire 100,000 records into memcache and then reading from the memcache. I was trying to avoid the calls to SQL Server for every search.
This takes a lot of time. Every time a user searches for an item, we retrieve 100,000 records from the memcache and then do the search. This is taking almost 2-3 times longer than direct SQL searches.
I tried doing a direct search on the SQL Server table but limiting the results to only 50 records at a time (using top 50)
This seems to be OK, but it's still nowhere near the performance we are seeking.
I would like to hear the possible solutions and links to any articles/code.
Thanks in advance
Run SQL Profiler and do a tuning profile. This will make recommendations on indexes to execute against your database.
Also, a query such as the following would be worth a try.
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER ( ORDER BY ColumnA) AS RowNumber, itemCode, itemName, ItemDesc
FROM tblItems
WHERE itemName LIKE '%FooBar%'
) AS RowResults
WHERE RowNumber >= 1 AND RowNumber < 50
ORDER BY RowNumber
EDIT: Updated query to reflect your real scenario.
How about having a search without the leading wildcard as your primary search....
Where itemName like 'SearchWord%'
and then have a "More Results" button that loads
Where itemName like '%SearchWord%'
(alternatively exclude results from the first result set)
Where itemName not like 'SearchWord%' and itemName like '%SearchWord%'
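Put together, that two-step approach might look something like this (a sketch only; the TOP 50 limit and the parameter name are assumptions based on the question):

DECLARE @SearchWord nvarchar(100) = 'FooBar';

-- Primary search: no leading wildcard, so an index on itemName can be used.
SELECT TOP 50 itemCode, itemName, ItemDesc
FROM tblItems
WHERE itemName LIKE @SearchWord + '%'
ORDER BY itemName;

-- "More Results": the expensive contains-search, excluding what the first query already found.
SELECT TOP 50 itemCode, itemName, ItemDesc
FROM tblItems
WHERE itemName LIKE '%' + @SearchWord + '%'
  AND itemName NOT LIKE @SearchWord + '%'
ORDER BY itemName;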
A weird alternative which might work, as it depends on several assumptions etc. Sorry it's not fully explained, but I'm using an iPad so it's hard to type. (And yes, this solution has been used in high-transaction commercial systems.)
This assumes:
That your query is CPU-constrained, not I/O-constrained
That itemName is not too long, so that it doesn't end up containing all letters and numbers anyway
That the search word, in total, contains enough selective characters and isn't just highly common characters
That your selection predicates are constrained by a %LIKE%
The basic idea is to expand your query to help the optimiser know which rows need the like scanning.
Step 1. Setup your table
Create an additional 26 or 36 columns, one for each letter/digit. When I've done this for real it has always been a separate table, but putting it on the source table should be OK for a small volume like 100k. Let's call the columns trig_a, trig_b, etc.
Create a trigger for each insert/edit/delete that puts a 1 or 0 into the trig_a field depending on whether the row contains an 'a'; do this for all 26/36 columns. The trigger to do this is complex, but possible (at least using Oracle). If you get stuck I'm sure SO'ers can create it, or I can dig it out.
At this point, we have a series of columns that indicate whether a field contains a letter/digit etc.
Step 2. Helping you query
With this extra info, we are in a position to help the optimiser. Add the following to your query:
Select ... Where .... And
((trig_a > 0) or (searchword not like '%a%')) and
((trig_b > 0) or (searchword not like '%b%')) and
... Repeat for all columns monitored...
If the optimiser behaves, it can use the (hopefully) lower cost field>0 predicates to reduce the like predicates evaluated.
Notes:
You may need to force the optimiser to scan the trig_? fields first
Indexes can help on the trig_? fields, especially if they're in the source table
I haven't shown how to handle upper/lower case; don't forget to handle this
You might find just doing a few letters is all you need
This technique doesn't offer performance gains for every use of LIKE, so it isn't a general-purpose technique for everywhere you use a LIKE.
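Since the question is SQL Server, one way to sketch the same idea without a trigger is persisted computed columns (a variation on the above, not what the answer literally describes; only three letters are shown, and with the default case-insensitive collation '%a%' also matches 'A'):

-- Flag columns kept in sync automatically by SQL Server.
ALTER TABLE tblItems ADD
    trig_a AS CASE WHEN itemName LIKE '%a%' THEN 1 ELSE 0 END PERSISTED,
    trig_b AS CASE WHEN itemName LIKE '%b%' THEN 1 ELSE 0 END PERSISTED,
    trig_c AS CASE WHEN itemName LIKE '%c%' THEN 1 ELSE 0 END PERSISTED;
GO

-- Expanded query for the search word 'cab': only rows flagged as containing
-- 'c', 'a' and 'b' need the expensive %LIKE% evaluated at all.
SELECT itemCode, itemName, ItemDesc
FROM tblItems
WHERE trig_c = 1 AND trig_a = 1 AND trig_b = 1
  AND itemName LIKE '%cab%';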

C# SQL Server - More Efficient for Multiple Database accesses or multiple loops through data?

In part of my application I have to get the last ID of a table where a condition is met
For example:
SELECT MAX(ID) FROM TABLE WHERE Num = 2
So I can either grab the whole table and loop through it looking for Num = 2, or I can grab the data from the table where Num = 2. In the latter, I know the last item will be the MAX ID.
Either way, I have to do this around 50 times... so would it be more efficient to grab all the data and loop through it looking for a specific condition,
or would it be better to grab the data several times based on the condition, where I know the last item in the list will be the max ID?
I have 6 conditions I will have to base the queries on.
I'm just wondering which is more efficient... looping through a list of around 3,500 items several times, or hitting the database several times where I can already have the data broken down like I need it.
I can speak for SQL Server. If you create a stored procedure with Num as a parameter that you pass, you will get the best performance, because the execution plan of the stored procedure is optimised and reused. Of course, an index on that field is mandatory.
Let the database do this work, it's what it is designed to do.
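A minimal sketch of that approach (the table and procedure names are placeholders, not from the question):

-- Supporting index so MAX(ID) for a given Num is a quick seek.
CREATE INDEX IX_MyTable_Num_ID ON dbo.MyTable (Num, ID);
GO

CREATE PROCEDURE dbo.GetMaxIdForNum
    @Num int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT MAX(ID) AS MaxId
    FROM dbo.MyTable
    WHERE Num = @Num;
END
GO

From C# that is one small round trip per condition, instead of pulling all 3,500 rows across the wire and looping.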
Does this table have a high insert frequency? Does it have a high update frequency, specifically on the column that you're applying the MAX function to? If the answer is no, you might consider adding an IS_MAX BIT column and set it using an insert trigger. That way, the row you want is essentially cached, and it's trivial to look up.
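A sketch of what that insert trigger could look like (the table and column names are placeholders; this assumes ID only ever grows, as with an identity column):

-- Hypothetical schema: dbo.MyTable(ID int IDENTITY, Num int, IS_MAX bit).
CREATE TRIGGER dbo.trg_MyTable_SetIsMax
ON dbo.MyTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Clear the flag on existing rows for any Num value touched by this insert...
    UPDATE t
    SET t.IS_MAX = 0
    FROM dbo.MyTable t
    JOIN inserted i ON i.Num = t.Num
    WHERE t.IS_MAX = 1;

    -- ...then flag the row with the highest ID for each of those Num values.
    UPDATE t
    SET t.IS_MAX = 1
    FROM dbo.MyTable t
    JOIN (SELECT Num, MAX(ID) AS MaxID
          FROM dbo.MyTable
          WHERE Num IN (SELECT Num FROM inserted)
          GROUP BY Num) m
      ON m.Num = t.Num AND m.MaxID = t.ID;
END
GO

Looking up the cached row is then trivial: SELECT ID FROM dbo.MyTable WHERE Num = 2 AND IS_MAX = 1.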

Walking through an SQLite Table

I would like to implement or use functionality that allows stepping through a Table in SQLite.
If I have a table Products that has 100k rows, I would like to retrieve perhaps 10k rows at a time. Something similar to how a webpage would list data and have a < Previous .. Next > link to walk through the data.
Are there SELECT statements that can make this simple? I see and have tried using the ROWID in conjunction with LIMIT, which seems OK if I'm not ordering the data.
// This seems works if not ordering.
SELECT * FROM Products WHERE ROWID BETWEEN x AND y;
Are you looking for offset and limit? SQLite supports these. You can then use order by, which SQLite also supports.
EDIT: To elaborate, you can do something like:
Select * from Products order by name limit 10000 offset 10000;
This fetches the second 10k page from the table, sorted by name. Watch out for performance issues when dealing with limit/offset and order by.
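If OFFSET does become a bottleneck on later pages, one common workaround (not part of the answer above, just a sketch; it needs SQLite 3.15+ for the row-value comparison and benefits from an index on name) is to remember where the previous page ended and seek past it:

-- "Next page": remember the name and rowid of the last row shown (placeholders here),
-- then seek past that point instead of counting past 10,000 rows with OFFSET.
SELECT *
FROM Products
WHERE (name, rowid) > ('last name shown', 12345)
ORDER BY name, rowid
LIMIT 10000;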

Best way to save an ordered List to the Database while keeping the ordering

I was wondering if anyone has a good solution to a problem I've encountered numerous times during the last years.
I have a shopping cart, and my customer explicitly requests that its order is significant. So I need to persist the order to the DB.
The obvious way would be to simply insert some OrderField where I would assign the numbers 0 to N and sort it that way.
But doing so would make reordering harder, and I somehow feel that this solution is kind of fragile and will come back at me some day.
(I use C# 3.5 with NHibernate and SQL Server 2005.)
Thank you
OK, here is my solution to make programming this easier for anyone who happens along to this thread. The trick is being able to update all the order indexes above or below an insert/deletion in one update.
Use a numeric (integer) column in your table, supported by the SQL queries below.
CREATE TABLE myitems (Myitem TEXT, id INTEGER PRIMARY KEY, orderindex NUMERIC);
To delete the item at orderindex 6:
DELETE FROM myitems WHERE orderindex=6;
UPDATE myitems SET orderindex = (orderindex - 1) WHERE orderindex > 6;
To swap two items (4 and 7):
UPDATE myitems SET orderindex = 0 WHERE orderindex = 4;
UPDATE myitems SET orderindex = 4 WHERE orderindex = 7;
UPDATE myitems SET orderindex = 7 WHERE orderindex = 0;
i.e. 0 is not used, so use it as a dummy to avoid having an ambiguous item.
To insert at 3:
UPDATE myitems SET orderindex = (orderindex + 1) WHERE orderindex > 2;
INSERT INTO myitems (Myitem, orderindex) VALUES ('MytxtitemHere', 3);
The best solution is a doubly linked list: O(1) for all operations except indexing. Nothing in SQL can index into it quickly, though, except a WHERE clause on the item you want.
0,10,20-style gaps fail. Sequence columns fail. A float sequence column fails at group moves.
A doubly linked list uses the same operations for addition, removal, group deletion, group addition and group move. A singly linked list works OK too, but a doubly linked list is better with SQL in my opinion; a singly linked list requires you to have the entire list.
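For illustration, a sketch of what that might look like (table and column names are made up, not from the answer above):

-- Each row points at its neighbours; a NULL prev_id marks the head, a NULL next_id the tail.
CREATE TABLE cart_items (
    id        INTEGER PRIMARY KEY,
    item_name TEXT NOT NULL,
    prev_id   INTEGER NULL REFERENCES cart_items(id),
    next_id   INTEGER NULL REFERENCES cart_items(id)
);

-- Inserting a freshly added row 42 between rows 7 and 9 touches three rows,
-- no matter how long the list is:
UPDATE cart_items SET next_id = 42 WHERE id = 7;
UPDATE cart_items SET prev_id = 42 WHERE id = 9;
UPDATE cart_items SET prev_id = 7, next_id = 9 WHERE id = 42;

The trade-off is exactly the indexing point above: reading the list back in order means walking the pointers, either in application code or with a recursive query.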
FWIW, I think the way you suggest (i.e. committing the order to the database) is not a bad solution to your problem. I also think it's probably the safest/most reliable way.
How about using a linked-list implementation? Have one column that holds the value (order number) of the next item. I think it's by far the easiest to use when doing insertions in between; no need to renumber.
Unfortunately there is no magic bullet for this. You cannot guarantee the order of any SELECT statement without an ORDER BY clause. You need to add the column and program around it.
I don't know that I'd recommend adding gaps in the order sequence. Depending on the size of your lists and the hits on the site, you might gain very little for the overhead of handling the logic (you'd still need to cater for the occasion when all the gaps have been used up). I'd take a close look to see what benefits this would give you in your situation.
Sorry I can't offer anything better, Hope this helped.
I wouldn't recommend the A, AA, B, BA, BB approach at all. There's a lot of extra processing involved to determine hierarchy and inserting entries in between is not fun at all.
Just add an OrderField, integer. Don't use gaps, because then you have to either work with a non-standard 'step' on your next middle insert, or you will have to resynchronize your list first, then add a new entry.
Having 0...N is easy to reorder, and if you can use Array methods or List methods outside of SQL to re-order the collection as a whole, then update each entry, or you can figure out where you are inserting into, and +1 or -1 each entry after or before it accordingly.
Once you have a little library written for it, it'll be a piece of cake.
I would just insert an order field. It's the simplest way. If the customer can reorder the fields, or you need to insert in the middle, then just rewrite the order fields for all items in that batch.
If down the line you find this limiting due to poor performance on inserts and updates then it is possible to use a varchar field rather than an integer. This allows for quite a high level of precision when inserting. eg to insert between items 'A' and 'B' you can insert an item ordered as 'AA'. This is almost certainly overkill for a shopping cart though.
At a level of abstraction above the cart items, say CartOrder (which has a 1-n relationship with CartItem), you can maintain a field called itemOrder, which could be just a comma-separated list of the IDs (PKs) of the relevant CartItem records. It is then up to the application layer to parse that and arrange your item models accordingly. The big plus of this approach shows up on order reshufflings: there may be no changes to the individual objects, whereas if the order is persisted as an index field on the order item rows, you have to issue an update command for each row to change its index field.
Please let me know your criticisms of this approach; I am curious to know in which ways it might fail.
I solved it pragmatically like this:
The order is defined in the UI.
The backend gets a POST request that contains the IDs and the corresponding Position of every item in the list.
I start a transaction and update the position for every ID.
Done.
So ordering is expensive but reading the ordered list is super cheap.
I would recommend keeping gaps in the order number, so instead of 1,2,3 etc, use 10,20,30... If you need to just insert one more item, you could put it at 15, rather than reordering everything at that point.
Well, I would say the short answer is:
Create a primary key of autoidentity in the cart contents table, then insert rows in the correct top-down order. Selecting from the table ordered by the primary key autoidentity column will then give you the same list. With this approach you have to delete all items and reinsert them whenever the cart contents change (but that is still quite a clean way of doing it). If that's not feasible, go with the order column as suggested by others.
When I use Hibernate, and need to save the order of a @OneToMany, I use a Map and not a List.
@OneToMany(fetch = FetchType.EAGER, mappedBy = "rule", cascade = CascadeType.ALL)
@MapKey(name = "position")
@OrderBy("position")
private Map<Integer, RuleAction> actions = LazyMap.decorate(new LinkedHashMap<>(), FactoryUtils.instantiateFactory(RuleAction.class, new Class[] { Rule.class }, new Object[] { this }));
In this Java example, position is an Integer property of RuleAction so the order is persisted that way. I guess in C# this would look rather similar.
